bug-gnu-emacs



From: Daniel Mendler
Subject: bug#52459: 28.0.90; prin1-to-string does not escape bidi control characters despite print-escape-control-characters=t
Date: Mon, 13 Dec 2021 17:32:40 +0100

On 12/13/21 4:24 PM, Eli Zaretskii wrote:
> So you want a feature that would produce strings suitable for using in
> program source files, like we did in the above example?  Is that the
> meaning of "sanitize" you have been using?
> 
> Are there other use cases for those "escaped" or "sanitized" strings?
> If so, please describe them as well.

As you said, the use case is to produce strings suitable for source files,
or strings in a form that looks like strings occurring in source files.
This use case appears in debuggers and other UIs which inspect variable
values at Emacs runtime. Furthermore, code editing and refactoring tools
produce strings which are supposed to be used directly in source files.

For strings used in source files one could simply use
`print-escape-multibyte=t`. In debugger UIs, however, this severely
obfuscates the output, which particularly hurts users who use Emacs
with a multi-byte or bidi language (Hebrew, Arabic, Chinese, etc.).
Therefore, in debugger UIs I want to escape only control characters,
not other multi-byte display characters.
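
To make the difference concrete, here is a minimal sketch that prints
the same string under both settings. The helper name `my-printed-forms`
is hypothetical and used only for illustration; the comments describe
the behavior reported in this bug, not a guaranteed result on every
Emacs version.

```elisp
;; Hypothetical helper: return the printed representation of S under
;; the two printer settings discussed in this thread.
(defun my-printed-forms (s)
  "Return a list (CONTROL-ESCAPED MULTIBYTE-ESCAPED) for the string S."
  (list
   ;; Escapes single-byte control characters only.  As reported in
   ;; this bug, bidi controls such as U+202E are left as they are.
   (let ((print-escape-control-characters t))
     (prin1-to-string s))
   ;; Escapes every non-ASCII multibyte character as a hex escape,
   ;; including ordinary visible letters such as Hebrew ALEF.
   (let ((print-escape-multibyte t))
     (prin1-to-string s))))

;; Example input: "a", C-a, HEBREW LETTER ALEF, RIGHT-TO-LEFT OVERRIDE, "b".
(my-printed-forms (string ?a 1 #x5d0 #x202e ?b))
```

The first form keeps the Hebrew letter readable but also keeps the
hidden bidi control; the second hides nothing but turns the Hebrew
letter into a hex escape as well.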

There are packages which provide such debugging or live inspection of
values; I already mentioned the Helpful (MELPA) package and the
Marginalia (GNU ELPA) package. Both of these variable inspection
utilities are affected by the problem of how to escape string literals
or print Emacs values properly, that is, how to transform a value into
a printable representation as it would appear in source.

> For now, I understand that those strings are not necessarily required
> to be readable on the Emacs display, at least not in all cases,
> because some of the reordering that these controls produce will be
> disabled when they are represented by ASCII escapes, and the character
> order on display will change as result.  If the string includes RTL
> characters, the result might not be easily readable.  But AFAIU, this
> is not a problem for the use cases you have in mind?

Yes, if one escapes bidi control characters the readability of the
result is affected, but not as severely as with
`print-escape-multibyte=t`. The goal is to produce strings which don't
contain hidden characters for debugging, so some loss of readability
is acceptable.

Note that I am not a native speaker and only an occasional user of
multi-byte bidi languages, e.g., for educational purposes. Therefore I
cannot tell how such bidi strings are usually written in source code.
But I suspect that in source code literals all non-visible control
characters are ideally written in escaped form, whereas visible
multi-byte characters may be written literally. At least this is the
practice I follow in source code for other Unicode characters: I may
write displayable characters as is, but I always escape control
characters (ASCII and Unicode alike).
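
As an illustration of that practice, the following is a minimal sketch
of a post-processing step that rewrites only the bidi control
characters in already printed text, leaving visible multi-byte
characters alone. The function name `my-escape-bidi-controls` and the
exact set of characters (U+061C, U+200E, U+200F, U+202A..U+202E,
U+2066..U+2069) are assumptions made for this example, not an existing
Emacs facility.

```elisp
;; Hypothetical helper, not part of Emacs.  The character class lists
;; the bidi controls assumed for this sketch; it is written with \u
;; escapes so that this source itself contains no hidden characters.
(defun my-escape-bidi-controls (printed)
  "Return PRINTED with bidi control characters written as \\N{U+XXXX}."
  (replace-regexp-in-string
   "[\u061c\u200e\u200f\u202a-\u202e\u2066-\u2069]"
   ;; Called with the matched text; build the escape from its code point.
   (lambda (match) (format "\\N{U+%04X}" (aref match 0)))
   printed
   t    ; FIXEDCASE: do not adjust letter case of the replacement
   t))  ; LITERAL: keep the backslash in the replacement as is

;; Usage: apply it to the output of `prin1-to-string'.
(my-escape-bidi-controls
 (let ((print-escape-control-characters t))
   (prin1-to-string (string ?a #x5d0 #x202e ?b))))
```

Applied to printed string syntax, the `\N{U+XXXX}` form can be read
back by the Lisp reader as the same character, so the result still
looks like a source literal while exposing the hidden controls.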




