emacs-devel

From: Stefan Monnier
Subject: Re: emacs-26 8f18d12: Improve documentation of decoding into a unibyte buffer
Date: Mon, 27 May 2019 09:32:11 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux)

> Almost all uses of string-as-unibyte are gone now, but the one I was
> looking at is this one in international/mule-cmds.el:
>
>     (defun encoded-string-description (str coding-system)
>       "Return a pretty description of STR that is encoded by CODING-SYSTEM."
>       (setq str (string-as-unibyte str))
>       (mapconcat
>        (if (and coding-system (eq (coding-system-type coding-system) 'iso-2022))
>            ;; Try to get a pretty description for ISO 2022 escape sequences.
>            (function (lambda (x) (or (cdr (assq x iso-2022-control-alist))
>                                      (format "#x%02X" x))))
>          (function (lambda (x) (format "#x%02X" x))))
>        str " "))
>
> If I take a string such as "β" and replace string-as-unibyte with
> (encode-coding-string str 'emacs-internal), `encoded-string-description'
> prints "#xCE #xB2", which is the correct UTF-8 encoded
> value. 'raw-text works too. I'm certain that there are subtle
> differences between the two that I don't understand.

But "β" is not a "STR that is encoded by CODING-SYSTEM", so this output
is neither correct nor incorrect in any case.

I think the right thing to do here is one of:
- signal an error if `str` is multibyte.
- signal an error if `str` is multibyte and contains non-byte chars.
- if multibyte, encode `str` with `coding-system`.
- don't bother checking whether `str` is unibyte; just pass it as is
  to `mapconcat`.
- don't bother checking whether `str` is unibyte; just pass it as is
  to `mapconcat`, but in the lambda, catch the case where `x` is an
  eight-bit raw-byte char and, if so, pass it to
  `multibyte-char-to-unibyte`.
- ...
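For illustration, here is a minimal sketch of the third and fifth options above. The names `my-byte-describer`, `my-encoded-string-description-3` and `my-encoded-string-description-5` are hypothetical. The sketch assumes `iso-2022-control-alist` from mule-cmds.el is available (it is preloaded in a standard build) and that raw-byte chars in a multibyte string report the `eight-bit` charset.

```elisp
;; Hypothetical helper: return a function that describes one byte X,
;; using ISO 2022 control names when CODING-SYSTEM is of that type.
(defun my-byte-describer (coding-system)
  (if (and coding-system (eq (coding-system-type coding-system) 'iso-2022))
      ;; Try to get a pretty description for ISO 2022 escape sequences.
      (lambda (x) (or (cdr (assq x iso-2022-control-alist))
                      (format "#x%02X" x)))
    (lambda (x) (format "#x%02X" x))))

;; Option 3 (hypothetical name): if STR is multibyte, encode it with
;; CODING-SYSTEM first; a unibyte STR is assumed to be already encoded.
(defun my-encoded-string-description-3 (str coding-system)
  "Return a pretty description of STR that is encoded by CODING-SYSTEM."
  (when (and coding-system (multibyte-string-p str))
    (setq str (encode-coding-string str coding-system)))
  (mapconcat (my-byte-describer coding-system) str " "))

;; Option 5 (hypothetical name): pass STR as is, but map eight-bit
;; raw-byte chars back to the byte they stand for.
(defun my-encoded-string-description-5 (str coding-system)
  "Return a pretty description of STR that is encoded by CODING-SYSTEM."
  (let ((describe (my-byte-describer coding-system)))
    (mapconcat (lambda (x)
                 (funcall describe
                          (if (eq (char-charset x) 'eight-bit)
                              (multibyte-char-to-unibyte x)
                            x)))
               str " ")))
```

With these definitions, (my-encoded-string-description-3 "β" 'utf-8) should return "#xCE #xB2", because the multibyte input is actually encoded with the coding system the caller named rather than with raw-text or emacs-internal.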

But encoding `str` with any coding system like raw-text or
emacs-internal doesn't seem to make much sense.


        Stefan
