bug-gnu-emacs

From: Ioannis Kappas
Subject: bug#52816: 28.0.90; misspelling of windows-nt system-type in reset-language-environment, enquiring about default-process-coding-systems on MS-Windows
Date: Mon, 27 Dec 2021 10:52:20 +0000

Hi,

There appears to be a misspelled symbol in the recent commit
a4bfb0bc5c14e002c0926fc320aeb4a3fc261447 ("Default Emacs to UTF-8
instead of Latin-1"). The symbol `window-nt' is used instead of
`windows-nt' when checking `system-type' for membership.
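
To illustrate why the typo silently takes the non-Windows branch, here
is a minimal sketch. It hard-codes the symbol `windows-nt' in place of
`system-type' so that it can be evaluated on any platform; that
substitution is an assumption made for the example only.

```elisp
;; `system-type' is `windows-nt' on native MS-Windows builds.
;; The misspelled list never contains it, so `memq' returns nil:
(memq 'windows-nt '(window-nt ms-dos))

;; With the corrected spelling, `memq' returns the matching tail,
;; which is non-nil:
(memq 'windows-nt '(windows-nt ms-dos))
```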

It does not appear to be of much consequence though, since both
`default-file-name-coding-system' and `default-process-coding-system',
which are affected by it, appear to be overwritten later on at runtime
anyway. A fix could be:

diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index a0a6557c95..2b52d4bf86 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -1873,7 +1873,7 @@ reset-language-environment
   (set-default-coding-systems nil)
   (setq default-sendmail-coding-system 'utf-8)
   (setq default-file-name-coding-system (if (memq system-type
-                                                  '(window-nt ms-dos))
+                                                  '(windows-nt ms-dos))
                                             'iso-latin-1-unix
                                           'utf-8-unix))
   ;; Preserve eol-type from existing default-process-coding-systems.
@@ -1892,9 +1892,9 @@ reset-language-environment
  (condition-case nil
       (coding-system-change-text-conversion
        (cdr default-process-coding-system)
-       (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8))
+       (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8))
     (coding-system-error
-     (if (memq system-type '(window-nt ms-dos)) 'iso-latin-1 'utf-8)))))
+     (if (memq system-type '(windows-nt ms-dos)) 'iso-latin-1 'utf-8)))))
     (setq default-process-coding-system
    (cons output-coding input-coding)))

I happened to notice this while looking at the default encoding used
when sending data to a sub-process with `call-process-region' on
MS-Windows, which out of the box differs from other OSes. When sending
a UTF-8 region to a sub-process, I expected the data to reach the
sub-process in that encoding. That is not the default behaviour on
MS-Windows, which I found confusing: the data are most likely to
arrive encoded as iso-8859-1. On GNU/Linux they are most likely to
arrive encoded as UTF-8.

At first I expected the encoding of the data sent to the sub-process
to be determined by the region's encoding, but it rather seems to be
determined by `default-process-coding-system' (or by a matching entry
in `process-coding-system-alist', when one is set). This is all fine,
and is documented as such.
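
For a single call, the defaults can be overridden by binding
`coding-system-for-write' and `coding-system-for-read' around
`call-process-region'. The following is only a sketch of that
approach; the function name `my/region-through-utf-8' is hypothetical
and not part of Emacs.

```elisp
(defun my/region-through-utf-8 (start end program)
  "Send the text between START and END to PROGRAM encoded as UTF-8.
Return PROGRAM's standard output as a string, decoded as UTF-8.
This is a hypothetical helper written for illustration."
  (let ((coding-system-for-write 'utf-8-unix) ; encoding of data sent
        (coding-system-for-read 'utf-8)       ; decoding of the output
        (source (current-buffer)))
    (with-temp-buffer
      (let ((out (current-buffer)))
        ;; Read the region from SOURCE, collect the output in OUT.
        (with-current-buffer source
          (call-process-region start end program nil out nil)))
      (buffer-string))))
```

The bindings take precedence over `default-process-coding-system' for
that call only, so nothing global is changed.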

I think the expectation in this day and age is that communication
between processes should be in Unicode by default, so that
multilingual text can be passed between them.

`flycheck' is an example of a utility that uses `call-process' to
marshal buffers to and from checker sub-processes. Sending
multilingual data to the checkers on MS-Windows is likely to fail,
because the default process encoding is `undecided-unix' and the data
thus end up encoded as iso-8859-1, dropping the non-Latin-1
characters. On GNU/Linux the same operation is most likely to
succeed, because the default encoding is most likely `utf-8-unix',
courtesy of the LANG environment variable most likely being set to a
UTF-8 locale such as `C.UTF-8', which is picked up by the locale
logic in Emacs.
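
Whether a given text can survive a Latin-1 encoding can be checked
from Lisp with `unencodable-char-position'. The sample string below is
made up for illustration. The first call should return the index of
the first character that iso-latin-1 cannot represent, and the second
should return nil because UTF-8 can encode the whole string.

```elisp
(let ((text "naïve λ"))
  (list
   ;; Index of the first character iso-latin-1 cannot encode, if any.
   (unencodable-char-position 0 (length text) 'iso-latin-1 nil text)
   ;; The same check against utf-8.
   (unencodable-char-position 0 (length text) 'utf-8 nil text)))
```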

The default process coding system is forced in
lisp/w32-fns.el:w32-set-default-process-coding-system:

  ;; Most programs on Windows will accept Unix line endings on input
  ;; (and some programs ported from Unix require it) but most will
  ;; produce DOS line endings on output.
  (setq default-process-coding-system
        '(undecided-dos . undecided-unix))

Now that UTF-8 adoption is almost universal, is this perhaps a good
time to change the default from undecided to utf-8, and thus align it
(more or less) with the most likely out-of-the-box encoding behaviour
on GNU/Linux?

Of course, a user can set the LANG environment variable on MS-Windows
to a UTF-8 locale, as is common on GNU/Linux, but it is rather
unlikely that a user would ever do this on Windows.

Also, should the eol type be set to -dos on the input encoding? The
comment suggests that -unix was chosen because programs back then
accepted, or even required, Unix eols, but I don't believe that this
is the case any more.

A final note: the documentation under "Default Coding Systems" warns
that `undecided' coding systems do not work reliably with
asynchronous sub-process output. Perhaps this is an additional
argument for why we should move away from the undecided default above?

https://www.gnu.org/software/emacs/manual/html_node/elisp/Default-Coding-Systems.html

"""
Warning: Coding systems such as undecided, which determine the coding
system from the data, do not work entirely reliably with asynchronous
subprocess output. This is because Emacs handles asynchronous
subprocess output in batches, as it arrives. If the coding system
leaves the character code conversion unspecified, or leaves the
end-of-line conversion unspecified, Emacs must try to detect the
proper conversion from one batch at a time, and this does not always
work.
"""
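
For asynchronous sub-processes, one way to avoid the detection
problem described in the warning is to state both conversions
explicitly through the `:coding' keyword of `make-process'. This is a
sketch only: the function name `my/start-utf-8-process' is
hypothetical, and `utf-8-unix' for decoding is an assumption (a
program that emits DOS line endings would call for `utf-8-dos').

```elisp
(defun my/start-utf-8-process (name command)
  "Start COMMAND asynchronously with explicit UTF-8 coding systems.
COMMAND is a list of strings: the program followed by its arguments.
NAME names the process and its output buffer.
This is a hypothetical helper written for illustration."
  (make-process
   :name name
   :buffer (generate-new-buffer (format " *%s*" name))
   :command command
   :connection-type 'pipe
   ;; (DECODING . ENCODING): both the text conversion and the eol
   ;; conversion are fixed, so nothing is left to per-batch detection.
   :coding '(utf-8-unix . utf-8-unix)))
```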

Thanks!

In GNU Emacs 28.0.90 (build 1, x86_64-w64-mingw32)
 of 2021-12-26
Repository revision: 89a82182cbca0caa19f5b9463629918b7131ef0c
Repository branch: emacs-28
Windowing system distributor 'Microsoft Corp.', version 10
System Description: Microsoft Windows 10
