bug#71472: [PATCH] Add pty support by using ConPTY on Windows


From: Eli Zaretskii
Subject: bug#71472: [PATCH] Add pty support by using ConPTY on Windows
Date: Tue, 11 Jun 2024 10:27:04 +0300

[Please use Reply All to reply, to keep the bug tracker CC'ed.]

> Date: Tue, 11 Jun 2024 12:34:48 +0900
> From: Ke Wu <ellpih@zohomail.jp>
> 
>  > If we must use UTF-8 as the only encoding to talk to sub-processes via 
>  > ConPTY, that makes the number of applications that can be used this 
>  > way very small, since most programs we are used to running as 
>  > subprocesses, in particular ports of GNU software like GCC, GDB, 
>  > Grep, Find, and many others, cannot reliably talk to Emacs in UTF-8 
>  > encoding on MS-Windows.
> 
> That statement is not quite accurate. On the Emacs side, UTF-8 is assumed
> due to a limitation of ConPTY (it communicates with the console only in
> UTF-8). However, on the subprocess side, ConPTY respects the subprocess's
> codepage and translates it into UTF-8 when sending to the console. So
> we can make these subprocesses run in a codepage other than
> 65001 (UTF-8).

This is inaccurate: ConPTY always assumes the process running on the
other side of the connection uses the system codepage.  If the
subprocess expects some other encoding, ConPTY will not know that, and
Emacs has no way of telling ConPTY to use a different encoding.  This
is the essence of the issue I filed with them, and they basically told
me that what ConPTY does is "by design".

This is not an academic issue: some very important programs we invoke
from Emacs need us to talk to them in an encoding different from the
system codepage.  A notable example is Git, which wants UTF-8 (it can
support other encodings, but that is not recommended, and Emacs
doesn't really support that well on Windows).

> I am not very familiar with these GNU software ports :(
> Please let me know if there will be problems with ConPTY translating from
> UTF-8 to other codepages.

See above.  There's no way for Emacs to set that up, except when the
"other codepage" is the system codepage.

>  >  https://github.com/microsoft/terminal/issues/9174 
> 
> I think a possible solution to this issue is to use a wrapper program to
> set the codepage for the applications that do not call `SetConsoleOutputCP`.
> As a proof of concept, the following code snippet uses cmdproxy.exe to
> change the codepage to 1255. Please replace the cmdproxy.exe path in the
> snippet.
> 
> (progn
>   (set-buffer
>    (apply #'make-term
>           "terminal"
>           "C:/Users/oracl/Documents/Programs/emacs-master/nt/cmdproxy.exe"
>           nil
>           '("-c" "chcp 1255 && call cmd")))
>   (term-char-mode)
>   (pop-to-buffer-same-window "*terminal*"))
> 
> The codepage can be verified by running `chcp` in the newly created cmd
> process.  Alternatively, the following hack makes the created conhost.exe
> visible, so that the codepage can be verified directly by viewing the
> properties of the conhost.exe window.
> 
> --- a/src/w32.c
> +++ b/src/w32.c
> @@ -11208,7 +11208,7 @@ make_console_with_pipe (ptrdiff_t nargs, Lisp_Object * args, const int * fds)
> 
>    command_new = CALLN (Flist,
>                         build_string ("conhost.exe"),
> -                       build_string ("--headless"),
> +                       /* build_string ("--headless"), */
>                         build_string ("--feature"),
>                         build_string ("pty"));
>    if (!NILP (width)) 
> 
> Therefore, we can have subprocesses run in a codepage other than 65001 or
> the OEM default codepage.  And as a console program, Emacs talks in UTF-8.
> It may be feasible if we add a `:coding` argument to the function `term`,
> which would build a wrapper that changes the codepage before the real
> program starts.

cmdproxy is only used when invoking programs via the shell.  But Emacs
also invokes programs directly (call-process etc.), in which case
cmdproxy (or any other kind of wrapper) will be very problematic at
best, if not impossible.  See below about the complications this
causes wrt quoting of command-line arguments, for example.

Please keep in mind how Emacs arranges to use the correct encoding when
invoking other programs: we have data structures
(process-coding-system-alist etc.) which define the correct encoding
by program name, and we also have variables (coding-system-for-read
etc.) that can be bound to override those defaults temporarily.  The
encoding is applied separately to the program's command-line arguments
and to the stuff we write and read to and from the process.  How can
all this work reliably with ConPTY, even if the wrapper trick could
sometimes work?  Specifically:

  . how do we control encoding of command-line arguments? most
    programs running on Windows cannot handle UTF-8 encoded command
    lines
  . what if the encoding we need doesn't have a corresponding Windows
    codepage (which means chcp will not work)?
  . how can we handle the eol-conversion part of the encoding (some
    programs _must_ be fed with Unix EOLs)?

Also please note that using a wrapper adds another layer of
interpreting command-line arguments, which might break some
complicated cases that use fancy quoting of special characters.  Any
wrapper we provide will be compiled with MinGW, so it will use the
MinGW startup code to process quoting.  But the program the wrapper
runs might not be a MinGW program, so it could use different ways of
processing quotes.  The simplest example of such a combination is
cmd.exe itself: its quoting rules are very different from what MinGW
uses.  This will definitely break some cases.  For example, Git uses
the '^' character for special purposes, and some Windows styles of
quoting interpret '^' as a quote character -- this could easily break
Emacs commands that invoke Git.
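
To make the '^' problem concrete, here is a minimal sketch of the kind of
processing an extra cmd.exe-style layer applies to a command line.  It is a
deliberately simplified model, not a complete cmd.exe parser: outside double
quotes a caret is dropped and the next character is taken literally, while
inside double quotes the caret is kept.  The function name cmd_strip_carets
is hypothetical and exists only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical model of cmd.exe caret handling.
   Outside double quotes, '^' is removed and the character after it
   is copied literally; inside double quotes, '^' is copied as-is.
   OUT must have room for strlen (IN) + 1 bytes.  */
void
cmd_strip_carets (const char *in, char *out)
{
  int in_quotes = 0;

  assert (in != NULL && out != NULL);
  while (*in)
    {
      if (*in == '"')
        in_quotes = !in_quotes;
      else if (*in == '^' && !in_quotes)
        {
          in++;                 /* drop the caret itself */
          if (*in == '\0')
            break;              /* trailing caret: nothing left to copy */
        }
      *out++ = *in++;
    }
  *out = '\0';
}
```

Under this model, the command line `git log HEAD^` reaches Git as
`git log HEAD`, which names a different commit, whereas
`git log "HEAD^"` passes through unchanged.  A wrapper would have to know
which convention the final program uses in order to quote correctly.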

If someone can figure out how to do all this stuff with ConPTY, then
okay, we could use it.  But it is not a trivial problem, not at all.
The way ConPTY was designed is the way Windows works everywhere else:
it doesn't allow applications to communicate with raw bytestreams
without interpreting; instead, Windows _interprets_ the bytestreams as
characters encoded in the encoding it assumes for the source, and then
converts those characters to the encoding of the destination.  This
basic design principle is built into every part of Windows APIs.  For
example, a program whose 'main' function is declared as accepting
wchar_t (i.e. UTF-16) command-line arguments will magically have the
command-line arguments converted to UTF-16, even if the calling
process uses plain ASCII.  ConPTY uses the same design principles, so
it is inherently unable to pass through raw bytes without interpreting
them.  And without that, we cannot easily implement the way Emacs
expects this stuff to work, because Emacs assumes the encoding to be a
private contract between Emacs and the program it calls, with nothing
in-between interfering.
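
As an illustration of what "interpret and convert" does to a bytestream, here
is a minimal sketch of a translating layer that assumes the wrong source
encoding.  It is not ConPTY's code.  It assumes, purely for the example, that
the layer believes its input is ISO-8859-1 (standing in for some system
codepage) and re-encodes it as UTF-8, while the program on the other side
actually wrote UTF-8.  The function name latin1_to_utf8 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a translating layer: decode every input
   byte as ISO-8859-1 (the encoding the layer *assumes*) and re-encode
   the resulting character as UTF-8.  Returns the number of bytes
   written to OUT, which must have room for 2 * LEN + 1 bytes.  */
size_t
latin1_to_utf8 (const unsigned char *in, size_t len, unsigned char *out)
{
  size_t n = 0;

  assert (in != NULL && out != NULL);
  for (size_t i = 0; i < len; i++)
    {
      if (in[i] < 0x80)
        out[n++] = in[i];                  /* ASCII passes through */
      else
        {
          /* U+0080..U+00FF become two-byte UTF-8 sequences.  */
          out[n++] = 0xC0 | (in[i] >> 6);
          out[n++] = 0x80 | (in[i] & 0x3F);
        }
    }
  out[n] = '\0';
  return n;
}
```

If the subprocess writes the two UTF-8 bytes C3 A9 (the character U+00E9),
this layer emits the four bytes C3 83 C2 A9, i.e. U+00C3 followed by U+00A9.
ASCII survives, but everything else is altered before Emacs ever sees it, so
no coding-system setting on the Emacs side can express the contract with the
subprocess.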

I hope I explained some of the issues with ConPTY, and why we cannot
install its support without some reasonably reliable solutions for
those problematic aspects.

Thanks.




