guile-devel

Re: [PATCH] Implement open-process and related functions on MinGW


From: Eli Zaretskii
Subject: Re: [PATCH] Implement open-process and related functions on MinGW
Date: Mon, 24 Feb 2014 23:12:19 +0200

> From: Mark H Weaver <address@hidden>
> Cc: address@hidden,  address@hidden
> Date: Mon, 24 Feb 2014 13:33:56 -0500
> 
> >> For example, if it is true that we can avoid all the nasty problems with
> >> filename encoding by using the native Windows APIs that use UTF-16 for
> >> filenames, then I'd be in favor of doing that.
> >
> > What nasty problems do you have in mind?  I implemented something like
> > this in Emacs not long ago, so perhaps I can help here.
> 
> The nasty problem is that POSIX uses sequences of bytes for filenames,
> although conceptually filenames are character strings, and in fact
> virtually all user interfaces treat filenames as character strings.
> 
> Guile uses character strings for filenames (the only sane thing to do),
> and it would be good to build Guile on system APIs that also use
> character strings for filenames, instead of having to guess how to
> encode the characters into bytes.
> 
> We don't have a fully satisfactory solution to this problem on POSIX,
> but I guess we do on Windows, if we use the native Windows APIs.
> 
> BTW, the same problems exist for command-line arguments, environment
> variables, the hostname, etc.  All of these are sequences of bytes in
> POSIX, but conceptually they should be character strings.
> 
> If you'd like to work on a patch to have Guile use the native Windows
> APIs (that use UTF-16) for these things, I think that would be very
> useful and worthy of inclusion.

This issue needs to be carefully designed first.  File names are easy,
as far as Guile and the OS are concerned.  Environment variables and
command-line arguments likewise.  But once you need to display those
file names or variables, or ask the user to type them, there are
problems that don't have good solutions yet, at least not in Guile
applications that use the text terminal for display.

First, you need to bypass the usual stdio output routines and use
special APIs.  And after you've done that, you'll bump into the fact
that Windows console devices are limited in their ability to support
Unicode characters outside of the system locale; basically anything
beyond European scripts is not supported.  (Emacs avoids this problem
because its usual UI is a graphical one, where fonts and layout
engines are available that support almost any script in existence.)
Likewise for keyboard input: typing non-ASCII text into the Windows
console outside of the current console codepage is a tricky business;
basically, you need to completely bypass the "normal" stdio functions
and use Windows specific console APIs and Windows input methods.

There's also the issue of invoking other programs with arguments that
include Unicode characters.  Most programs that Guile will invoke on
Windows do not support that; they are "normal" console programs that
only support characters encoded in the current console codepage.
Windows will transparently convert from Unicode to the codepage
encoding, but if there are characters outside of that codepage, they
will be omitted or replaced by placeholder characters, which might
cause strange failures.
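
To illustrate the lossy conversion described above, here is a minimal
Python sketch.  It only approximates what Windows does: Python's
errors="replace" substitutes "?" for every character the codepage
cannot represent, whereas the real Windows conversion may also apply
"best fit" mappings to look-alike characters.  The helper names and
the choice of cp1252 as the codepage are assumptions made for the
example, not anything taken from Guile.

```python
def survives_codepage(arg: str, codepage: str = "cp1252") -> bool:
    """Return True if every character of arg exists in the codepage."""
    try:
        arg.encode(codepage)  # strict mode: raises on unencodable characters
    except UnicodeEncodeError:
        return False
    return True


def to_codepage(arg: str, codepage: str = "cp1252") -> bytes:
    """Encode arg for a codepage-only program, replacing what does not fit."""
    return arg.encode(codepage, errors="replace")


if __name__ == "__main__":
    for name in ("naïve.txt", "файл.txt"):
        print(name, survives_codepage(name), to_codepage(name))
```

A caller could use a check like survives_codepage to decide whether an
argument is safe to pass as-is, or whether it needs some other
treatment before the child program is invoked.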

There are also complications when calling functions from external
libraries that accept file names: those libraries will not normally
support Unicode characters in file names.  But this problem can be
solved by a known trick of using the 8.3 short aliases of the file
names, which use only ASCII characters.
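
As a rough sketch of the short-alias trick, the following Python code
asks Windows for the 8.3 name through the GetShortPathNameW API via
ctypes.  The function name short_path and the fallback behaviour are
assumptions for the example.  Note that the file must already exist,
and that short-name generation can be disabled on a volume, in which
case the long name comes back unchanged; on non-Windows systems the
sketch simply returns its argument.

```python
import ctypes
import sys


def short_path(path: str) -> str:
    """Return the 8.3 alias of an existing file, or path itself on failure."""
    if sys.platform != "win32":
        return path  # no short aliases outside Windows

    get_short = ctypes.windll.kernel32.GetShortPathNameW
    get_short.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    get_short.restype = ctypes.c_uint32

    # First call with an empty buffer: returns the size needed,
    # including the terminating null, or 0 on error.
    needed = get_short(path, None, 0)
    if needed == 0:
        return path

    buf = ctypes.create_unicode_buffer(needed)
    if get_short(path, buf, needed) == 0:
        return path
    return buf.value


if __name__ == "__main__":
    print(short_path(sys.argv[1] if len(sys.argv) > 1 else "."))
```

The result would then be encoded and handed to the external library
in place of the original name.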

So to provide something useful in this department, we need to discuss
what portions of Guile it is sensible and practical to convert to
Unicode, and how to treat those areas where we won't.  I will
certainly need some insider's help in this.


