Re: [groff] devutf8 on Windows


From: Eli Zaretskii
Subject: Re: [groff] devutf8 on Windows
Date: Mon, 25 Feb 2019 14:03:06 +0200
User-agent: K-9 Mail for Android

On February 25, 2019 1:44:07 PM GMT+02:00, Jeff Conrad <address@hidden> wrote:
> Monday, February 25, 2019 2:35 AM, Eli Zaretskii wrote:
> 
> > > Running something like
> > >
> > >     groff -Tutf8 <file>
> > >
> > > rather than something like
> > >
> > >     groff -Tutf8 <file> | more
> > >
> > > or
> > >
> > >     groff -Tutf8 <file> > <outfile>
> > >
> > > Jeff
> > 
> > Yes, I tried all of the above.  The last method ends up with correct
> > UTF-8 sequences, all the others yield mojibake.
> 
> Since method 2 works for me, I guess I’m having better luck than you—I
> suppose I should count my blessings :-).  Especially since method 2 is
> the one I would most often use.
> 
> > Groff, of course, writes the same bytes in all methods.
> 
> As it does for me, confirmed by ‘od -h’.
> 
> The question, then, is why grotty is behaving differently than my simple
> C program, which—as nearly as I can tell—is doing the same thing when
> outputting characters.  Win 10 vs.  Win 7?  Compiler?  Or perhaps I
> missed something important elsewhere in the code for tty.cpp.
> 
> Anyway, stuff like this should make it clear why someone running Windows
> would do something as silly as create a devcp1252.
> 
> Jeff

You are on Windows 10, which probably explains everything.

The only explanation I could come up with regarding your simple program is that
VS linked it against static libraries, or maybe special versions of dynamic
libraries, which implement fputs etc. in a way that works better with the
Windows 10 console.  By contrast, the Groff build you find on ezwinports links
dynamically to MSVCRT.DLL.  I stepped with a debugger through
tty_printer::put_char and verified that it gets the same Unicode codepoints and
produces the same UTF-8 sequences as your test program.  So the explanation
must be outside Groff.
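
For reference, here is a minimal sketch of the kind of codepoint-to-UTF-8
conversion being compared here.  It is not the code from tty.cpp, nor Jeff's
test program; the names encode_utf8 and put_utf8 are hypothetical, made up for
this illustration.  It assumes the bytes are written through the C runtime's
stdio, so whatever that runtime does with them afterwards is outside the
program's control.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not groff's code): encode one Unicode code point
// as a UTF-8 byte sequence.  Returns an empty string for surrogates
// (U+D800..U+DFFF) and for values above U+10FFFF.
std::string encode_utf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {                // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return out;                     // surrogates are not valid scalar values
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: write the encoded bytes to a stdio stream.
// Returns false if the code point is invalid or the write comes up short.
bool put_utf8(char32_t cp, std::FILE *fp)
{
    const std::string bytes = encode_utf8(cp);
    return !bytes.empty()
        && std::fwrite(bytes.data(), 1, bytes.size(), fp) == bytes.size();
}
```

Two programs that both do something like put_utf8(0x2014, stdout) hand
identical bytes to the runtime, so a difference in what the console shows
would have to come from the layers below.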

In any case, the conclusion remains that UTF-8 console output on Windows is
unreliable, except perhaps on Windows 10.  That isn't surprising, given that
variable-length multibyte encodings are second-class citizens on Windows, as
documented by MSDN.
