groff

Re: uppercase german umlaut


From: Dave Kemper
Subject: Re: uppercase german umlaut
Date: Fri, 29 Dec 2023 19:02:49 -0600

On 12/28/23, holger.herrlich@posteo.de <holger.herrlich@posteo.de> wrote:
> echo ä | gpic | hexStream
> 0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x53  | .if !dPS
> 0x20 0x2e 0x64 0x73 0x20 0x50 0x53 0x0a  |  .ds PS.
> 0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x45  | .if !dPE
> 0x20 0x2e 0x64 0x73 0x20 0x50 0x45 0x0a  |  .ds PE.
> 0x2e 0x6c 0x66 0x20 0x31 0x20 0x2d 0x0a  | .lf 1 -.
> 0xc3 0xa4 0x0a                           | ...
>
> echo Ä | gpic | hexStream
> gpic:<standard input>:1: invalid input character code 132
> 0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x53  | .if !dPS
> 0x20 0x2e 0x64 0x73 0x20 0x50 0x53 0x0a  |  .ds PS.
> 0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x45  | .if !dPE
> 0x20 0x2e 0x64 0x73 0x20 0x50 0x45 0x0a  |  .ds PE.
> 0x2e 0x6c 0x66 0x20 0x31 0x20 0x2d 0x0a  | .lf 1 -.
> 0xc3 0x0a                                | ..
>
> The character emerges from a input file name. So it is missed by
> preconv somewhere,

As Lennart points out, the above pipelines don't invoke preconv at
all.  But also the above examples don't come from a filename, so I
suspect your example is too simplified from your actual use case to
illustrate the problem.  Do you have a command sequence that DOES
invoke preconv where UTF-8 characters are not being correctly handled?

> however why is 'ä' working properly/ just passed through?

It's not "working properly" in a sense that groff can handle.  The
output above shows the ä is coming out as 0xc3 0xa4, which is the UTF-8
encoding of the character.  But were this to go into a groff pipeline,
it would interpret those two bytes as two Latin-1 characters, neither
of which is ä.
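
To make the byte-level point concrete, here is a small sketch in Python.
Python is only a stand-in for illustration; nothing here invokes groff,
and the variable names are mine.  It encodes ä as UTF-8 and then reads
the same bytes back as Latin-1:

```python
# UTF-8 encoding of "ä" (U+00E4) is the two bytes 0xc3 0xa4.
utf8_bytes = "ä".encode("utf-8")

# Reading those same two bytes as Latin-1 gives two separate characters,
# because Latin-1 maps every byte to exactly one code point.
as_latin1 = utf8_bytes.decode("latin-1")
code_points = [f"U+{ord(c):04X}" for c in as_latin1]

print(utf8_bytes.hex(" "))   # c3 a4
print(code_points)           # ['U+00C3', 'U+00A4']
```

U+00C3 is Ã and U+00A4 is the currency sign ¤, so a Latin-1 reader sees
two characters where a UTF-8 reader sees one ä.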

(In the example you posted at the start of this thread, where the 0xc3
0xa4 went to the terminal, your terminal interpreted that sequence as
UTF-8 and displayed an ä.  So it only looked "right" because your
input and output encodings matched.)

Your second example shows that pic is discarding the byte of Ä's
encoding it doesn't recognize as valid Latin-1 (0x84, which is the
decimal 132 in the error message).  You can see this in two ways: this
byte is missing from your hexStream output, and pic throws an error.
The only byte left, 0xc3, is a Latin-1 Ã, which is how groff would
interpret it.  But your terminal, expecting UTF-8, would be unable to
output anything meaningful for this.
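
A rough Python model of that second case follows.  The helper drop_c1
is hypothetical, and the range it rejects (0x80 through 0x9F, the C1
control range in Latin-1) is my assumption; it is consistent with the
error above but I have not taken it from pic's source.

```python
# Assumption for illustration: bytes 0x80-0x9F are rejected and dropped.
def drop_c1(data: bytes) -> bytes:
    """Hypothetical stand-in for a filter that discards invalid input bytes."""
    return bytes(b for b in data if not 0x80 <= b <= 0x9F)

upper = "Ä\n".encode("utf-8")   # 0xc3 0x84 0x0a
lower = "ä\n".encode("utf-8")   # 0xc3 0xa4 0x0a

survivors_upper = drop_c1(upper)   # 0x84 (decimal 132) is removed
survivors_lower = drop_c1(lower)   # 0xa4 is outside the range, so nothing changes

# A Latin-1 reader still gets a character out of the lone 0xc3 ...
latin1_view = survivors_upper.decode("latin-1")

# ... but a UTF-8 reader cannot: 0xc3 must be followed by a continuation byte.
try:
    survivors_upper.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(survivors_upper.hex(" "), repr(latin1_view), utf8_ok)
```

Under that assumption the survivors match the two hex dumps in the
quoted message: 0xc3 0x0a for Ä, and 0xc3 0xa4 0x0a for ä.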


