groff

Re: uppercase german umlaut


From: holger.herrlich
Subject: Re: uppercase german umlaut
Date: Thu, 28 Dec 2023 08:48:30 +0000

echo ä | gpic | hexStream
0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x53  | .if !dPS
0x20 0x2e 0x64 0x73 0x20 0x50 0x53 0x0a  |  .ds PS.
0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x45  | .if !dPE
0x20 0x2e 0x64 0x73 0x20 0x50 0x45 0x0a  |  .ds PE.
0x2e 0x6c 0x66 0x20 0x31 0x20 0x2d 0x0a  | .lf 1 -.
0xc3 0xa4 0x0a                           | ...

echo Ä | gpic | hexStream
gpic:<standard input>:1: invalid input character code 132
0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x53  | .if !dPS
0x20 0x2e 0x64 0x73 0x20 0x50 0x53 0x0a  |  .ds PS.
0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x45  | .if !dPE
0x20 0x2e 0x64 0x73 0x20 0x50 0x45 0x0a  |  .ds PE.
0x2e 0x6c 0x66 0x20 0x31 0x20 0x2d 0x0a  | .lf 1 -.
0xc3 0x0a                                | ..

The character comes from an input file name, so preconv must be missing
it somewhere. But why does 'ä' work properly, or rather just get passed
through?
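
One possible explanation, sketched below in Python: the two hex dumps differ only in the second UTF-8 byte (0xa4 for 'ä', 0x84 for 'Ä'). The sketch assumes that gpic, like the rest of groff, reads its input as Latin-1 and rejects the C1 control range 0x80-0x9F as invalid input characters; that range is an assumption taken from the groff documentation, not something shown in this thread, and groff's real list of invalid codes also covers some low control codes that are not modelled here. The names INVALID_RANGE and classify are made up for illustration.

```python
# Assumption: bytes 0x80-0x9F (the Latin-1 C1 controls) are rejected
# as invalid input; everything else is passed through unchanged.
INVALID_RANGE = range(0x80, 0xA0)


def classify(text):
    """Return (byte, is_rejected) pairs for the UTF-8 encoding of text."""
    return [(b, b in INVALID_RANGE) for b in text.encode("utf-8")]


for ch in ("ä", "Ä"):
    for byte, rejected in classify(ch):
        status = "rejected" if rejected else "passed through"
        print(f"{ch!r}: 0x{byte:02x} ({byte}) {status}")
```

Under that assumption, both bytes of 'ä' (0xc3 0xa4) are valid Latin-1 characters and pass through untouched, while the second byte of 'Ä' (0x84, decimal 132) falls in the rejected range. That would match the dumps above, where only 0xc3 survives for 'Ä'. So 'ä' is not really handled correctly either; it is just not rejected.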



On Wed, 27 Dec 2023 01:29:53 -0600
Dave Kemper <saint.snit@gmail.com> wrote:

> On 12/26/23, holger.herrlich@posteo.de <holger.herrlich@posteo.de>
> wrote:
> > echo Ä | gpic
> > .if !dPS .ds PS
> > .if !dPE .ds PE
> > .lf 1 -
> > gpic:<standard input>:1: invalid input character code 132
> > �  
> 
> Hi Holger,
> 
> The paste above doesn't reveal what sequences of bytes your "echo" is
> outputting, but I deduce it's UTF-8, since "U+00C4 LATIN CAPITAL
> LETTER A WITH DIAERESIS" is encoded in UTF-8 as the two-byte hex
> sequence c3 84, the latter byte of which is 132 decimal, which is the
> number in your error message.  This is what I get in a UTF-8
> environment:
> 
> $ echo Ä | od -t u1
> 0000000 195 132  10
> 0000003
> 
> Unfortunately, the groff toolchain doesn't speak UTF-8, only Latin-1
> (and expanding this is a longstanding wish-list item:
> http://savannah.gnu.org/bugs/?40720).  So before pic sees the input,
> you'll have to convert it to a form pic understands.
> 
> The most flexible way to do this is with groff's preconv tool, because
> this will convert a wide range of Unicode input into escapes that the
> groff tools understand.
> 
> $ echo Ä | preconv -eutf-8
> .lf 1 -
> \[u00C4]
> 
> If all your input falls into the Latin-1 range, you can instead use
> the system iconv command to convert everything to Latin-1 (a.k.a. ISO
> 8859-1).
> 
> $ echo Ä | iconv -futf-8 -tiso-8859-1 | od -t u1
> 0000000 196  10
> 0000002
> 
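
The two conversions Dave describes can be sketched in Python, which may help when the text has to be prepared outside the groff pipeline (for example a file name inserted by a script). This is only a simplified illustration, not preconv's actual implementation: the helper names to_groff_escapes and to_latin1 are hypothetical, and the real preconv does more (it also emits the ".lf" line seen above, among other things).

```python
def to_groff_escapes(text):
    """Replace each non-ASCII character with a \\[uXXXX] escape."""
    return "".join(
        ch if ord(ch) < 0x80 else f"\\[u{ord(ch):04X}]"
        for ch in text
    )


def to_latin1(text):
    """Encode as ISO 8859-1; raises UnicodeEncodeError outside that range."""
    return text.encode("iso-8859-1")


print(to_groff_escapes("Ä"))   # \[u00C4]
print(list(to_latin1("Ä")))    # [196]
```

The escape form works for any Unicode input, whereas the Latin-1 form fails for characters outside ISO 8859-1, which is the same trade-off as between preconv and iconv.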
