Re: uppercase german umlaut
From: holger.herrlich
Subject: Re: uppercase german umlaut
Date: Thu, 28 Dec 2023 08:48:30 +0000
echo ä | gpic | hexStream
0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x53 | .if !dPS
0x20 0x2e 0x64 0x73 0x20 0x50 0x53 0x0a | .ds PS.
0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x45 | .if !dPE
0x20 0x2e 0x64 0x73 0x20 0x50 0x45 0x0a | .ds PE.
0x2e 0x6c 0x66 0x20 0x31 0x20 0x2d 0x0a | .lf 1 -.
0xc3 0xa4 0x0a | ...
echo Ä | gpic | hexStream
gpic:<standard input>:1: invalid input character code 132
0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x53 | .if !dPS
0x20 0x2e 0x64 0x73 0x20 0x50 0x53 0x0a | .ds PS.
0x2e 0x69 0x66 0x20 0x21 0x64 0x50 0x45 | .if !dPE
0x20 0x2e 0x64 0x73 0x20 0x50 0x45 0x0a | .ds PE.
0x2e 0x6c 0x66 0x20 0x31 0x20 0x2d 0x0a | .lf 1 -.
0xc3 0x0a | ..
The character comes from an input file name, so preconv misses it
somewhere. But why does 'ä' work properly, or at least get passed
through unchanged?
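
A possible explanation, sketched under assumptions: the groff
documentation lists the byte range 0x80-0x9F (128-159 decimal) among the
invalid input characters, and gpic seems to apply the same check. The
second UTF-8 byte of 'ä' is 0xa4 (164), which lies outside that range,
while the second byte of 'Ä' is 0x84 (132), which lies inside it. That
would match the dumps above, where 0xc3 survives in both cases and only
0x84 is dropped. The helper name classify_byte is made up for this
illustration, and it models only the high range, not the low control
characters that groff also rejects.

```shell
#!/bin/sh
# Hypothetical helper: classify one byte value (decimal) against the
# 0x80-0x9F range (128-159) that groff documents as invalid input.
# Assumption: gpic applies the same range check as the rest of groff.
classify_byte() {
    if [ "$1" -ge 128 ] && [ "$1" -le 159 ]; then
        echo "invalid"
    else
        echo "accepted"
    fi
}

# UTF-8 for 'ä' is c3 a4 (195 164); for 'Ä' it is c3 84 (195 132).
for b in 195 164 132; do
    printf '%s %s\n' "$b" "$(classify_byte "$b")"
done
```

Under that assumption 195 and 164 are reported as accepted and 132 as
invalid, so 'ä' is not really understood as UTF-8; its two bytes are
merely passed through as two Latin-1 characters.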
On Wed, 27 Dec 2023 01:29:53 -0600
Dave Kemper <saint.snit@gmail.com> wrote:
> On 12/26/23, holger.herrlich@posteo.de <holger.herrlich@posteo.de>
> wrote:
> > echo Ä | gpic
> > .if !dPS .ds PS
> > .if !dPE .ds PE
> > .lf 1 -
> > gpic:<standard input>:1: invalid input character code 132
> > �
>
> Hi Holger,
>
> The paste above doesn't reveal what sequences of bytes your "echo" is
> outputting, but I deduce it's UTF-8, since "U+00C4 LATIN CAPITAL
> LETTER A WITH DIAERESIS" is encoded in UTF-8 as the two-byte hex
> sequence c3 84, the latter byte of which is 132 decimal, which is the
> number in your error message. This is what I get in a UTF-8
> environment:
>
> $ echo Ä | od -t u1
> 0000000 195 132 10
> 0000003
>
> Unfortunately, the groff toolchain doesn't speak UTF-8, only Latin-1
> (and expanding this is a longstanding wish-list item:
> http://savannah.gnu.org/bugs/?40720). So before pic sees the input,
> you'll have to convert it to a form pic understands.
>
> The most flexible way to do this is with groff's preconv tool, because
> this will convert a wide range of Unicode input into escapes that the
> groff tools understand.
>
> $ echo Ä | preconv -eutf-8
> .lf 1 -
> \[u00C4]
>
> If all your input falls into the Latin-1 range, you can instead use
> the system iconv command to convert everything to Latin-1 (a.k.a. ISO
> 8859-1).
>
> $ echo Ä | iconv -futf-8 -tiso-8859-1 | od -t u1
> 0000000 196 10
> 0000002
>
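
To decide between the two conversions suggested above, one could let
iconv itself report whether the input fits into Latin-1. This is only a
sketch: the function name fits_latin1 is invented here, and it relies on
iconv exiting with a non-zero status when a character cannot be
represented in the target encoding (no //TRANSLIT suffix is used). The
test bytes are written as octal escapes so that the result does not
depend on the terminal's locale.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the UTF-8 text on standard input can
# be converted to ISO 8859-1 without loss.
# Assumption: iconv exits non-zero on an unconvertible character.
fits_latin1() {
    iconv -f utf-8 -t iso-8859-1 >/dev/null 2>&1
}

# 'Ä' (U+00C4, UTF-8 bytes c3 84) exists in Latin-1.
if printf '\303\204\n' | fits_latin1; then
    echo "U+00C4: iconv to Latin-1 is enough"
else
    echo "U+00C4: use preconv"
fi

# The euro sign (U+20AC, UTF-8 bytes e2 82 ac) does not.
if printf '\342\202\254\n' | fits_latin1; then
    echo "U+20AC: iconv to Latin-1 is enough"
else
    echo "U+20AC: use preconv"
fi
```

If the check fails for any input, preconv -eutf-8 remains the more
general route, since it emits \[uXXXX] escapes instead of raw bytes.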