
Re: uppercase german umlaut


From: Dave Kemper
Subject: Re: uppercase german umlaut
Date: Tue, 9 Jan 2024 01:13:45 -0600

On 1/8/24, hohe72@posteo.de <hohe72@posteo.de> wrote:
> On Tue, 2 Jan 2024 11:04:25 -0600
> Dave Kemper <saint.snit@gmail.com> wrote:
>
>> > ECMA-48 says for 0x84:
>>
>> Also irrelevant to groff, as it doesn't use ECMA-48.  Groff tools
>> (including gpic) take input in Latin-1, period.
>
> I don't think so. ECMA-48 may be interpreted by terminals.

In the message to which I was replying, you were speaking of the
sequence of bytes that were part of the input to gpic; in this realm,
ECMA-48 is irrelevant.  And in any case, the 0x84 byte in question is
part of the UTF-8 encoding of Unicode character U+00C4 LATIN CAPITAL
LETTER A WITH DIAERESIS; if it's being interpreted by a terminal
somewhere as ECMA-48, something is going wrong.
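As a quick check of where the 0x84 byte comes from, the two-byte UTF-8 form (110xxxxx 10xxxxxx) can be worked out with plain shell arithmetic. This is only an illustrative sketch; the variable names cp, b1, and b2 are made up for the example.

```shell
# Code point of LATIN CAPITAL LETTER A WITH DIAERESIS
cp=$((0xC4))

# Two-byte UTF-8: lead byte carries the upper bits, continuation byte the low 6 bits
b1=$(( 0xC0 | (cp >> 6) ))
b2=$(( 0x80 | (cp & 0x3F) ))

# Show the bytes in hex, then the second byte in decimal
printf '%02x %02x\n' "$b1" "$b2"
printf 'second byte in decimal: %d\n' "$b2"
```

The second byte is the decimal value that appears in gpic's "invalid input character code" diagnostic.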

What seems to be going wrong in this instance is that you're passing
UTF-8 directly to gpic without first running it through preconv or
iconv, resulting in a byte sequence gpic doesn't recognize.  You
haven't said whether you've tried converting the input before sending
it to gpic, or why you're avoiding preconv.
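A minimal sketch of the conversion step, assuming an iconv that knows the names UTF-8 and ISO-8859-1 (glibc and GNU libiconv do). The helper utf8_input and the variable latin1_hex are invented for the example; in a real pipeline the iconv output would be piped on to pic instead of od.

```shell
# Emit Ä as UTF-8 (octal 303 204 = hex c3 84) followed by a newline
utf8_input() { printf '\303\204\n'; }

# Convert to Latin-1 and capture the resulting bytes as hex
latin1_hex=$(utf8_input | iconv -f UTF-8 -t ISO-8859-1 | od -An -t x1 | tr -d ' \n')
echo "$latin1_hex"
```

After conversion the character is the single Latin-1 byte 0xc4, so the stray 0x84 never reaches gpic. preconv is the groff-native alternative; it rewrites such characters as groff escapes rather than Latin-1 bytes.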

> In the case of terminal output, those characters if interpreted as
> control sequences would throw the output into disarray. Therefore,
> if I'm right, it's rejected as invalid but not passed through.

Correct, gpic won't pass through bytes it considers invalid.

$ echo Ä | od -t x1
0000000 c3 84 0a
0000003
$ echo Ä | pic | grep -av '^\.' | od -t x1
pic:<standard input>:1: invalid input character code 132
0000000 c3 0a
0000002

gpic strips the 0x84 (decimal 132) byte, leaving you with invalid
UTF-8, or valid but erroneous Latin-1.
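Both readings of the leftover bytes can be checked with iconv. This is a sketch that assumes an iconv which exits nonzero on malformed input, as glibc's does; the names leftover, utf8_ok, and as_latin1 are made up for the example.

```shell
# The bytes gpic left behind: a lone 0xc3 followed by a newline
leftover() { printf '\303\n'; }

# Reading 1: as UTF-8. A lead byte with no continuation byte should be rejected.
if leftover | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    utf8_ok=yes
else
    utf8_ok=no
fi

# Reading 2: as Latin-1. 0xc3 is a valid character there; re-encode it to UTF-8.
as_latin1=$(leftover | iconv -f ISO-8859-1 -t UTF-8 | od -An -t x1 | tr -d ' \n')

echo "valid UTF-8: $utf8_ok"
echo "read as Latin-1, re-encoded as UTF-8: $as_latin1"
```

Read as Latin-1, the surviving 0xc3 byte is Ã (U+00C3), not the Ä that was intended.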


