
Re: uppercase german umlaut


From: Dave Kemper
Subject: Re: uppercase german umlaut
Date: Tue, 6 Feb 2024 00:23:23 -0600

On 2/5/24, hohe72@posteo.de <hohe72@posteo.de> wrote:
> On Tue, 9 Jan 2024 01:13:45 -0600
> Dave Kemper <saint.snit@gmail.com> wrote:
>
>> In the message to which I was replying, you were speaking of the
>> sequence of bytes that were part of the input to gpic; in this realm,
>> ECMA-48 is irrelevant.  And in any case, the 0x84 byte in question is
>> part of the UTF-8 encoding of Unicode character U+00C4 LATIN CAPITAL
>> LETTER A WITH DIAERESIS; if it's being interpreted by a terminal
>> somewhere as ECMA-48, something is going wrong.
>>
>> What seems to be going wrong in this instance is that you're passing
>> UTF-8 directly to gpic without first running it through preconv or
>> iconv, resulting in a byte sequence gpic doesn't recognize.  You
>> haven't said whether you've tried converting the input before sending
>> it to gpic, or why you're avoiding preconv.
>
> I quote myself:
> "The character emerges from a input file name. So it is missed by
> preconv somewhere, ..."

Since you haven't said what your pipeline is, I can't debug what
preconv is missing or why.  But in general if you're doing something
like:

someprog | gpic

where "someprog" is outputting UTF-8, then you should change the pipeline to:

someprog | preconv -eutf8 | gpic

Like all groff tools, gpic will not recognize UTF-8 input.  The
encoding has to be converted before gpic sees it.
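
If it's unclear whether a given stage is emitting anything that needs
converting, one quick check is to look for bytes outside the ASCII
range.  A minimal sketch, assuming a POSIX shell with tr and wc;
"has_high_bytes" is a hypothetical helper name, not a groff tool, and
the printf lines merely stand in for "someprog":

```shell
# has_high_bytes is a hypothetical helper, not part of groff.
# It succeeds (exit status 0) if stdin contains any byte >= 0x80.
has_high_bytes() {
    # Delete every ASCII byte, then count what is left.
    LC_ALL=C tr -d '\000-\177' | wc -c | {
        read -r count
        [ "$count" -gt 0 ]
    }
}

# Plain ASCII input: nothing to convert.
printf 'box "A"\n' | has_high_bytes || echo "ASCII only"

# Input containing the UTF-8 encoding of U+00C4 (bytes 0xC3 0x84).
printf 'box "\303\204"\n' | has_high_bytes && echo "needs conversion"
```

Running "someprog | has_high_bytes" in place of the printf lines would
show whether that stage's output needs to go through preconv first.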

> You completely miss the point of the utf8 sequence "ä" passes while
> "Ä" issues.

I didn't miss this.  Lennart explained this in his December 28 reply
in this thread, and I reiterated it in my December 29 reply, and again
in my January 2 reply.  In short: UTF-8 "ä" (bytes 0xC3 0xA4) in a
Latin-1 context is interpreted as two Latin-1 characters, whereas
UTF-8 "Ä" (bytes 0xC3 0x84) in a Latin-1 context is one Latin-1
character and one invalid (to groff tools) control character.
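
The difference is easy to see by dumping the bytes.  The following is
only an illustrative sketch, assuming a POSIX shell with od;
"hex_bytes" and "latin1_class" are hypothetical helper names, and the
classification follows the ISO 8859-1 layout, where 0x80-0x9F is the
C1 control range:

```shell
# Hypothetical helper: print the bytes of stdin as space-separated hex.
hex_bytes() {
    # od pads its output with blanks; word splitting normalizes it.
    set -- $(od -An -v -tx1)
    echo "$*"
}

# Hypothetical helper: classify one byte (two lowercase hex digits)
# according to where it falls in ISO 8859-1 (Latin-1).
latin1_class() {
    case "$1" in
        [0-7]?) echo "ASCII range" ;;
        [89]?)  echo "C1 control character" ;;
        *)      echo "graphic character" ;;
    esac
}

# UTF-8 encoding of U+00E4 (a with diaeresis), written as octal escapes.
for byte in $(printf '\303\244' | hex_bytes); do
    echo "$byte: $(latin1_class "$byte")"
done

# UTF-8 encoding of U+00C4 (A with diaeresis).
for byte in $(printf '\303\204' | hex_bytes); do
    echo "$byte: $(latin1_class "$byte")"
done
```

The first loop reports c3 and a4 as graphic characters; the second
reports c3 as a graphic character and 84 as a C1 control character,
which is the byte the groff tools reject.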


