Re: [Groff] mom : unicode in .INCLUDE'd files


From: John Gardner
Subject: Re: [Groff] mom : unicode in .INCLUDE'd files
Date: Sun, 23 Jul 2017 00:06:53 +1000

I was bitten by preconv(1) quite recently, actually. Gonna back Ingo here.
Can I semi-seriously implore the world to only use UTF-8, and pretend other
encodings don't exist? Squash everything into the same shuttle as EBCDIC
and blast it into the sun.

Heh, another recent (and unpleasant) experience was learning about the
harsh relationship between PostScript's "ISOLatin1Encoding" and
grave/acute accents. I spent days trying to figure out why \` (U+0060) was
being converted to ‘ (U+2018). Several headaches later, I learned the PDF
viewer was actually doing this *to the PDF's source* when rendered. This
page tells the whole complex history:
<https://www.cl.cam.ac.uk/~mgk25/ucs/quotes.html#postscript>

Ugh...

On 22 July 2017 at 23:56, Ingo Schwarze <address@hidden> wrote:

> Hi,
>
> Steffen Nurpmeso wrote on Fri, Jul 21, 2017 at 10:30:36PM +0200:
>
> > In my humble opinion preconv has to go as such,
> > i just do not know yet.  Just talking.
>
> So much talk...
>
> In mandoc, i completed that work in October 2014:
>
>   http://mandoc.bsd.lv/cgi-bin/cvsweb/preconv.c#rev1.9
>
>   "commit message:
>    integrate preconv(1) into mandoc(1); enhances functionality
>    and reduces code and docs by more than 300 lines"
>
> Admittedly, mandoc only handles UTF-8 and ISO-LATIN-1, and
> requires the user to convert files using obsolete encodings
> to UTF-8 using iconv(1) first.
>
> The only reason for supporting ISO-LATIN-1 is that many old manual
> pages in the wild still use it, most even without saying so.
> Otherwise, mandoc would be UTF-8 only on the input side.
>
> In the long run, i think that would be a reasonable direction
> for groff, too.  Preconv is notorious for causing trouble for
> casual users, see the several threads that were quoted earlier
> in this thread, and even very experienced users often get
> confused about how it works and where in the pipeline it is
> supposed to be, see this thread itself.
>
> "Groff input always has to be UTF-8", that would be a very simple,
> fool-proof principle.
>
> Yours,
>   Ingo
>
>

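For illustration, the "UTF-8 first, ISO-LATIN-1 only as a fallback" input
policy described above can be sketched in a few lines of Python. This is a
rough sketch under stated assumptions, not how mandoc or preconv actually
detect encodings: the function name to_utf8 and its fallback parameter are
hypothetical, and the only rule applied is "anything that is not valid UTF-8
gets treated as Latin-1".

```python
def to_utf8(data: bytes, fallback: str = "latin-1") -> bytes:
    """Return `data` re-encoded as UTF-8.

    Hypothetical helper: input that already decodes as UTF-8 is passed
    through unchanged; anything else is assumed to be in `fallback`
    (ISO-LATIN-1 by default), which is an assumption, not a detection.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        text = data.decode(fallback)
    return text.encode("utf-8")
```

Files in any other legacy encoding would still have to be converted up front,
for example with iconv(1), because this fallback would silently misread them.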
