groff

Re: [groff] address@hidden: mom: PDF Author, pdfmom: needs C locale?]


From: Bernhard Fisseni
Date: Fri, 9 Mar 2018 11:24:16 +0100

Good morning,

Deri James wrote on 09.03.2018 at 01:41:
> On Thu 08 Mar 2018 17:06:12 Peter Schaffter wrote:
>> This seems to be an "on again, off again" bug.  We discussed adding
>> LC_ALL=C to the command string in pdfmom, but I see it's not there.
>> Deri--any objections to adding it?  The alternative is to pass the
>> -a flag to the various greps, but Steffen Nurpmesco pointed out that
>> -a is not standardized.
> 
> I've got an example which is meant to show the problem (camus.mom), but 
> unfortunately I can't make it generate the error which others are seeing. 
> Camus.mom is a utf-8 file and I have used -k in a utf-8 user account 
> (LC_CTYPE=en_GB.UTF-8) and with -Kutf8 in an old style account 
> (LC_CTYPE=en_GB), neither produced an error from grep.
> 
> This leads me to suspect there is something in my version of grep which 
> "understands" that UTF-8 files are not binary data. I believe compiling grep 
> with NLS support is optional, so maybe people who get this error are using a 
> grep without language support.

My grep version was 3.1, from Ubuntu 17.10.  I would assume that it
had NLS because it gave German messages.  Recompiling grep (version
3.1) explicitly with or without NLS, I get the same problem (but the
message is always in English, of course). So NLS in the sense of the
compile option does not seem to be the problem.

Setting the locale variables (LC_*, LANG, LANGUAGE) to en_GB.UTF-8,
en_US.UTF-8 or fr_FR.UTF-8 (for LANGUAGE: en_GB:en, en_US:en, fr_FR:fr)
gives the same messages as with German, just in English or French.
Wrapping pdfmom with LC_ALL=C makes a difference; I can also get rid of
the message by undefining both LC_ALL and LANG.
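A possible reason why undefining LC_ALL and LANG helps: under the POSIX lookup order, the locale that governs character decoding (the LC_CTYPE category) is taken from LC_ALL if it is set and non-empty, otherwise from LC_CTYPE, otherwise from LANG, otherwise from the implementation default (C/POSIX with glibc). The exports in the PS set LC_ALL and LANG but not LC_CTYPE, so removing those two would leave grep in the C locale. A minimal sketch of that lookup follows; effective_ctype is a hypothetical helper, not a system tool, and `locale` remains the authoritative check on a real system.

```shell
#!/bin/sh
# Hypothetical helper (not a system tool): report which locale name would
# govern the LC_CTYPE category under the POSIX lookup order.
effective_ctype() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"       # LC_ALL overrides every category
    elif [ -n "${LC_CTYPE:-}" ]; then
        printf '%s\n' "$LC_CTYPE"     # then the category-specific variable
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"         # then LANG as the general fallback
    else
        printf '%s\n' "C"             # implementation default (C/POSIX with glibc)
    fi
}

# Usage, mirroring the PS (LC_ALL and LANG exported, LC_CTYPE not set):
(LC_ALL=en_US.UTF-8; LANG=en_US.UTF-8; unset LC_CTYPE; effective_ctype)  # en_US.UTF-8
(unset LC_ALL LANG LC_CTYPE; effective_ctype)                             # C
```

The sketch only resolves the name; if the named locale is not installed, programs such as grep stay in the C locale.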

Now these are all UTF-8 locales, so I do not think the problem is UTF-8
text.  Also, I can grep UTF-8 text as text with my version, and I do it
all the time.  `grep -i ä` finds both "Ä" and "ä" and does not complain.

To me it looks more as if the text given to grep is not in UTF-8, or the
environment is somehow altered. Relatedly, the `grep`s in pdfmom are
applied to gropdf stdout/stderr output (line 126 in my installation),
aren't they?  Could this gropdf output sometimes be invalid UTF-8 and
hence 'correctly' be considered binary by grep when used in a UTF-8
locale?  Then it could be NLS (or better: support for modern character
encodings) rather than a lack of it which is causing the problem.
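The mechanism suspected above can be reproduced in isolation. The sketch below feeds grep a line containing a lone 0xE4 byte (an ISO-8859-1 "ä", which is not valid UTF-8 on its own). In a UTF-8 locale, recent GNU grep treats input with encoding errors as binary and reports a match instead of printing the line (the exact wording varies between versions); in the C locale every byte counts as a character, so the line is printed. It assumes GNU grep, mktemp, and an installed C.UTF-8 locale (any UTF-8 locale will do). Whether gropdf's output really contains such bytes is not established in this thread.

```shell
#!/bin/sh
# Assumptions: GNU grep, mktemp, and a C.UTF-8 locale; substitute
# en_US.UTF-8 or another UTF-8 locale if C.UTF-8 is not installed.

sample=$(mktemp)
# \344 is 0xE4 ("ä" in ISO-8859-1); standing alone it is invalid UTF-8.
printf 'Author: Bernhard \344\n' > "$sample"

echo "UTF-8 locale:"
# Expected: a "binary file matches" notice instead of the line itself.
LC_ALL=C.UTF-8 grep 'Author' "$sample"

echo "C locale:"
# Every byte is a valid character here, so the line is printed as text.
LC_ALL=C grep 'Author' "$sample"
```

GNU grep also offers -a (--text) to force text mode, but as noted earlier in the thread that option is not standardized, which is why setting LC_ALL=C is the more portable choice.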


Anyway, if LC_ALL=C fixes it, this should be sufficient.

Thank you all very much,
Best regards,

Bernhard


PS: Below you will find a sample en_US locale config; these variables
are all defined in Ubuntu by default, so I changed them all.

export LC_IDENTIFICATION=en_US.UTF-8
export LC_TELEPHONE=en_US.UTF-8
export LC_TIME=en_US.UTF-8
export LC_NUMERIC=en_US.UTF-8
export LC_PAPER=en_US.UTF-8
export LC_MEASUREMENT=en_US.UTF-8
export LC_ADDRESS=en_US.UTF-8
export LC_MONETARY=en_US.UTF-8
export LC_NAME=en_US.UTF-8
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US:en


