Re: [Groff] pdfmom grep (was parallel text processing)


From: Peter Schaffter
Subject: Re: [Groff] pdfmom grep (was parallel text processing)
Date: Sat, 9 Sep 2017 09:51:27 -0400
User-agent: Mutt/1.5.24 (2015-08-30)

On Sat, Sep 09, 2017, Ralph Corderoy wrote:
> Hi Peter,
> 
> > The grep in pdfmom is returning a binary file hit when it encounters
> > the diacritic in 
> >
> >   .ds pdf:look(pdf:bm1) L'étranger
> 
> What does locale(1) output for you where you run this pdfmom command?

  LANG=en_CA.UTF-8
  LANGUAGE=en_CA:en
  LC_CTYPE="en_CA.UTF-8"
  LC_NUMERIC="en_CA.UTF-8"
  LC_TIME="en_CA.UTF-8"
  LC_COLLATE="en_CA.UTF-8"
  LC_MONETARY="en_CA.UTF-8"
  LC_MESSAGES="en_CA.UTF-8"
  LC_PAPER="en_CA.UTF-8"
  LC_NAME="en_CA.UTF-8"
  LC_ADDRESS="en_CA.UTF-8"
  LC_TELEPHONE="en_CA.UTF-8"
  LC_MEASUREMENT="en_CA.UTF-8"
  LC_IDENTIFICATION="en_CA.UTF-8"
  LC_ALL=en_CA.UTF-8
 
> > The solution is to pass the -a flag to grep.
> 
> How about 
> 
>     groff ... 2>&1 | LC_ALL=C grep '^\.ds' | groff ...

Yes, that's the solution I thought of before suggesting the -a flag,
which is tidier but, as Steffen pointed out, not universally available.
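
A minimal sketch of why forcing the C locale helps, assuming the
diacritic reaches grep as a single non-UTF-8 byte (here Latin-1 0xE9;
the thread does not confirm the actual encoding). The sample() helper
and its two input lines are hypothetical stand-ins for the groff output
that pdfmom filters. With LC_ALL=C set for grep alone, every byte counts
as a valid character, so the line is matched as ordinary text:

```shell
#!/bin/sh
# Hypothetical stand-in for the groff output that pdfmom pipes to grep.
# \047 is an apostrophe; \351 is the byte 0xE9 (Latin-1 e-acute), which
# is not valid UTF-8 on its own. The encoding is an assumption.
sample() {
    printf '.ds pdf:look(pdf:bm1) L\047\351tranger\n'
    printf '.nr unrelated 1\n'
}

# LC_ALL=C applies only to this grep invocation; the rest of the
# pipeline keeps whatever locale the user has configured.
matches=$(sample | LC_ALL=C grep -c '^\.ds')
echo "lines matched: $matches"
```

Setting the variable on the grep command alone avoids changing the
locale seen by the groff processes on either side of the pipe.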
 
> BTW, pdfmom has a bug shown by that strace command I suggested.
> 
>     system("groff ... 2>&1 | grep '^\.ds' | groff ...");
> 
> That's a double-quoted Perl string so `\.' is escaping the dot and grep
> sees a plain dot for `any character'.  The backslash needs doubling.

Missed that.  Argh.  Why don't they make special glasses that let
you see code as if for the first time whenever you put them on?
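
To make the effect of the lost backslash concrete, here is a small
sketch comparing the pattern grep actually receives from the
double-quoted Perl string ('^.ds') with the intended one ('^\.ds').
The lines() helper and its input are made up for illustration; only
the two patterns come from the thread:

```shell
#!/bin/sh
# Hypothetical input: one real .ds request plus a line that merely has
# "ds" as its second and third characters.
lines() {
    printf '.ds pdf:look(pdf:bm1) Title\n'
    printf 'xds is not a string definition\n'
    printf '.nr unrelated 1\n'
}

# What grep sees once Perl has dropped the backslash: "." matches any
# character, so the "xds" line slips through as well.
loose=$(lines | grep -c '^.ds')

# What was intended: a literal dot at the start of the line.
strict=$(lines | grep -c '^\.ds')

echo "unescaped dot: $loose, escaped dot: $strict"
```

In the Perl source, writing the pattern as '^\\.ds' inside the
double-quoted string is what delivers '^\.ds' to the shell.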

-- 
Peter Schaffter
http://www.schaffter.ca


