Re: [Groff] pdfmom grep (was parallel text processing)
From: Steffen Nurpmeso
Subject: Re: [Groff] pdfmom grep (was parallel text processing)
Date: Fri, 08 Sep 2017 19:46:36 +0200
User-agent: s-nail v14.9.3-64-gad47883e
Peter Schaffter <address@hidden> wrote:
|On Fri, Sep 08, 2017, Ralph Corderoy wrote:
|>> You'll notice that the top of the pdf file has a line of text spit out
|>> by grep(1) that obviously shouldn't be there.
|>
|> I couldn't come up with the groff 1.22.3-7 command line required to
|> build the PDF correctly, nor get grep's unwanted output. Deri suggested
|> pdfmom's grep might be the culprit, but its stderr should end up on
|> pdfmom's stderr?
|
|Problem solved.
|
|The superfluous line at the top of the file ["Binary file (standard
|input) matches"] isn't stderr, it's stdout, so it becomes part of
|the pipeline. The grep in pdfmom is returning a binary file hit when
|it encounters the diacritic in
|
| .ds pdf:look(pdf:bm1) L'étranger
|
|Since the binary file hit doesn't begin with .ds, it prints literally
|at the top of the file.
|
|The solution is to pass the -a flag to grep.
|
|Deri: do you want me to fix this in pdfmom and push the change, or
|would you prefer to do it yourself?
|
|Question: why does grep treat the presence of the diacritic as cause
|for saying "Binary file (standard input) matches"?
Likely because that is true in your locale? It is very likely
that this cannot work (I see -k could possibly happen): suppose
you are in a LATIN1 locale and process UTF-8, and it is even worse
when your own locale is more picky than LATIN1. It strikes me that
this should be split up so that perl itself performs the grep, in
charset-agnostic mode. Even very large documents should hit no
limit here; otherwise it is no problem to create the two
pipelines concurrently ...
--steffen
|
|Der Kragenbaer, The moon bear,
|der holt sich munter he cheerfully and one by one
|einen nach dem anderen runter wa.ks himself off
|(By Robert Gernhardt)
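
A minimal sketch of the charset-agnostic filtering suggested above, written in Python rather than the Perl proposed in the message. It assumes the only job of the filter is to keep lines that begin with .ds, as described in the quoted text; the function name filter_ds_lines and the sample lines are hypothetical and are not taken from pdfmom. Because the lines are matched as raw bytes and never decoded, the locale cannot cause an input to be classified as binary.

```python
import re

# Assumption: only lines beginning with ".ds" are wanted, as the quoted
# message describes. The pattern is a bytes pattern, so no decoding happens.
DS_LINE = re.compile(rb"^\.ds")


def filter_ds_lines(lines):
    """Return the byte lines that start with '.ds', leaving them undecoded."""
    return [line for line in lines if DS_LINE.match(line)]


# Hypothetical sample input: the same string definition encoded once as
# UTF-8 and once as Latin-1, plus a stray line that must be dropped.
sample = [
    ".ds pdf:look(pdf:bm1) L'étranger\n".encode("utf-8"),
    b"some other output line\n",
    ".ds pdf:look(pdf:bm2) L'étranger\n".encode("latin-1"),
]

kept = filter_ds_lines(sample)
for line in kept:
    # repr() shows the raw bytes, which pass through unchanged in either encoding.
    print(repr(line))
```

In a real pipeline the same function could be fed from a binary stream such as sys.stdin.buffer and its result written to sys.stdout.buffer, so that neither end involves the locale.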