[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[groff] 46/62: [pdfpic]: Fix Savannah #64061.
From: |
G. Branden Robinson |
Subject: |
[groff] 46/62: [pdfpic]: Fix Savannah #64061. |
Date: |
Thu, 20 Apr 2023 06:14:39 -0400 (EDT) |
gbranden pushed a commit to branch branden-2023-04-20
in repository groff.
commit 4dc2c23f0c2b72517be62e60090f30b214299650
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Mon Apr 17 16:41:33 2023 -0500
[pdfpic]: Fix Savannah #64061.
* tmac/pdfpic.tmac: Refactor to make comprehensible some woefully
undocumented cleverness and improve efficiency.
(PDFPIC): Break out flaming-hoop-leaping "clever" bit of `sy` usage
into its own macro, calling from here and relocating its requests from
here...
(pdfpic*system): ...to here. When using `sy` request to collect and
munge output of pdfinfo(1), (a) disable the escape character while
defining the macro; (b) construct the command in a roff string,
appending to it in discrete, hopefully comprehensible chunks; (c)
disable the escape character during macro interpretation wherever
possible (most of it); (d) retain doubled backslashes so that they
survive subsequent string interpolation; (e) stop using grep(1) in the
pipeline when sed(1) is perfectly capable of performing its own input
filtering; (f) invoke sed with '-n' option and emit output only upon a
successful substitution; and (g) replace unportable POSIX character
class '[:digit:]' in substitution matching text with '[0-9]'.
Annotate portability and escaping challenges. Tested on GNU/Linux,
macOS 12, and (with simulated pdfinfo(1) output), on Solaris 11.
(Solaris still requires GNU sed; keep reading.)
Even with all of that, there is _still_ a problem; the C++ function that
GNU troff uses to assemble the command string (character by character)
_does not recognize C/C++ string literal escape sequences_. This means
that you _cannot_ embed "\n" in `sy`'s arguments and have it survive, as
a newline character, into the command string passed to the standard C
library's system(3) function. ("A\nB" gets encoded as 'A', '\', 'n',
'B', not 'A', '\n', 'B'.) Unfortunately, this appears to be AT&T
troff-compatible behavior. But it means that you _cannot_ construct a
portable multi-line replacement text for sed's 's' command. (Other sed
commands like 'a', 'c', and 'i' will be similarly affected.) We
therefore (continue to) rely upon a non-standard feature of GNU and
macOS sed, such that the sequence "\n" in replacement text becomes a
newline in sed's pattern space.
Fixes <https://savannah.gnu.org/bugs/?64061>. Thanks to Bruno Haible
for the report, and to him and Ralph Corderoy for the discussion of
portable and efficient sed constructs.
---
ChangeLog | 43 +++++++++++++++++++++++++++++++++++++++++++
tmac/pdfpic.tmac | 48 +++++++++++++++++++++++++++++++++++-------------
2 files changed, 78 insertions(+), 13 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 5a1fa6a5b..35defb505 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,46 @@
+2023-04-19 G. Branden Robinson <g.branden.robinson@gmail.com>
+
+ * tmac/pdfpic.tmac: Refactor to make comprehensible some
+ woefully undocumented cleverness and improve efficiency.
+ (PDFPIC): Break out flaming-hoop-leaping "clever" bit of `sy`
+ usage into its own macro, calling from here and relocating its
+ requests from here...
+ (pdfpic*system): ...to here. When using `sy` request to collect
+ and munge output of pdfinfo(1), (a) disable the escape character
+ while defining the macro; (b) construct the command in a roff
+ string, appending to it in discrete, hopefully comprehensible
+ chunks; (c) disable the escape character during macro
+ interpretation wherever possible (most of it); (d) retain
+ doubled backslashes so that they survive subsequent string
+ interpolation; (e) stop using grep(1) in the pipeline when
+ sed(1) is perfectly capable of performing its own input
+ filtering; (f) invoke sed with '-n' option and emit output only
+ upon a successful substitution; and (g) replace unportable POSIX
+ character class '[:digit:]' in substitution matching text with
+ '[0-9]'. Annotate portability and escaping challenges. Tested
+ on GNU/Linux, macOS 12, and (with simulated pdfinfo(1) output),
+ on Solaris 11. (Solaris still requires GNU sed; keep reading.)
+
+ Even with all of that, there is _still_ a problem; the C++
+ function that GNU troff uses to assemble the command string
+ {character by character} _does not recognize C/C++ string
+ literal escape sequences_. This means that you _cannot_ embed
+ "\n" in `sy`'s arguments and have it survive, as a newline
+ character, into the command string passed to the standard C
+ library's system(3) function. ("A\nB" gets encoded as 'A', '\',
+ 'n', 'B', not 'A', '\n', 'B'.) Unfortunately, this appears to
+ be AT&T troff-compatible behavior. But it means that you
+ _cannot_ construct a portable multi-line replacement text for
+ sed's 's' command. (Other sed commands like 'a', 'c', and 'i'
+ will be similarly affected.) We therefore (continue to) rely
+ upon a non-standard feature of GNU and macOS sed, such that the
+ sequence "\n" in replacement text becomes a newline in sed's
+ pattern space.
+
+ Fixes <https://savannah.gnu.org/bugs/?64061>. Thanks to Bruno
+ Haible for the report, and to him and Ralph Corderoy for the
+ discussion of portable and efficient sed constructs.
+
2023-04-05 G. Branden Robinson <g.branden.robinson@gmail.com>
* tmac/tty.tmac: Add angle bracket fallbacks.
diff --git a/tmac/pdfpic.tmac b/tmac/pdfpic.tmac
index f81b66e7f..5367da75f 100644
--- a/tmac/pdfpic.tmac
+++ b/tmac/pdfpic.tmac
@@ -52,6 +52,40 @@
. rr pdfpic*desired-height
..
.
+.\" Get image dimensions. The `tr` command to strip null bytes is
+.\" distasteful, but its necessity is imposed on us. See
+.\" <https://gitlab.freedesktop.org/poppler/poppler/-/issues/776>.
+.\"
+.\" The following is a circus of portability and escaping constraints.
+.\" See <https://savannah.gnu.org/bugs/index.php?64061>. Modify it
+.\" only with great caution and after testing with a variety of seds.
+.\" (See the "HACKING" file in groff's source distribution.) Keep in
+.\" mind that the argument(s) to `sy` are assembled into a C string
+.\" and then passed to system(3), which invokes "sh -c" on the string.
+.\"
+.\" We therefore shut off roff's escape mechanism _twice_: once while
+.\" defining the macro, and once while interpreting part of it, to
+.\" preserve backslashes that we need in the constructed C string.
+.\"
+.\" We _still_ must escape the backslashes in the string appendments to
+.\" keep them from being interpreted as commencing roff escape sequences
+.\" when the string they populate is later interpolated.
+.eo
+.de pdfpic*system
+. ds pdfpic*command pdfinfo \$1
+. eo
+. as pdfpic*command " | tr -d '\\000'
+. as pdfpic*command " | sed -n -e '/Page *size:/
+. as pdfpic*command s/Page *size: *\\([0-9.]*\\) *x *\([0-9.]*\\).*$/
+. as pdfpic*command .nr pdfpic*width (p;\\1)\\n
+. as pdfpic*command .nr pdfpic*height (p;\\2)/p'
+. ec
+. as pdfpic*command " > \*[pdfpic*temporary-file]
+. sy \*[pdfpic*command]
+. rm pdfpic*command
+..
+.ec
+.
.de PDFPIC
. if !\\n[.U] \{\
. pdfpic@error use of \\$0 requires GNU troff's unsafe mode \
@@ -161,19 +195,7 @@ skipping '\\$1'
. return
. \}
. ds pdfpic*temporary-file \\*[pdfpic*temporary-directory]/pdfpic\n[$$]
-.
-. \" Get image dimensions. The `tr` command to strip null bytes is
-. \" distasteful, but its necessity is imposed on us. See
-. \" <https://gitlab.freedesktop.org/poppler/poppler/-/issues/776>.
-. ec @
-. sy pdfinfo @$1 | \
-tr -d '\000' | \
-grep "Page *size" | \
-sed -e 's/Page *size: *\\([[:digit:].]*\\) *x *\\([[:digit:].]*\\).*$/\
-.nr pdfpic*width (p;\\1)\\n\
-.nr pdfpic*height (p;\\2)/' \
-> @*[pdfpic*temporary-file]
-. ec
+. pdfpic*system \\$1
. if \\n[systat] \{\
. pdfpic@error retrieval of '\\$1' image dimensions failed with \
exit status \\n[systat]; skipping
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [groff] 46/62: [pdfpic]: Fix Savannah #64061.,
G. Branden Robinson <=