[groff] 08/10: [troff]: Fix Savannah #61401.
From: G. Branden Robinson
Subject: [groff] 08/10: [troff]: Fix Savannah #61401.
Date: Sat, 30 Oct 2021 19:26:35 -0400 (EDT)
gbranden pushed a commit to branch master
in repository groff.
commit eb695ab2b5e2bae54afa102355c493bda6e29d3e
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Sat Oct 30 15:45:29 2021 +1100
[troff]: Fix Savannah #61401.
[troff]: Handle special character escape sequences that map to basic
Latin glyphs in device control escape sequences consistently among
output devices.
* src/roff/troff/input.cpp (encode_char): Rearrange conditionals. This
is the logic that puts the "whatever" within a \X'whatever' escape
sequence into GNU troff's intermediate output. Handle stretchable and
unstretchable space escape sequences ("\ " and "\~") first. Then, if
the token is a special character escape sequence, retrieve its
"contents" (glyph name). Move the basic Latin mapping for the seven
glyph names '-', 'aq', 'dq', 'ga', 'ha', 'rs', and 'ti' here, before
checking whether the device description issued the
'use_charnames_in_special' directive. This way, the 'html' and
'xhtml' output devices can straightforwardly embed these basic Latin
characters in device control escapes (notably, "html:", for which the
present convention is to follow this tag immediately with a
literal HTML URI, complete with `<a href>` element syntax). If the
special character is none of these and we should
'use_charnames_in_special', proceed as groff 1.22.4 and earlier did.
This is a behavior change, as was my addition of this translation
mechanism in the first place, so...
* doc/groff.texi (Postprocessor Access): Document it.
* src/roff/groff/tests/device_control_escapes_express_basic_latin.sh:
Test it.
* src/roff/groff/groff.am (groff_TESTS): Run test.
Fixes <https://savannah.gnu.org/bugs/?61401>.
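
The decision order described above can be illustrated with a small
standalone sketch. It is not the code from input.cpp: the names
basic_latin_for and encode_special are hypothetical, the boolean
parameter stands in for the 'use_charnames_in_special' directive, and
the "return false" case assumes the documented behavior that any other
special character is an error when the directive is absent.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper, not part of groff: return the basic Latin
// character for one of the seven mapped glyph names, or '\0' if the
// name is not one of them.
static char basic_latin_for(const char *name)
{
  static const struct { const char *name; char ch; } table[] = {
    { "-", '-' },  { "aq", '\'' }, { "dq", '"' }, { "ga", '`' },
    { "ha", '^' }, { "rs", '\\' }, { "ti", '~' },
  };
  for (const auto &entry : table)
    if (std::strcmp(entry.name, name) == 0)
      return entry.ch;
  return '\0';
}

// Hypothetical model of the ordering: try the basic Latin mapping
// first, and only then consult the device's directive.  Returns false
// where this sketch assumes troff would diagnose an error.
static bool encode_special(const char *name, bool use_charnames,
                           std::string &out)
{
  char ch = basic_latin_for(name);
  if (ch != '\0') {
    out += ch;            // same result on every output device
    return true;
  }
  if (use_charnames && name[0] != '\0') {
    out += "\\[";         // pass the glyph name through as \[name]
    out += name;
    out += ']';
    return true;
  }
  return false;
}
```

With this ordering, encode_special("ti", true, out) appends "~" even
though the directive is in effect, while encode_special("co", true, out)
appends "\[co]".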
---
ChangeLog | 33 ++++++++++++
doc/groff.texi | 50 ++++++++++--------
src/roff/groff/groff.am | 1 +
.../device_control_escapes_express_basic_latin.sh | 60 ++++++++++++++++++++++
src/roff/troff/input.cpp | 44 +++++++++-------
5 files changed, 148 insertions(+), 40 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index cef7428..100a3f8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,38 @@
2021-10-30 G. Branden Robinson <g.branden.robinson@gmail.com>
+ [troff]: Handle special character escape sequences that map to
+ basic Latin glyphs in device control escape sequences
+ consistently among output devices.
+
+ * src/roff/troff/input.cpp (encode_char): Rearrange
+ conditionals. This is the logic that puts the "whatever" within
+ a \X'whatever' escape sequence into GNU troff's intermediate
+ output. Handle stretchable and unstretchable space escape
+ sequences ("\ " and "\~") first. Then, if the token is a special
+ character escape sequence, retrieve its "contents" (glyph name).
+ Move the basic Latin mapping for the seven glyph names '-',
+ 'aq', 'dq', 'ga', 'ha', 'rs', and 'ti' here, before checking
+ whether the device description issued the
+ 'use_charnames_in_special' directive. This way, the 'html' and
+ 'xhtml' output devices can straightforwardly embed these basic
+ Latin characters in device control escapes (notably, "html:",
+ for which the present convention is to follow this tag
+ immediately with a literal HTML URI, complete with `<a href>`
+ element syntax). If the special character is none of these and
+ we should 'use_charnames_in_special', proceed as groff 1.22.4
+ and earlier did. This is a behavior change, as was my addition
+ of this translation mechanism in the first place, so...
+
+ * doc/groff.texi (Postprocessor Access): Document it.
+
+ * src/roff/groff/tests/\
+ device_control_escapes_express_basic_latin.sh: Test it.
+ * src/roff/groff/groff.am (groff_TESTS): Run test.
+
+ Fixes <https://savannah.gnu.org/bugs/?61401>.
+
+2021-10-30 G. Branden Robinson <g.branden.robinson@gmail.com>
+
[troff]: Map \[ti] correctly in device control escape sequences.
* src/roff/troff/input.cpp (encode_char): Fix copy-and-paste
diff --git a/doc/groff.texi b/doc/groff.texi
index 7a8dfdf..4640f6a 100644
--- a/doc/groff.texi
+++ b/doc/groff.texi
@@ -14882,11 +14882,18 @@ There are two escapes that give information directly to the
postprocessor. This is particularly useful for embedding PostScript
into the final document.
-@DefreqList {device, xxx}
-@DefescListEndx {\\X, @code{'}, xxx, @code{'}}
-Embeds its argument into the @code{gtroff} output preceded with
-@w{@samp{x X}}.
+@DefreqList {device, xxx @r{@dots{}}}
+@DefescListEndx {\\X, @code{'}, xxx @r{@dots{}}, @code{'}}
+Embed all @var{xxx} arguments into GNU @code{troff} output as parameters
+to a device control command @w{@samp{x X}}. The meaning and
+interpretation of such parameters is determined by the output driver or
+other postprocessor.
+@cindex @code{device} request, and copy mode
+@cindex copy mode, and @code{device} request
+@cindex mode, copy, and @code{device} request
+The @code{device} request processes its arguments in copy mode
+(@pxref{Copy Mode}).
@cindex @code{\&}, in @code{\X}
@cindex @code{\)}, in @code{\X}
@cindex @code{\%}, in @code{\X}
@@ -14896,27 +14903,28 @@ Embeds its argument into the @code{gtroff} output preceded with
@ifinfo
@cindex @code{\@r{<colon>}}, in @code{\X}
@end ifinfo
-The escapes @code{\&}, @code{\)}, @code{\%}, and @code{\:} are ignored
-within @code{\X}, @w{@samp{\ }} and @code{\~} are converted to single
-space characters. All other escapes (except @code{\\}, which produces a
-backslash) cause an error.
-
-@cindex @code{device} request, and copy mode
-@cindex copy mode, and @code{device} request
-@cindex mode, copy, and @code{device} request
-Contrary to @code{\X}, the @code{device} request simply processes its
-argument in copy mode (@pxref{Copy Mode}).
+By contrast, within @code{\X} arguments, the escape sequences @code{\&},
+@code{\)}, @code{\%}, and @code{\:} are ignored, @code{\SP} and
+@code{\~} are converted to single space characters, and @code{\\} has
+its escape character stripped. So that the basic Latin subset of the
+Unicode character set@footnote{that is, ISO@tie{}646:1991-IRV or,
+popularly, ``US-ASCII''} can be reliably encoded in device control
+commands, seven special character escape sequences (@samp{\-},
+@samp{\aq}, @samp{\dq}, @samp{\ga}, @samp{\ha}, @samp{\rs}, and
+@samp{\ti}) are mapped to basic Latin glyphs; see the
+@cite{groff_char@r{(7)}} man page. The use of any other escape sequence
+in @code{\X} arguments is normally an error.
@kindex use_charnames_in_special
@pindex DESC@r{, and @code{use_charnames_in_special}}
@cindex @code{\X}, and special characters
-If the @samp{use_charnames_in_special} keyword is set in the @file{DESC}
-file, special characters no longer cause an error; they are simply
-output verbatim. Additionally, the backslash is represented as
-@code{\\}.
-
-@samp{use_charnames_in_special} is currently used by @code{grohtml}
-only.
+If the @code{use_charnames_in_special} directive appears in the output
+device's @file{DESC} file, the use of special character escape sequences
+is @emph{not} an error; they are simply output verbatim (with the
+exception of the seven mapped to Unicode basic Latin characters,
+discussed above). For convenience, the backslash can be represented as
+@samp{\\}. @code{use_charnames_in_special} is currently used only by
+@code{grohtml}.
@endDefesc
@DefreqList {devicem, xx}
diff --git a/src/roff/groff/groff.am b/src/roff/groff/groff.am
index 3140acd..fa5ef52 100644
--- a/src/roff/groff/groff.am
+++ b/src/roff/groff/groff.am
@@ -39,6 +39,7 @@ groff_TESTS = \
src/roff/groff/tests/ab_works.sh \
src/roff/groff/tests/adjustment_works.sh \
src/roff/groff/tests/break_zero-length_output_line_sanely.sh \
+ src/roff/groff/tests/device_control_escapes_express_basic_latin.sh \
src/roff/groff/tests/do_not_loop_infinitely_when_breaking_cjk.sh \
src/roff/groff/tests/dot-cp_register_works.sh \
src/roff/groff/tests/dot-nm_register_works.sh \
diff --git a/src/roff/groff/tests/device_control_escapes_express_basic_latin.sh b/src/roff/groff/tests/device_control_escapes_express_basic_latin.sh
new file mode 100755
index 0000000..99d1cee
--- /dev/null
+++ b/src/roff/groff/tests/device_control_escapes_express_basic_latin.sh
@@ -0,0 +1,60 @@
+#!/bin/sh
+#
+# Copyright (C) 2021 Free Software Foundation, Inc.
+#
+# This file is part of groff.
+#
+# groff is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# groff is distributed in the hope that it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+groff="${abs_top_builddir:-.}/test-groff"
+fail=
+
+# Confirm translation of a groff special character escape sequence to a
+# basic Latin character when used in a device control escape sequence.
+#
+# $1 is the special character escape _without_ the leading backslash.
+# $2 is the expected output character _shell-quoted as necessary_.
+# $3 is a human-readable glyph description for the test log.
+# $4 is the groff -T device name under test.
+check_char () {
+ sc=$1
+ output=$2
+ description=$3
+ device=$4
+ printf 'checking conversion of \%s to %s (%s) on device %s' \
+ "$sc" "$output" "$description" "$device"
+ if ! printf "\\X#\\%s %s#\n" "$sc" "$desc" | "$groff" -T$device -Z \
+ | grep -Fqx 'x X '$output' '
+ then
+ printf '...failed'
+ fail=yes
+ fi
+ printf '\n'
+}
+
+for device in utf8 html
+do
+ check_char - - "minus sign" $device
+ check_char '[aq]' "'" "neutral apostrophe" $device
+ check_char '[dq]' '"' "double quote" $device
+ check_char '[ga]' '`' "grave accent" $device
+ check_char '[ha]' ^ "caret/hat" $device
+ check_char '[rs]' '\' "reverse solidus/backslash" $device
+ check_char '[ti]' '~' "tilde" $device
+done
+
+test -z "$fail" || exit 1
+
+# vim:set autoindent expandtab shiftwidth=2 tabstop=2 textwidth=72:
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 7f31f9e..23748c2 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -5397,25 +5397,17 @@ static node *do_non_interpreted()
static void encode_char(macro *mac, char c)
{
if (c == '\0') {
- if ((font::use_charnames_in_special) && tok.is_special()) {
- charinfo *ci = tok.get_char(true /* required */);
- const char *s = ci->get_symbol()->contents();
- if (s[0] != (char)0) {
- mac->append('\\');
- mac->append('[');
- int i = 0;
- while (s[i] != (char)0) {
- mac->append(s[i]);
- i++;
- }
- mac->append(']');
- }
- }
- else if (tok.is_stretchable_space()
+ if (tok.is_stretchable_space()
|| tok.is_unstretchable_space())
mac->append(' ');
else if (tok.is_special()) {
- const char *sc = tok.get_char()->get_symbol()->contents();
+ const char *sc;
+ if (font::use_charnames_in_special) {
+ charinfo *ci = tok.get_char(true /* required */);
+ sc = ci->get_symbol()->contents();
+ }
+ else
+ sc = tok.get_char()->get_symbol()->contents();
if (strcmp("-", sc) == 0)
mac->append('-');
else if (strcmp("aq", sc) == 0)
@@ -5430,9 +5422,23 @@ static void encode_char(macro *mac, char c)
mac->append('\\');
else if (strcmp("ti", sc) == 0)
mac->append('~');
- else
- error("special character '%1' cannot be used within device"
- " control escape sequence", sc);
+ else {
+ if (font::use_charnames_in_special) {
+ if (sc[0] != (char)0) {
+ mac->append('\\');
+ mac->append('[');
+ int i = 0;
+ while (sc[i] != (char)0) {
+ mac->append(sc[i]);
+ i++;
+ }
+ mac->append(']');
+ }
+ else
+ error("special character '%1' cannot be used within"
+ " device control escape sequence", sc);
+ }
+ }
}
else if (!(tok.is_hyphen_indicator()
|| tok.is_dummy()