From 682ba81d314497014f72f26a8c73ff4505a6eee9 Mon Sep 17 00:00:00 2001
From: "G. Branden Robinson"
Date: Sat, 4 Apr 2020 05:14:09 +1100
Subject: [PATCH] Support 2-digit \sNN only in compatibility mode.

* src/roff/troff/input.cpp (read_size): Move special-case
interpretation of single-digit point-size escapes with two digits to
compatibility mode (groff -C) only, and throw error diagnostic with
suggestion for remedy if encountered.

The problem is that traditionally '\s36A' is interpreted as "set point
size to 36, then emit 'A'". However, only values in the range 10-39 are
handled specially; '\s40A' is interpreted as a four-point "0A". This is
unlike anything else in *roff grammar; see \*, \$, \f, \F, \g, \k, \m,
\M, \n, \V, and \Y.

To anticipate objections: Why not throw only a warning? Because there
isn't a warning category for supported but ambiguous syntax (this
behavior of AT&T troff dates to 1976 but apparently was not documented
until 1992). Why not throw the warning outside of compatibility mode
too? Because outside of compatibility mode we (now) have an unambiguous
parse.

* NEWS: Advise users of behavior change and offer guidance.

* doc/groff.texi:
* man/groff.7.man: Document the restriction of special handling of \s10
through \s39 to compatibility mode.

* src/roff/groff/groff.am:
* src/roff/groff/tests/use_point_size_escape_with_single_digit_arg.sh:
Add regression test.
---
 ChangeLog                                     | 31 ++++++++++++++++
 NEWS                                          | 23 ++++++++++++
 doc/groff.texi                                |  9 +++--
 man/groff.7.man                               | 12 +++----
 src/roff/groff/groff.am                       |  3 +-
 ...point_size_escape_with_single_digit_arg.sh | 36 +++++++++++++++++++
 src/roff/troff/input.cpp                      | 19 +++++++++-
 7 files changed, 123 insertions(+), 10 deletions(-)
 create mode 100755 src/roff/groff/tests/use_point_size_escape_with_single_digit_arg.sh

diff --git a/ChangeLog b/ChangeLog
index ab971e99..41ed5b15 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,34 @@
+2020-04-04 G. Branden Robinson
+
+	* src/roff/troff/input.cpp (read_size): Move special-case
+	interpretation of single-digit point-size escapes with two
+	digits to compatibility mode (groff -C) only, and throw error
+	diagnostic with suggestion for remedy if encountered.
+
+	The problem is that traditionally '\s36A' is interpreted as "set
+	point size to 36, then emit 'A'". However, only values in the
+	range 10-39 are handled specially; '\s40A' is interpreted as a
+	four-point "0A". This is unlike anything else in *roff grammar;
+	see \*, \$, \f, \F, \g, \k, \m, \M, \n, \V, and \Y.
+
+	To anticipate objections: Why not throw only a warning? Because
+	there isn't a warning category for supported but ambiguous
+	syntax (this behavior of AT&T troff dates to 1976 but apparently
+	was not documented until 1992). Why not throw the warning
+	outside of compatibility mode too? Because outside of
+	compatibility mode we (now) have an unambiguous parse.
+
+	* NEWS: Advise users of behavior change and offer guidance.
+
+	* doc/groff.texi:
+	* man/groff.7.man: Document the restriction of special handling
+	of \s10 through \s39 to compatibility mode.
+
+	* src/roff/groff/groff.am:
+	* src/roff/groff/tests/\
+	use_point_size_escape_with_single_digit_arg.sh: Add regression
+	test.
+
 2020-04-04 G. Branden Robinson
 
 	Improve point-size escape diagnostics.
diff --git a/NEWS b/NEWS
index ceec7195..c7b5ad74 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,29 @@ VERSION 1.22.5
 Troff
 -----
 
+o Point-size escapes of the form '\sNN', where NN is in the range 10-39, are
+  now recognized only in compatibility mode (groff -C). In normal mode, \sN
+  is interpreted as setting the point-size to the single digit value N,
+  which ends the escape. This change eliminates an ambiguity in the
+  language grammar that dates back to 1976 (AT&T troff by Ossanna) but
+  apparently was not documented until 1992 when Kernighan updated CSTR #54
+  for AT&T ditroff.
+
+  The form '\s(NN' is accepted for two-digit point sizes in all known
+  troffs. The form '\s[NNN]' accepts a numeric expression of any length
+  and has been supported by groff since 1.02 (June 1991), as well as by
+  recent versions of Heirloom troff and neatroff. The form "\s'NNN'" is
+  also widely supported.
+
+  Short version: in your documents, rewrite "\s36" as
+    \s(36
+    \s[36]
+    \s'36'
+  for values of "36" from 10 to 39, inclusive. You can use
+    grep '\\s[123][0-9]'
+  to find instances in your documents; those who have changed the escape
+  character with the .ec request are expected to be able to cope.
+
 o New requests 'stringdown' and 'stringup' are available. These change
   the string named in their argument by replacing each of its bytes with
   its lowercase or uppercase version (if any), respectively. groff special
diff --git a/doc/groff.texi b/doc/groff.texi
index 65eea202..ae6ab98d 100644
--- a/doc/groff.texi
+++ b/doc/groff.texi
@@ -10051,8 +10051,13 @@ and the text begins. Any of the following forms are valid:
 
 @table @code
 @item \s@var{n}
-Set the point size to @var{n}@tie{}points. @var{n}@tie{}must be either
-0 or in the range 4 to@tie{}39.
+Set the point size to @var{n}@tie{}points. @var{n}@tie{}must be a
+single digit.
+
+In compatibility mode only, @var{n}@tie{}may also be a two-digit value
+in the range 10 to@tie{}39. Legacy documents relying upon this
+quirk of parsing should be migrated to the \s(@var{nn} or \s[@var{n}]
+forms.
 
 @item \s+@var{n}
 @itemx \s-@var{n}
diff --git a/man/groff.7.man b/man/groff.7.man
index e1d6a25a..8af99e55 100644
--- a/man/groff.7.man
+++ b/man/groff.7.man
@@ -3299,13 +3299,13 @@ The same as
 .ESC s \[+-]N
 Set/increase/decrease the point size to/by
 .I N
-scaled points;
+scaled points.
+.
 .I N
-must be either 0
-(restore previous size)
-or,
-for historical reasons,
-in the range 4\[en]39.
+must be a single digit,
+except in compatibility mode
+(where values from 10\[en]39 are also accepted);
+0 restores the previous point size.
 .
 Otherwise,
 same as
diff --git a/src/roff/groff/groff.am b/src/roff/groff/groff.am
index c56b9562..fae28aa5 100644
--- a/src/roff/groff/groff.am
+++ b/src/roff/groff/groff.am
@@ -46,7 +46,8 @@ groff_TESTS = \
   src/roff/groff/tests/regression_savannah_56555.sh \
   src/roff/groff/tests/string_case_xform_errors.sh \
   src/roff/groff/tests/string_case_xform_requests.sh \
-  src/roff/groff/tests/string_case_xform_unicode_escape.sh
+  src/roff/groff/tests/string_case_xform_unicode_escape.sh \
+  src/roff/groff/tests/use_point_size_escape_with_single_digit_arg.sh
 TESTS += $(groff_TESTS)
 
 groff_XFAIL_TESTS = \
diff --git a/src/roff/groff/tests/use_point_size_escape_with_single_digit_arg.sh b/src/roff/groff/tests/use_point_size_escape_with_single_digit_arg.sh
new file mode 100755
index 00000000..4dc90732
--- /dev/null
+++ b/src/roff/groff/tests/use_point_size_escape_with_single_digit_arg.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+#
+# Copyright (C) 2020 Free Software Foundation, Inc.
+#
+# This file is part of groff.
+#
+# groff is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# groff is distributed in the hope that it will be useful, but WITHOUT ANY
+# WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see .
+#
+
+groff="${abs_top_builddir:-.}/test-groff"
+
+# The vertical space is so that the 36-point 'A' won't be truncated by
+# the top of the page. That could be confusing and misleading to anyone
+# who ever has to troubleshoot this test case.
+DOC=".vs 10v
+\s36A"
+
+# Verify that the idiosyncratic behavior of \sN is supported in
+# compatibility mode...
+echo "testing \s36A in compatibility mode (36-point 'A')" >&2
+echo "$DOC" | "$groff" -C -Z | grep -qx 's36000' || exit 1
+
+# ...and not in regular mode.
+echo "testing \s36A in non-compatibility mode (3-point '6A')" >&2
+echo "$DOC" | "$groff" -Z | grep -qx 's3000'
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 6f35dadf..d21bd698 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -5076,13 +5076,30 @@ static int read_size(int *x)
   }
   else if (csdigit(c)) {
     val = c - '0';
-    if (!inc && c != '0' && c < '4') {
+    if (compatible_flag && !inc && c != '0' && c < '4') {
+      // Note: Very special case! If we have \s followed immediately by
+      // a digit (not '(', '+', or '-'), and that digit is 1, 2, or
+      // 3...read another digit! This is because the Graphic Systems
+      // C/A/T phototypesetter (the original device target for AT&T
+      // troff) only supported a few discrete point sizes in the range
+      // 6..36. Kernighan warned of this in the 1992 revision of CSTR
+      // #54 (section 2.3), and more recently, McIlroy referred to it as
+      // a "living fossil". This DWIM syntax is surprising to the user;
+      // it clashes with the syntax of several other escapes (\*, \$,
+      // \f, \F, \g, \k, \m, \M, \n, \V, and \Y). We therefore support
+      // it only in compatibility mode.
+      //
+      // See:
+      // https://lists.gnu.org/archive/html/groff/2020-03/msg00054.html
+      // et seq.
       tok.next();
       c = tok.ch();
       if (!csdigit(c))
         bad_digit = 1;
       else
         val = val*10 + (c - '0');
+      error("ambiguous point-size escape; rewrite to use '\\s(%1'"
+            " or similar", val);
     }
     val *= sizescale;
   }
-- 
2.20.1
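
The NEWS guidance above (rewrite "\s36" as "\s(36" for sizes 10 to 39) can be approximated with a small shell filter. What follows is only a sketch and is not part of the patch. The function name rewrite_size_escapes is hypothetical, and it assumes the escape character is still the default backslash. It rewrites every match of the grep pattern that NEWS suggests, including matches where no point-size escape was intended, so the output should be reviewed before it replaces the original document.

```shell
#!/bin/sh
# Sketch only (not part of the patch): rewrite ambiguous two-digit
# point-size escapes such as \s36 to the unambiguous \s(36 form.
# Assumes the default escape character; reads stdin, writes stdout.
rewrite_size_escapes() {
  # Match a backslash, 's', and a two-digit size from 10 to 39 (the
  # same pattern NEWS suggests for grep), then insert '(' before the
  # digits.  \s+NN, \s-NN, \s(NN, \s[NN], and \s40 do not match.
  sed 's/\\s\([123][0-9]\)/\\s(\1/g'
}
```

A possible invocation, with placeholder file names, is "rewrite_size_escapes < old.ms > new.ms". The "\s(NN" form is used as the target because NEWS describes it as accepted by all known troffs.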