[bug #58962] Latin-1 NO-BREAK SPACE does not behave as documented
From: Dave
Subject: [bug #58962] Latin-1 NO-BREAK SPACE does not behave as documented
Date: Sat, 15 Aug 2020 13:25:32 -0400 (EDT)
User-agent: Mozilla/5.0 (X11; Linux i686; rv:45.0) Gecko/20100101 Firefox/45.0
URL:
<https://savannah.gnu.org/bugs/?58962>
Summary: Latin-1 NO-BREAK SPACE does not behave as documented
Project: GNU troff
Submitted by: barx
Submitted on: Sat 15 Aug 2020 12:25:30 PM CDT
Category: None
Severity: 3 - Normal
Item Group: Incorrect behaviour
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Planned Release: None
_______________________________________________________
Details:
(Another bug report spawned from the discovery process of bug #58930.)
Quoth groff_char(7): "the ISO latin1 _no-break space_ is mapped to `\~', the
stretchable space character."
An eminently sensible mapping. Oh, if only it were so.
In fact, the Latin-1 no-break space (character 160 decimal, A0 hex):
* behaves the same as "\ ", the nonstretchable nonbreaking space character
* matches neither "\ " nor "\~" in an output-equivalency conditional
Examining these in detail:
=== Behavior ===
Consider an input file with one instance of the string "<>", representing a
nonbreaking space. sed can convert this string to each type of nonbreaking
space under consideration (the two escapes and the raw Latin-1 character),
and the typeset results can then be compared by checking which variants
produce identical PostScript output.
$ cat t0
Lorem ipsum dolor sit amet, consectetur<>adipiscing elit, sed
do eiusmod tempor incididunt ut labore et dolore magna aliqua.
$ # Baseline test, for escapes expected to be different:
$ diff <(sed 's/<>/\\ /' t0 | groff) <(sed 's/<>/\\~/' t0 | groff) | wc
8 68 403
$ # Output expected to be the same based on what the docs say:
$ diff <(sed 's/<>/\\~/' t0 | groff) <(sed 's/<>/\xA0/' t0 | groff) | wc
8 68 403
$ # Output that turns out to be the same:
$ diff <(sed 's/<>/\\ /' t0 | groff) <(sed 's/<>/\xA0/' t0 | groff) | wc
0 0 0
$
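The three comparisons above can also be scripted so that every pair is checked
in one pass. What follows is only a sketch, not the exact commands used for
this report. make_variant and same_output are hypothetical helper names.
printf '\240' (octal for A0) stands in for sed's \xA0 escape, which is a GNU
extension. groff -Z (intermediate output, no postprocessing) is swapped in for
the PostScript comparison, on the assumption that it reflects the same spacing
decisions.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of groff.

# Read text on stdin and replace the first "<>" on each line with one
# kind of nonbreaking space.
make_variant() {
    case "$1" in
        esc_space) sed 's/<>/\\ /' ;;                        # the "\ " escape
        esc_tilde) sed 's/<>/\\~/' ;;                        # the "\~" escape
        latin1)    LC_ALL=C sed "s/<>/$(printf '\240')/" ;;  # raw byte A0
        *)         return 1 ;;
    esac
}

# Succeed if two variants of file $3 yield identical intermediate output.
same_output() {
    a=$(make_variant "$1" < "$3" | groff -Z | cksum)
    b=$(make_variant "$2" < "$3" | groff -Z | cksum)
    [ "$a" = "$b" ]
}

# Run the pairwise comparison only when groff and the test file t0 exist.
if command -v groff >/dev/null 2>&1 && [ -f t0 ]; then
    for pair in "esc_space esc_tilde" "esc_tilde latin1" "esc_space latin1"; do
        set -- $pair
        if same_output "$1" "$2" t0; then
            echo "$1 vs $2: identical"
        else
            echo "$1 vs $2: different"
        fi
    done
fi
```

If the behavior matches what is reported above, only the last pair should
come out as identical.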
I'm filing this as "Incorrect behavio[u]r" rather than "Documentation" because
I believe the documented behavior is more sensible than the actual behavior.
But that's a judgment call and open to debate.
=== Equivalency conditional ===
Either way, if Latin-1 A0 behaves the same as one of "\ " or "\~", the
output-equivalency conditional operator (rendered as 'XXX'YYY' in the info
manual, though a host of characters besides single quotes can be used) ought
to recognize this. But this operator claims the output of character A0 is
equivalent to neither one (first observed in comment #2 of the aforementioned
bug <http://savannah.gnu.org/bugs/?58930#comment2>).
$ printf ".if '\xA0'\~' .tm equal\n" | groff
$ printf ".if '\xA0'\ ' .tm equal\n" | groff
$
(Granted, the documentation muddies what this operator is actually testing.
The info manual is clear about 'XXX'YYY', saying this is "True if the output
produced by XXX is equal to the output produced by YYY." But groff(7) is less
clear, saying that the test 's1's2' is "True if string s1 is identical to
string s2," which implies it's comparing _input_ strings. Were that the case,
you'd expect both the above tests to be false... but you'd also expect
'\[em]'\[u2014]' to be false, which it isn't.)
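To make the silent results above easier to read, the same tests can be wrapped
so that each one reports explicitly, with the '\[em]'\[u2014]' case as a
control. This is a sketch under assumptions: cond_line, cond_equal, and report
are hypothetical helper names, and it relies on .tm writing its message to
standard error, so an empty stderr is read as "condition false".

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of groff.

# Print a one-line .if request comparing operands $1 and $2.
# %s keeps backslashes in the operands out of printf's own escape handling.
cond_line() {
    printf ".if '%s'%s' .tm equal\n" "$1" "$2"
}

# Succeed if groff evaluates the conditional as true.
# .tm writes to stderr, so stderr is captured and stdout is discarded.
cond_equal() {
    cond_line "$1" "$2" | groff 2>&1 >/dev/null | grep -qx 'equal'
}

# Print a labeled verdict for one pair of operands.
report() {
    if cond_equal "$2" "$3"; then
        echo "$1: equal"
    else
        echo "$1: not equal"
    fi
}

# Run the checks only when groff is installed.
if command -v groff >/dev/null 2>&1; then
    nbsp=$(printf '\240')            # raw Latin-1 byte A0
    report 'control \[em] vs \[u2014]' '\[em]' '\[u2014]'
    report 'A0 vs \~' "$nbsp" '\~'
    report 'A0 vs "\ "' "$nbsp" '\ '
fi
```

The control line separates "the operator never reports equality in this
setup" from "the operator rejects character A0 specifically".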
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?58962>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/