
Re: Proposed: make \X read its argument in copy mode


From: G. Branden Robinson
Subject: Re: Proposed: make \X read its argument in copy mode
Date: Fri, 19 Jan 2024 18:56:34 -0600

Hi Deri,

At 2024-01-20T00:07:21+0000, Deri wrote:
> On Friday, 19 January 2024 21:39:57 GMT G. Branden Robinson wrote:
> > Right.  Before I craft a lengthy response to this--did you see the
> > footnote?
> 
> Yes, sorry, it didn't help. I'm just comparing output now with output
> in 1.23.0 and what you claim you are doing is the reverse of what I'm
> seeing.

I haven't yet pushed anything implementing my (new) intentions,
reflected in the subject line.  I wanted to gather feedback first.

What happened was, I thought "the `device` request and `\X` escape
sequence should behave the same, modulo the usual differences in parsing
(delimitation vs. reading the rest of the line, the leading double quote
mechanism in request form, and so forth)".

Historically, that has never been the case in groff.

Here's (the meat of) the actual test case I recently wrote and pushed.

input='.nf
\X#bogus1: esc \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]#
.device bogus1: req \%man-beast\[u1F63C]\\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
.ec @
@X#bogus2: esc @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]#
.device bogus2: req @%man-beast@[u1F63C]@@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]'

I know that looks hairy as hell.  I'm testing several things.
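
To reproduce the comparison below, one approach is to run the test input through groff, stop at intermediate output, and keep only the device control commands. This is a minimal sketch, assuming groff is on the PATH and the test case is held in the shell variable `input` as above; the helper name `device_controls` is hypothetical and not taken from groff's own test suite.

```shell
#!/bin/sh
# Hypothetical helper (not part of groff's test suite): read GNU troff
# intermediate output on standard input and print only the device
# control commands, which appear as lines beginning with "x X ".
device_controls() {
  grep '^x X '
}
```

For example, `printf '%s\n' "$input" | groff -Z | device_controls` should print the four `x X` lines compared below; `-Z` makes groff emit GNU troff's intermediate output instead of running the output driver.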

Here is what the output of that test looks like on groff 1.22.3 and
1.22.4.

x X bogus1: esc man-beast\[u1F00] -
x X bogus1: req @%man-beast\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
x X bogus2: esc man-beast@[u1F00] -
x X bogus2: req @%man-beast@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]

Observations of the above:

A.  When using `\X`, the escape sequences \%, \[u1F63C], \[aq], \[dq],
    \[ga], \[ha], \[rs], \[ti] all get discarded.

B.  When you change the escape character and self-quote it in the
    formatter, it comes out as-is in the device control command.  I
    found this absurd, since there is no such thing as an escape
    character in the device-independent output language, and whatever
    escaping convention a device-specific control command needs to come
    up with for things like, oh, expressing Unicode code points is
    necessarily independent of a random *roff document's choice of
    escape character anyway.

Here is what the test output looks like on groff 1.23.0.  That release
enabled a few more characters to be rendered in PDF bookmarks.

x X bogus1: esc man-beast\[u1F00] -'"`^\~
x X bogus1: req @%man-beast\[u1F63C]\[u1F00] -\[aq]\[dq]\[ga]\[ha]\[rs]\[ti]
x X bogus2: esc man-beast@[u1F00] -'"`^\~
x X bogus2: req @%man-beast@[u1F63C]@[u1F00] -@[aq]@[dq]@[ga]@[ha]@[rs]@[ti]

Here is what the test output looks like on groff Git HEAD.  It was my
first stab at solving the problem, the one I am now having partial
second thoughts about.

x X bogus1: esc man-beast\[u1F00] -'"`^\~
x X bogus1: req man-beast\[u1F00] -'"`^\~
x X bogus2: esc man-beast\[u1F00] -'"`^\~
x X bogus2: req man-beast\[u1F00] -'"`^\~
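
The goal stated at the outset is that, for each tag, the `esc` (`\X`) and `req` (`device`) lines carry identical payloads. A hypothetical helper, `compare_esc_req`, sketched here in POSIX shell and awk, makes that check mechanical. It assumes the `TAG: esc` / `TAG: req` layout of this particular test case and nothing more general.

```shell
#!/bin/sh
# Hypothetical helper: read "x X TAG: esc PAYLOAD" and "x X TAG: req
# PAYLOAD" lines (the layout used by this test case only) and print,
# for each TAG, whether the \X and .device payloads are identical.
# Tags are printed in no particular order; pipe through sort if needed.
compare_esc_req() {
  awk '
    $1 == "x" && $2 == "X" && ($4 == "esc" || $4 == "req") {
      tag = $3
      kind = $4
      # Strip the "x X TAG KIND " prefix, leaving only the payload.
      payload = $0
      sub(/^x X [^ ]+ (esc|req) /, "", payload)
      seen[tag, kind] = payload
      tags[tag] = 1
    }
    END {
      for (t in tags)
        print t, ((seen[t, "esc"] == seen[t, "req"]) ? "same" : "differ")
    }
  '
}
```

Fed the Git HEAD output above, it should report `same` for both tags; fed the 1.22.x or 1.23.0 output, `differ` for both.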

I was briefly happy with this, but I started wondering what happens when
you interpolate any crazy old damned string inside a device control
command and I rapidly became uncomfortable.  Because `\X` does not read
its argument in copy mode, it can get exposed to "nodes" (and in groff
Git, `device` can too)--this is that old incomprehensible nemesis that
afflicted pdfmom users relentlessly before 1.23.0.[1][2][3][4][5][6]

        can't transparently output node at top level

But 1.23.0 doesn't throw these errors because I hid them, not because
we fixed them.[7]

An aim of this proposal is to truly fix them.

I hope it will surprise no one to learn that I have recently also
updated our documentation regarding tokens, nodes, how these relate to
GNU troff's input processing, and related matters.

> I hope I don't elicit a too lengthy response.

I know such hope oft seems forlorn when talking to me...

> There are 3 logical possibilities for the list to decide:-
> 
> 1) .device behaves like \X.
> 
> This seems to be what Branden has done at the moment. Disadvantage is
> that as a by-product you can't send unicode to the output drivers
> using either method,

I'm not happy with this status quo, but this doesn't exactly mean you
"can't send Unicode to output drivers".  What you have to do is _decide
upon an encoding mechanism for them_.  That will be true no matter which
way we solve this.  But I think it's best if there is _one_ way (per
output driver, anyway), not two different ones depending on whether your
encoded Unicode sequence is passed via `device` or `\X`.  This stuff is
challenging enough to the user that that seems like gratuitous cruelty.

Unfortunately that _has been_ the status quo.

> and some escapes affect the text stream, when the expectation is that
> things sent to the output driver should not affect the text stream.

Right.  That is what alarmed me about reading `device` and `\X`
arguments in interpretation mode.

> 2) \X behaves like .device.
> 
> This is what Branden said was the intention. This allows pdf title
> (normally shown in the window header in a pdf viewer) to use unicode.

This might be more accurately stated as:

2) \X behaves like .device used to (in groff 1.23.0 and earlier).

And I repeat: this is not a _hard_ prerequisite to expressing Unicode
sequences in the output, but it seems useful so that authors of
output drivers (and supporting macro files for them) can keep their
sanity.

But making this happen means changing a CSTR #54 (1992) feature, not a
GNU extension, so I felt I didn't have any wiggle room, and that the
issue was best mooted on the list.

> 3) Leave things as they were prior to recent commits.

I'll be interested to see the argument from anyone who wants to defend
the groff 1.22.{3,4} test case exhibits above.

> It will be interesting to hear from as many people as possible which
> they think is the best option. I definitely think we should not be
> making the use of unicode harder.

Strongly agreed.

Regards,
Branden

[1] https://lists.gnu.org/archive/html/groff/2020-10/msg00130.html
[2] https://lists.gnu.org/archive/html/groff/2021-06/msg00050.html
[3] https://lists.gnu.org/archive/html/groff/2021-02/msg00007.html
[4] https://lists.gnu.org/archive/html/groff/2018-02/msg00024.html
[5] https://lists.gnu.org/archive/html/groff/2015-06/msg00033.html
[6] https://lists.gnu.org/archive/html/groff/2013-02/msg00002.html
[7] https://git.savannah.gnu.org/cgit/groff.git/commit/?id=557bc0558dfdee7e3f2011433cf4606052e4e7e1
