bug-groff

From: G. Branden Robinson
Subject: [bug #65108] [troff] support construction of general file name request arguments
Date: Thu, 18 Jul 2024 17:54:14 -0400 (EDT)

Update of bug #65108 (group groff):

                  Status:                    None => Need Info              
             Assigned to:                    None => barx                   

    _______________________________________________________

Follow-up Comment #3:

Well, let's rough out a syntax that would work both for existing uses of `so`
as _soelim_(1) understands it and for the formatter, which interprets `so`
under slightly different rules (since it brings to bear the full power of
the _troff_ lexical analyzer).

1.  An argument of type `file` (as described in _groff_(7)) to a request
consumes the rest of the line.
2.  Unescaped spaces can therefore populate the argument.
3.  A leading double quote is recognized and removed; a file name can thus
start with spaces.
4.  Any other/remaining double quotes are not treated specially.
5.  Only the following escape sequences are recognized.

5a. `\ ` (backslash-space) represents a space.  It is not necessary in
_troff_, but is recognized to avoid disrupting existing _soelim_(1) usage.
5b. `\"` ends the file name argument and starts a comment.
5c. `\\` represents a (single) literal backslash.  It is handled however the
system's standard C library wants to handle it.
5d. `\[u00XX]` where each X is an uppercase hexadecimal digit encodes a
character.  Only codes in the range 00-1F and 80-FF are accepted in this
syntax; those in the range 20-7F are ignored with a diagnostic advising the
user to deobfuscate their inputs.
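
To make the rules above concrete, here is a minimal Python sketch of a parser for such an argument.  It is not groff code; the function name `parse_file_argument` and the diagnostic wording are hypothetical.  It also fills three gaps the list leaves open, each by assumption: spaces between the request name and the argument are skipped, unrecognized escape sequences are kept literally, and `\[u00XX]` maps to the character with that code point.

```python
import re

# Body of a `\[u00XX]` escape (after the backslash): two uppercase hex digits.
_U_ESCAPE = re.compile(r"\[u00([0-9A-F]{2})\]")


def parse_file_argument(rest):
    """Parse the text following a request name; return (file_name, diagnostics)."""
    diagnostics = []
    # Rule 1: the argument is the rest of the line.
    # Assumption: spaces separating the request name from the argument are skipped.
    text = rest.rstrip("\n").lstrip(" ")
    # Rule 3: one leading double quote is removed, so a name can start with spaces.
    if text.startswith('"'):
        text = text[1:]
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            # Rules 2 and 4: spaces and later double quotes are ordinary characters.
            out.append(ch)
            i += 1
            continue
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if nxt == " ":        # Rule 5a: backslash-space is a space.
            out.append(" ")
            i += 2
        elif nxt == '"':      # Rule 5b: the file name ends; a comment starts.
            break
        elif nxt == "\\":     # Rule 5c: a single literal backslash.
            out.append("\\")
            i += 2
        else:
            m = _U_ESCAPE.match(text, i + 1)
            if m is None:
                # Assumption: an unrecognized escape is kept as written.
                out.append(ch)
                i += 1
                continue
            code = int(m.group(1), 16)
            if code <= 0x1F or code >= 0x80:
                # Rule 5d: only 00-1F and 80-FF are accepted.
                out.append(chr(code))
            else:
                # Rule 5d: 20-7F is ignored with a diagnostic (wording is made up).
                diagnostics.append(
                    "ignoring \\[u00%s]; write the character literally" % m.group(1)
                )
            i = m.end()
    return "".join(out), diagnostics
```

Under these assumptions, the first four lines of the specimen that follows would resolve to `foo bar file.troff`, `foo bar file.troff`, `foo bar file.troff`, and `foo.troff`, respectively.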

How are these handled today?

Specimen:


$ cat EXPERIMENTS/extending-so-syntax.troff
.so foo bar file.troff
.so foo\ bar\ file.troff
.so "foo bar file.troff
.so foo.troff\" comment
.so foo\u[0020]bar\u[0020]file.troff


_groff_ _soelim_:


$ soelim EXPERIMENTS/extending-so-syntax.troff
.lf 1 ./EXPERIMENTS/extending-so-syntax.troff
soelim:./EXPERIMENTS/extending-so-syntax.troff:1: error: can't open 'foo': No such file or directory
.so foo bar file.troff
soelim:./EXPERIMENTS/extending-so-syntax.troff:2: error: can't open 'foo bar file.troff': No such file or directory
.so foo\ bar\ file.troff
soelim:./EXPERIMENTS/extending-so-syntax.troff:3: error: can't open '"foo': No such file or directory
.so "foo bar file.troff
.so foo.troff\" comment
.so foo\u[0020]bar\u[0020]file.troff


DWB 3.3 _soelim_:

...never mind, DWB 3.3 _troff_ *has* no _soelim_.  Wow!  Learned something new
today.

Heirloom Doctools _soelim_:


$ ./bin/soelim ./extending-so-syntax.troff 
foo: No such file or directory
.so foo
bar file.troff
foo\: No such file or directory
.so foo\
bar\ file.troff
"foo: No such file or directory
.so "foo
bar file.troff
foo.troff\": No such file or directory
.so foo.troff\"
comment
foo\u[0020]bar\u[0020]file.troff: No such file or directory
.so foo\u[0020]bar\u[0020]file.troff


Uh, that's a little hard to interpret.


$ printf '.so foo bar file.troff\n' | ./bin/soelim 
foo: No such file or directory
.so foo
bar file.troff


Interesting that it transforms the input in this way, by adding a newline
where it decided to stop lexing the file name.  I'm tempted to call that a
bug.


0000000   .   s   o       f   o   o  \n   b   a   r       f   i   l   e
0000020   .   t   r   o   f   f  \n
0000026
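
The transformation visible in that dump can be modeled in a few lines of Python.  This is only a model of the observed output for a file that cannot be opened, not Heirloom's actual code; the function name `split_like_observed` is hypothetical, and the assumption that the file name ends at the first space is inferred from the transcripts here.

```python
def split_like_observed(line):
    """Return (file_name, echoed_text) for a `.so` line whose file can't be opened."""
    request, _, arg = line.rstrip("\n").partition(" ")
    # Assumption: the file name ends at the first space, escaped or not.
    name, sep, remainder = arg.partition(" ")
    echoed = request + " " + name + "\n"
    if sep:
        # Whatever followed the name lands on a line of its own.
        echoed += remainder + "\n"
    return name, echoed
```

For the input `.so foo bar file.troff`, this yields the name `foo` and the same two lines of text as the dump above.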


The other cases:


$ printf '.so foo\\ bar\\ file.troff\n' | ./bin/soelim 
foo\: No such file or directory
.so foo\
bar\ file.troff

$ printf '.so "foo bar file.troff\n' | ./bin/soelim 
"foo: No such file or directory
.so "foo
bar file.troff

$ printf '.so "foo.troff\\"comment\n' | ./bin/soelim 
"foo.troff\"comment: No such file or directory
.so "foo.troff\"comment

$ printf '.so foo\\u[0020]bar\\u[0020]file.troff\n' | ./bin/soelim
foo\u[0020]bar\u[0020]file.troff: No such file or directory
.so foo\u[0020]bar\u[0020]file.troff


There seem to be no further surprises here.

Unix V7 did not have _soelim_, either.

Let me check Solaris 10.


$ printf '.so foo\\ bar\\ file.troff\n' | soelim
foo\: No such file or directory
.so foo\
bar\ file.troff

$ printf '.so "foo bar file.troff\n' |soelim
"foo: No such file or directory
.so "foo
bar file.troff

$ printf '.so "foo.troff\\"comment\n' |soelim
"foo.troff\"comment: No such file or directory
.so "foo.troff\"comment

$ printf '.so foo\u[0020]bar\u[0020]file.troff\n' |soelim
foo\u[0020]bar\u[0020]file.troff: No such file or directory
.so foo\u[0020]bar\u[0020]file.troff


These look identical to Heirloom to me.  I guess we now know where Heirloom
got the inspiration, and perhaps even the code, for its _soelim_.

Since backslash-space is apparently a GNU extension in the first place, we
might consider dropping it.  It wasn't portable, and even the rest of the
_groff_ ecosystem struggled to handle files with spaces in their names.

I further venture that this exact same syntax could be applied to the
`sy`/`pso` problem in bug #62787 and to user-constructed diagnostic messages
in bug #64071.

I highly value the prospect of having a parallel syntax for these 3 issues if
we can get it.

For _soelim_(1) itself I would further add that this program will continue to
recognize only backslash as an escape character, but GNU _troff_ will
recognize the configured escape character.

Thoughts?


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?65108>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
