Re: CHECKSTYLE suggestions: unnecessary quotations and unnecessary \f escape


From: Alejandro Colomar (man-pages)
Subject: Re: CHECKSTYLE suggestions: unnecessary quotations and unnecessary \f escape
Date: Sun, 20 Mar 2022 18:25:28 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.6.2

Hi, Branden!

On 3/20/22 15:07, G. Branden Robinson wrote:
> Hi, Alex!
> 
> At 2022-03-19T17:07:09+0100, Alejandro Colomar (man-pages) wrote:
>> While fixing style issues in the man-pages project,
>> I'm finding a few recurrent issues that I think you could warn about:
>>
>> Unnecessary quotations:
>>
>> [
>> .I "foo bar"
>> .IR foo "bar"
>> ]
> 
> That is going to be hard to detect from within a macro package.  As
> noted in our recent discussion of quotation marks in macro calls, by the
> time these arguments get to the `I` and `IR` macros, those macros have
> no way of knowing if they were excessively quoted in the calling
> context.
> 
> I don't have a solution for this problem.  To solve it would require
> modifying GNU troff's input parser to track some kind of "extraneous
> quote" state.  Since, as we saw in our earlier discussion, a sequence of
> up to four double quotes can be perfectly valid, my intuition is that
> this problem is worse than regex-hard, and the cost might rapidly
> outweigh the benefit.

Hmmmm, I see; worse than regex-hard! :D

> 
> If you need this, it's probably better to just write a regex-based tool
> that scans the man page source.  You can then enforce a stricter
> discipline, permitting false positives on valid but unusual constructs
> that would be better recast.

Yup.  Something like checkpatch.pl.  But I'd need to learn perl...  I'll
defer that problem to my future self.  Maybe finding most cases with
simple regexes from time to time is easier.  Probably /[^\s]"[^\s]/ is
good enough for most cases.  /""/ would also help.
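
A minimal sketch of such a scan could look like the following, in Python
rather than Perl.  It applies the two patterns above exactly as written;
the function name and the restriction to control lines starting with '.'
are assumptions made for illustration, not part of any existing tool.

```python
import re

# The two patterns suggested above, translated to Python syntax.
EMBEDDED_QUOTE = re.compile(r'[^\s]"[^\s]')  # quote with non-space on both sides
DOUBLED_QUOTE = re.compile(r'""')            # two adjacent double quotes


def find_suspect_quotes(lines):
    """Return (line_number, kind, text) for each pattern hit.

    Assumption: only control lines (starting with '.') are checked,
    since that is where macro calls such as .I and .IR appear.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.startswith("."):
            continue
        if EMBEDDED_QUOTE.search(text):
            hits.append((number, "embedded", text))
        if DOUBLED_QUOTE.search(text):
            hits.append((number, "doubled", text))
    return hits


if __name__ == "__main__":
    sample = ['.I "foo bar"\n', '.IR foo"bar"\n', '.BR "" foo\n']
    for number, kind, text in find_suspect_quotes(sample):
        print(f"{number}: {kind}: {text}")
```

Note that neither pattern, as written, matches a line like
.I "foo bar", where each quote has whitespace or a line end on one side;
catching that case would need an additional, stricter pattern.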

> 
>> Unnecessary escape \f:
>>
>> [
>> foo \fIbar\fP baz
>> ]
>>
>> The last one is more difficult to decide when it's unnecessary, but
>> you could maybe start with non-formatted lines.
> 
> This is also a big challenge, and on my first reflection, even worse, as
> you suspect.  The problem is that what you quote is an ordinary text
> line, and *roffs don't generally look very far ahead when parsing.
> There aren't many ways in the language to peek ahead in the input
> stream.
> 
> The only way I can think of would be to set up the macro package such
> that all text lines get captured into a macro or diversion.  You might
> then be able to iterate through the stored content somehow--though I
> don't know off the top of my head a way to do this line by line.  I also
> don't know how to do something like save some kind of pending input line
> into a string for processing with the few simple requests we have for
> that.  There's also the problem of interpreting that input well enough
> to recognize undesirable constructs--do you want to write a troff in
> troff?

:)

I sometimes wonder how cc(1) handles all those seemingly impossible
constructs while keeping sane warnings.  I understand how sometimes you
miss a warning just because the relevant code got optimized at a
different compilation stage, and it simply isn't there to be warned
about anymore.  And that's for a language as simple as C.  I can't
imagine what happens with C++... or maybe I can: that's why warnings are
hopeless there, and only intuition can help.

> 
> Again I would attack this with a less perfect but much more tractable
> regex-based input scanner.  I would filter out tbl(1) regions and then
> flag _any_ font selection escape sequence that isn't on a control line,
> meaning a line starting with '.' (that's an over-crudification[1]), but I
> predict that it will work well for most pages.  I'm attaching a shell
> script I've come up with to do this.  For groff's own pages, it mostly
> turns up use of non-man(7)-standard fonts (not roman, bold, or italic)
> and some pages I haven't yet done a thorough revision on.
> 
> Regards,
> Branden
> 
> [1] no-break control character, line continuation, yadda yadda yadda
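
The approach described above (skip tbl(1) regions, then flag any font
selection escape on a non-control line) can be sketched roughly as
follows.  This is not the attached shell script, which isn't reproduced
here; it is a Python illustration under stated assumptions: tbl regions
are taken to run from a line starting with .TS to one starting with .TE,
a control line is any line starting with '.', and the function name is
hypothetical.

```python
import re

# Font selection escapes: \fX, \f(XX, or \f[name].
FONT_ESCAPE = re.compile(r'\\f(?:\[[^\]]*\]|\(..|.)')


def find_font_escapes(lines):
    """Return (line_number, [escapes]) for text lines using \\f.

    Assumptions: tbl(1) regions are delimited by .TS and .TE, and any
    line starting with '.' is a control line and is not flagged.
    """
    hits = []
    in_table = False
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if text.startswith(".TS"):
            in_table = True
        elif text.startswith(".TE"):
            in_table = False
            continue
        # Skip everything inside a table, and every control line.
        if in_table or text.startswith("."):
            continue
        found = FONT_ESCAPE.findall(text)
        if found:
            hits.append((number, found))
    return hits


if __name__ == "__main__":
    sample = [".TH FOO 1\n", "foo \\fIbar\\fP baz\n", ".B ok \\fIx\\fP\n"]
    for number, found in find_font_escapes(sample):
        print(f"{number}: {' '.join(found)}")
```

Like the regex-based scanner it imitates, this is deliberately crude: it
ignores the no-break control character and line continuation, so it can
produce false positives and negatives on unusual input.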

Thanks,

Alex

-- 
Alejandro Colomar
Linux man-pages comaintainer; https://www.kernel.org/doc/man-pages/
http://www.alejandro-colomar.es/


