
Re: Warn on mid-input line sentence endings


From: G. Branden Robinson
Subject: Re: Warn on mid-input line sentence endings
Date: Sun, 30 Apr 2023 07:34:57 -0500

At 2023-04-30T03:04:27+0200, Alejandro Colomar wrote:
> On 4/30/23 02:05, G. Branden Robinson wrote:
> > I should have said "_Warn on_ semantic newlines" is a terrible
> > instruction/summary.
> 
> That's why I used the phrase (at least I tried to do it consistently
> recently) "warn on S. N. violations".

Alas, it got lost in the most recent thread subject line on this topic
to the groff list...

https://lists.gnu.org/archive/html/groff/2023-04/msg00334.html

Hmm, I see that was Bjarni's doing.  Being from Iceland, he perhaps has
more of the spirit of Loki than most...

> > They are what we _don't_ want to warn about upon encountering them.
> > 
> > If man-pages(7) or other people continue to call the practice of
> > breaking *roff input lines after sentence-ending punctuation
> > "semantic newlines", I have no complaint.  It could also be called
> > "Kernighan breaking", in honor of an early popularizer of the
> > practice.
> 
> You could use it for the warning name ;).

Not a chance.  :P

As I noted, I want this under the "style" penumbra now, along with some
other bits of weirdness.

https://savannah.gnu.org/bugs/?62776

> > This is categorically not what regular expressions can cope with,
> > formally.
> 
> Well, formally yes.  And a regex can't find C function definitions in
> a source tree; at least if you try to fool it by writing the most
> horrible code in the universe.  But I wrote a relatively small
> script[1] that finds a lot of C code with pcre2grep(1), and works most
> of the time.  It has limitations; some of which can be fixed by
> improving the regexes (read: making them even more unreadable); some
> others are likely impossible to fix with a regex.  The biggest
> limitation I think I've met is K&R-style functions: I don't think a
> regex can cope with them.

I don't know if you have to cope with "the lexer hack", but you might.

https://en.wikipedia.org/wiki/Lexer_hack

How much grief might have been saved if objects in C had been prefixed
with a sigil like $, or if types had been prefixed with %?
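For readers who have not met the problem, here is a minimal sketch of the ambiguity behind the lexer hack.  The function names are made up for illustration.  The tokens "T * x" appear in both functions, but only the symbol table tells the parser whether T currently names a type.

```c
typedef int T;  /* at file scope, T names a type */

/* T is a typedef name here, so "T * x;" declares x as a pointer to int. */
int as_declaration(void)
{
    int target = 7;
    T * x;

    x = &target;
    return *x;
}

/* The inner declaration hides the typedef, so T and x are plain ints
 * and "T * x" is a multiplication. */
int as_expression(void)
{
    int T = 6, x = 7;

    return T * x;
}
```

A tool that sees only the token stream cannot distinguish the two uses, which is the sort of case where a regex-based scanner has to guess.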

In my imagination, Thompson vetoed this, but when I consider it more
seriously, I reckon the truth is more complicated, and arises from C's
origins in the wholly untyped B language.  The dialect of C we see in
Version 6 Unix (q.v. the Lions book) is shockingly loosely typed to
modern eyes.  I once ground the productivity of my workplace to a halt
for an entire afternoon by presenting my colleagues with the attached
exhibit of "legal C".  (It remained legal in AT&T USG Unix for many,
many years.)

> I believe a regex-based script can be good enough for some purposes,
> even if it's not perfect.

All of this is true, and I like programming languages that are dead
simple to lexically analyze.  (But I spend next to no time working in
them.)

I'm strident on this point because I'm opposed to putting a diagnostic
into the formatter that throws false positives.  That would disserve
users.
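
To make the false-positive worry concrete, here is a rough sketch of a naive pattern-based check.  It is not anything groff does or is planned to do; the function name and the pattern are assumptions chosen for illustration, and it relies on POSIX <regex.h>.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical check: report 1 if the line contains '.', '?', or '!'
 * followed by spaces and an uppercase letter, 0 if not, and -1 if the
 * pattern fails to compile. */
int looks_like_mid_line_sentence_end(const char *line)
{
    regex_t re;
    int hit;

    if (regcomp(&re, "[.?!] +[[:upper:]]", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    hit = (regexec(&re, line, 0, NULL, 0) == 0);
    regfree(&re);
    return hit;
}
```

The pattern flags "It works. Next sentence here." as intended, but it also flags "Written by G. Branden Robinson", where the period ends an initial rather than a sentence.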

Regards,
Branden

Attachment: legal_c.jpeg
Description: legal_c.jpeg

Attachment: signature.asc
Description: PGP signature

