Re: [groff] [PATCH] new requests to case-transform string register values

From: G. Branden Robinson
Subject: Re: [groff] [PATCH] new requests to case-transform string register values
Date: Mon, 9 Sep 2019 03:03:54 +1000
User-agent: NeoMutt/20180716

At 2019-07-04T22:58:10+0200, Steffen Nurpmeso wrote:
> I am totally out still and for some time to come, but from looking
> at the code all i know and can do is to wonder how far you come
> with that 7-bit ASCII only toupper()/tolower().

I agree.  I was a little disappointed that after going to all that
trouble I can still reliably deal only with ASCII codepoints.  But there
is not yet any such thing as normalized groff input.  We can't be sure
that preconv has been run on our input.
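To make the limitation concrete, here is a minimal sketch in Python (not
groff's actual C++ code) of an ASCII-only case transform applied to raw
input bytes.  The function name ascii_upper is hypothetical.  Whatever
encoding the input happens to be in, only the bytes for a-z change:

```python
def ascii_upper(data: bytes) -> bytes:
    """Uppercase only the ASCII letters a-z; leave every other byte alone."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in data)


# The same word in two encodings: the accented letter survives untouched
# in both, because none of its bytes fall in the ASCII a-z range.
utf8 = "café".encode("utf-8")
latin1 = "café".encode("latin-1")

print(ascii_upper(utf8))    # b'CAF\xc3\xa9'
print(ascii_upper(latin1))  # b'CAF\xe9'
```

The transform cannot do better without knowing the encoding, which is
the point about un-normalized input.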

I feel like my hands are tied until someone wants to undertake the
significant task of converting groff to use UTF-8 internally, which is a
TODO item I've seen noted in code comments.

A related task is to get preconv to recognize Emacs coding tags in the
more verbose "Local Variables" form that comes at the end of the file.
I think that's marked as TODO as well in preconv.cpp.

Once groff is UTF-8 internally we can then take advantage of existing
Unicode data files to get the case mappings of every defined codepoint,
and the .string{up,down} requests can become much more powerful.
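As a rough illustration of what "using the Unicode data files" could
look like, the sketch below reads records in the semicolon-separated
UnicodeData.txt layout, where field 12 holds the simple uppercase
mapping.  The two sample records and the helper names load_upper_map
and to_upper are chosen for illustration; nothing here reflects an
existing or planned groff design.

```python
# Two sample records in the UnicodeData.txt field layout.
SAMPLE = """\
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9
"""


def load_upper_map(lines):
    """Build {codepoint: simple uppercase codepoint} from data-file lines."""
    mapping = {}
    for line in lines:
        fields = line.strip().split(";")
        # Skip malformed records and codepoints with no uppercase mapping.
        if len(fields) < 15 or not fields[12]:
            continue
        mapping[int(fields[0], 16)] = int(fields[12], 16)
    return mapping


def to_upper(text, mapping):
    """Map each codepoint through the table; unmapped ones pass through."""
    return "".join(chr(mapping.get(ord(ch), ord(ch))) for ch in text)


UPPER = load_upper_map(SAMPLE.splitlines())
print(to_upper("a\u00e9", UPPER))
```

A table like this only covers one-to-one mappings; cases where one
character uppercases to several would need additional data.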

In the meantime, if people sneak explicit Unicode escapes into string
registers and try to transform them, they will get diagnostics:

troff: <standard input>:5: warning: can't find special character 'U0065_0301'

This is good, in my opinion, because it doesn't mislead our users about
what the new requests can do at this point.
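One plausible reading of that diagnostic, sketched here in Python rather
than taken from the groff sources, is that an ASCII-only uppercase
transform also rewrites the letters inside the escape's name, so
\[u0065_0301] turns into the unknown special character U0065_0301.  The
function stringup_like is a hypothetical stand-in for the new request.

```python
import string

# Translation table covering only ASCII a-z -> A-Z.
ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def stringup_like(value: str) -> str:
    """Hypothetical stand-in for an ASCII-only uppercase transform."""
    return value.translate(ASCII_UPPER)


# A string register value containing an explicit Unicode escape.
escaped = r"caf\[u0065_0301]"
print(stringup_like(escaped))  # CAF\[U0065_0301]
```

The escape's leading "u" becomes "U", and the resulting name no longer
matches anything the formatter knows about.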

(In fact, the above just inspired me to write a third regression test
for the new feature.  I'll commit that shortly.)

I considered making my new code in input.cpp detect this situation, but
this would have meant adding another finite-state machine inside what is
already a recursive-descent parser.  It felt like needless complexity.
And what was I going to do once I detected the sequence '\[u'?  Throw a
diagnostic when the one above, already supported at no extra cost, will
serve almost as well?
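For comparison, the kind of detection I decided against can be sketched
outside of groff in a few lines of Python.  The pattern below is an
assumption about the shape of a Unicode escape (\[u followed by
underscore-separated groups of hex digits), and find_unicode_escapes is
a hypothetical helper, not anything in input.cpp.

```python
import re

# Assumed escape shape: \[uXXXX] or \[uXXXX_YYYY...], 4-6 hex digits each.
UNICODE_ESCAPE = re.compile(r"\\\[u[0-9A-Fa-f]{4,6}(?:_[0-9A-Fa-f]{4,6})*\]")


def find_unicode_escapes(value: str) -> list:
    """Return every substring that looks like a Unicode escape."""
    return UNICODE_ESCAPE.findall(value)


print(find_unicode_escapes(r"caf\[u0065_0301] and \[u00E9]"))
```

Even with the escapes located, the only available action would be to
emit a diagnostic, which the formatter already does on its own.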

Implementing this feature was educational; I had no idea how to add a
new request to groff before undertaking it, and I'm glad its modest
utility did not meet with objection.

Thanks for your feedback!

