groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [groff] [PATCH] new requests to case-transform string register value


From: Ingo Schwarze
Subject: Re: [groff] [PATCH] new requests to case-transform string register values
Date: Sat, 13 Jul 2019 20:59:14 +0200
User-agent: Mutt/1.8.0 (2017-02-23)

Hi Branden,

G. Branden Robinson wrote on Thu, Jul 04, 2019 at 01:52:42AM +1000:

> A recurring theme in my man page clean-up work has been my violent
> antipathy for shouting capitals in their texts.  While I don't _like_
> being shouted at for reasons other than true emergency, the real problem
> with the capitalization convention in man pages is that it happens at
> the input source, destroying information (case distinctions) that is
> unrecoverable by the typesetting system later.
> 
> Here is the last time we discussed the issue:
> https://lists.gnu.org/archive/html/groff/2018-12/msg00141.html
> 
> The consensus seemed to be that pushing case-transformation
> functionality down into language would be worth trying.
> 
> So, here's an implementation.  Comments welcome.
> 
> I expect some bikeshedding on the names of the requests.  I'm not wedded
> to the ones I have; my main criterion is:
> 
> * The new request names should collate adjacently in the existing
>   request namespace.  E.g., "stringup" and "stringdn" are a much better
>   pair than "upstring" and "dnstring".  If someone is looking for one of
>   them, it's not going to be long before they wonder what/where the
>   other one is.

Your suggested naming seems good to me.  It is descriptive without
being excessively long.  As far as i can see, it harmonizes with
the naming of existing requests.  In particular, the proposed
syntax, semantics, and naming as similar enough to the existing
.substring request that the similarity of the name makes sense.

Some minor comments inline:

> diff --git a/man/groff.7.man b/man/groff.7.man
> index 24f093fe..f0e02b6c 100644
> --- a/man/groff.7.man
> +++ b/man/groff.7.man
> @@ -2384,6 +2384,18 @@ and sentence space size set to
>  of the space width in the current font.
>  .
>  .TPx
> +.REQ .stringdown "stringvar"
> +Transform each letter in\~\c
> +.I stringvar
> +to its lowercase version.

This feels unclear in three respects: one might understand that

  .stringdown Word

would put "word" into the output document, where actually, the
user-defined string called "Word" is modified in place and nothing
is put into the output, and one might wrongly expect that multi-byte
characters are somehow handled with towlower(3).  I think this would
be clearer:

  Change the string named
  .IR stringvar
  by replacing each byte with its lower-case version.

The word "change" is needed to make it clear that the replacement
happens in place rather than writing something to the output,
the word "named" is needed to make it clear that the argument is
not the string that will be processed, and the word "byte" is
needed to make it clear that this only works for single-byte
characters.  That way, the surprising behaviour of changing \['e]
to \['E] is also implicitly specified.

Also, please avoid groffisms and escapes unless they are really needed:

  in\~\c
  .I stringvar

is better written as just:

  in
  .I stringvar

I see no reason to suppress a line break at this point, and even if so,

  .RI in\~ stringvar

would be better.

> +For best results,
> +use only the basic Latin alphabet (\[lq][A-Za-z]\[rq]) in your string
> +values;

Sounds misleading, what's wrong with using punctuation or digits?

Maybe it would be better to say something like:

  If the string contains an escape sequence, the sequence itself
  will be transformed, not the resulting output; for exmample,
  .stringup transforms \(or (bitwise or) to \(OR (logical or).

> diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
> index a1bd8eaf..e01308b9 100644
> --- a/src/roff/troff/input.cpp
> +++ b/src/roff/troff/input.cpp
> @@ -4717,6 +4717,59 @@ void chop_macro()
>    skip_line();
>  }
>  
> +enum case_xform_mode { STRING_UPCASE, STRING_DOWNCASE };
> +
> +// Case-transform each character of the string argument.

better: "each byte" and s/$/in place/.

> +void do_string_case_transform(case_xform_mode mode)
> +{
> +  if ((mode != STRING_DOWNCASE) && (mode != STRING_UPCASE)) {
> +    error("impossible string case transformation mode %1!", mode);

This cannot happen.  Consequently, shouldn't it be assert()
rather than error()?

> +    return;
> +  }
> +  symbol s = get_name(1);
> +  if (s.is_null()) {
> +    skip_line();
> +    return;
> +  }
> +  request_or_macro *p = lookup_request(s);
> +  macro *m = p->to_macro();
> +  if (!m) {
> +    error("cannot apply string case transformation to a request ('%1')",
> +          s.contents());
> +    skip_line();
> +    return;
> +  }
> +  string_iterator iter1(*m);
> +  macro *mac = new macro;
> +  for (int l = 0; l < m->macro::length(); l++) {
> +    int nc, c = iter1.get(0);
> +    if (c == PUSH_GROFF_MODE
> +        || c == PUSH_COMP_MODE
> +        || c == POP_GROFFCOMP_MODE)
> +      nc = c;
> +    else if (c == EOF)
> +      break;
> +    else
> +      if (mode == STRING_DOWNCASE)
> +     nc = tolower(c);

Indentation looks weird here.

> +      else
> +     nc = toupper(c);
> +    mac->append(nc);
> +  }
> +  request_dictionary.define(s, mac);
> +  tok.next();
> +}
> +
> +// Uppercase-transform each character of the string argument.

Better: Replace each byte by its upper-case version, in-place.

Yours,
  Ingo



reply via email to

[Prev in Thread] Current Thread [Next in Thread]