Re: [groff] Regularize (sub)section cross references.
From: G. Branden Robinson
Subject: Re: [groff] Regularize (sub)section cross references.
Date: Mon, 17 Dec 2018 13:19:38 -0500
User-agent: NeoMutt/20180716
At 2018-12-17T18:49:31+0100, Tadziu Hoffmann wrote:
>
> > A 1-to-2 character mapping of course is beyond the ability of .tr.
>
> You can define a special character:
>
> .char \(SS SS
> .tr mMaAß\(SS
> Maß
>
> results in
>
> MASS
Nice! "Typesetting assembly language" once again proves its worth as a
description of Ossanna's roff.
Okay, so maybe this _could_ be rolled out without prerequisites in
1.22.5.
My Debian buster system has almost 1600 non-English man pages:
$ find /usr/share/man/!(man*) -type f -and -not -type l|wc -l
1597
Fortunately Chinese, Japanese, and Korean have no case distinctions,
leaving only about 1400 pages:
$ find /usr/share/man/!(man*|ja|ko|zh_*) -type f -and -not -type l|wc -l
1399
and fortunately, _none_ of these use the section-name-on-the-next-line
idiom:
$ zgrep '^\.[[:space:]]*SH$' $(find /usr/share/man/!(man*) -type f \
-and -not -type l) || echo NONE
NONE
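To sanity-check that pattern before trusting the NONE, here is a minimal sketch that runs the same regex over a few made-up heading lines. The helper name `bare_sh_lines` and the sample lines are illustrative assumptions, not taken from any real page.

```shell
#!/bin/sh
# Print input lines that consist of a bare .SH request, i.e. the
# "section name on the next line" idiom.  Same regex as in the zgrep
# command.  bare_sh_lines is a hypothetical helper name.
bare_sh_lines() {
    grep '^\.[[:space:]]*SH$'
}

# Made-up sample input: only the first two lines are bare requests.
printf '%s\n' '.SH' '.  SH' '.SH NAME' '.SH "SEE ALSO"' '.SS' \
    | bare_sh_lines
```

One caveat with this sketch: because of the `$` anchor, a `.SH` followed only by trailing whitespace would not be reported, so a stricter survey might want `[[:space:]]*` before the `$` as well.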
So what do we see in the section headings we _do_ have? Suppressing
filenames, stripping double-quotes, and just counting occurrences, I
found 1,241 distinct section titles.
$ zgrep -h '^\.[[:space:]]*SH' \
$(find /usr/share/man/!(man*|ja|ko|zh_*) -type f \
-and -not -type l) \
| sed 's/"//g' | sort | uniq -c | wc -l
1241
Interestingly, some are already mixed-case. This could be true of
English pages as well, but that's not the goal I'm chasing right now.
The next step would be to bust these down character-by-character, but
that is slightly frustrated by the fact that some people enter their
non-ASCII codepoints as-is and others use (more portable and "correct")
character escapes to obtain them. Collating and counting these to find
a minimal set of characters to feed .tr requests is going to take a bit
more work.
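As a rough starting point for the escape half of that problem, the following sketch tallies groff special-character escapes of the forms `\(xx` and `\[name]` in heading lines. It is only an assumption about how the counting might be approached: the helper name `count_escapes` and the sample headings are made up, other escape forms such as `\C'name'` are ignored, and characters entered as-is would still need separate, locale-aware handling.

```shell
#!/bin/sh
# Tally groff special-character escapes of the forms \(xx and \[name]
# found on standard input, most frequent first.
# count_escapes is a hypothetical helper name.
count_escapes() {
    # LC_ALL=C keeps "." byte-oriented; escape names are plain ASCII.
    LC_ALL=C grep -o -E '\\\(..|\\\[[^]]+]' | sort | uniq -c | sort -nr
}

# Made-up sample headings, not taken from section_headers.txt.
printf '%s\n' \
    '.SH R\(:UCKGABEWERT' \
    '.SH \(:UBERSICHT' \
    '.SH OP\[,C]\(~OES' \
    | count_escapes
```

The resulting list of distinct escapes, merged with a similar list of literal non-ASCII characters, would give the candidate set of characters for the .tr requests.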
The file of non-English section headings is attached for the curious. I
added a sort -nr to the above pipeline and removed the wc -l, of course.
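Spelled out, the tail of that modified pipeline looks roughly like the sketch below. The helper name `tally_headings` and the sample input are assumptions for illustration; on a real system the function would be fed the output of the `zgrep -h` command instead of `printf`.

```shell
#!/bin/sh
# Read .SH lines on standard input, strip double quotes, and print each
# distinct heading with its count, most frequent first.
# tally_headings is a hypothetical helper name.
tally_headings() {
    sed 's/"//g' | sort | uniq -c | sort -nr
}

# Made-up sample input standing in for the zgrep output.
printf '%s\n' '.SH NAME' '.SH "NAME"' '.SH SYNOPSIS' | tally_headings
```

Stripping the quotes first is what lets `.SH NAME` and `.SH "NAME"` collapse into a single entry.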
Regards,
Branden
Attachment: section_headers.txt (text document)