Re: Special characters


From: H.Merijn Brand
Subject: Re: Special characters
Date: Fri, 22 Sep 2023 08:43:37 +0200

On Thu, 21 Sep 2023 18:51:42 +0000, Lennart Jablonka <humm@ljabl.com> wrote:

Thanks for your elaborate answer. I'll try to expand …

> > $ nroff -man     < doc/CSV_XS.3 > doc/CSV_XS.man
> >
> > or
> >
> > $ nroff -mandoc  < doc/CSV_XS.3 > doc/CSV_XS.man
> >
> > Used to render up to and including groff-1.22
> >
> >  $row\->[2] =~ m/pattern/ or next;
> >
> > as
> >
> >  $row->[2] =~ m/pattern/ or next;
> >
> > but as of 1.23, generates
> >
> >  $row->[2] =˜ m/pattern/ or next;
> >
> > which renders all manual pages useless  
> 
> If you aren’t careful, you could evoke the impression your hyperbole
> is to be taken literally.

:)

> > I download the source code to read "NEWS", and I think the relevant
> > section is
> > --8<---
> > ..snipped..
> > -->8---
> >
> > I wondered for longer why I get those silly and unwanted quotes and
> > dashes, but I am now faced with projects that generate useless files,
> > so I had to complete that task on a system that still had the sane 1.22
> > and get the results from there.  
> 
> NEWS was right:  Glyph usage errors in man pages are exposed.

And my wording was too rigid. Even if *I* do feel these translations
are silly, they will have value to others; otherwise the changes would
not have been decided on.

> > I bet your goals are laudable, but to me readable man-pages and
> > cut-n-paste ready output is way more valuable than visual changes that
> > I happen to not like in the first place however correct that might be.
> >
> > In order to enable users like me to disable that, I sincerely hope that
> > the next release of nroff/groff will support an environment or RC
> > option to disable these translations.  
> 
> In the man page, you can use the correct characters;  for the output
> for bad man pages, you can put translations in m{an,doc}.local.

Are there examples?

> > And I mean everywhere. I really
> > want "don't" to show as "don't" and not as "don’t", "~" as "~" and
> > not as "˜", and "-" as "-" and not as "‐" and probably many more
> > insane unwanted "intelligent" replacements.  
> 
> If I understand you correctly, you want to replace all non-ASCII
> glyphs by ASCII approximations.

No, reading this mail and the answer from Deri, I think I need to
expand a little here.

• I personally do not care about Unicode quotes, dashes, and other
  special tokens where English text is concerned.

• I however *do* care about special characters that are explicitly
  intended, like bullets, currency indicators, and Unicode glyphs
  in names like Żáïłēñőŗ

• My documentation is written in .pod or .md and then translated

  $ pod2markdown  < CSV_XS.pm    > doc/CSV_XS.md
  $ pod2html      < CSV_XS.pm    > doc/CSV_XS.html
  $ pod2man       < CSV_XS.pm    > doc/CSV_XS.3
  $ nroff -mandoc < doc/CSV_XS.3 > doc/CSV_XS.man

  That last line is now rewritten to use a Perl filter that matches my
  needs:

  $ nroff2man     < doc/CSV_XS.3 > doc/CSV_XS.man

  This implies that I have no control (yet) over what ends up in the
  CSV_XS.3 file
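
For illustration, a filter of that kind can be sketched in a few lines
of shell. This is not the actual nroff2man (which is Perl); the function
name ascii_punct is made up for this example, and it only maps back the
three glyphs mentioned in this mail (’, ˜ and ‐), leaving bullets and
accented names alone. Other replacements would have to be added as they
are discovered.

```shell
#!/bin/sh
# Hypothetical stand-in for a filter like nroff2man (NOT the real script).
# Maps three replacement glyphs back to their ASCII originals and passes
# every other character, including other non-ASCII ones, through untouched.
ascii_punct () {
    sed -e "s/’/'/g" \
        -e 's/˜/~/g' \
        -e 's/‐/-/g'
}

# Usage, assuming nroff is installed:
#   nroff -mandoc < doc/CSV_XS.3 | ascii_punct > doc/CSV_XS.man
```

Because this runs on the formatter's output rather than on its input, it
needs no control over what pod2man writes into CSV_XS.3, at the price of
also rewriting any such glyph that was intended in the source.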

> I’m inclined to agree: That sounds like something quite a few people
   |
  "I'm" is what *I* would like to see

Those special quotes are also used in error messages nowadays (wget,
gcc, ...), and there they cause extra work too: double-click selection
in e.g. xterm does not recognize them as delimiters, whereas it does
recognize the ASCII alternatives. See these example lines from a
typical ~/.Xdefaults:

 XTerm*on2Clicks:                regex [^ \n*\043\047`|@#:;& ]+
 XTerm*on3Clicks:                regex [^ \n*\043\047`]+
 XTerm*on4Clicks:                regex [^ \n]+
 XTerm*on5Clicks:                line

Making those settings accept special quotes would cost many people a
lot of work, and some older tools do not even support it.

> might want for different reasons, so it would be good if we had an
> option for that.

I know about -Tascii (or GROFF_TYPESETTER=ascii), but that will also
disable bullets and stuff. I just don't want any translations that are
now default but not explicit in the source. As I do not know the full
range of which characters are changed into which others, why, and
under what criteria, it is hard to tell you exactly what I personally
would like to enable or disable.

This has nothing to do with underlining, boldfacing, and colors.

> > I want an environment and not an option, because I personally do not
> > want these translations ever (unless I do, and then an option would
> > be appropriate).  
> 
> I think you might want a device.   Like devutf8, but for ASCII.   We
> could call it devascii.   You’d be able to invoke nroff, instead of
                               ^ same here. I want the plain '
> with -Tutf8, with -Tascii.

-Tdev ? (-Tascii also disables many other features)

> > Sorry if I sound harsh, but this change already cost me hours of work
> > to fix.  
> 
> I’m afraid my hope is that the man pages’ author will pour in the
> work to fix the man pages.

That will be the translator tools. As said, I write pod or markdown.
There is a complete snakepit in that toolchain if the source code
contains actual UTF-8, as it is likely that part of that is lost or
b0rked along the way.

As a perl5 developer, I have tons of sources of more than 100 versions
of perl, and the pod files alone add up to 231920 lines of documentation.
That excludes the pod documentation inside the pm files, which probably
runs into several million lines.

> > I am on the brink of getting the 1.22 source code and install that
> > over the system package of 1.23, which just brings me hate.  
> 
> That would be sad, seeing as 1.23 has so many improvements in its
> documentation that make it easier for the reader to grasp good
> practice.

Documentation++

On Thu, 21 Sep 2023 19:08:02 +0100, Deri <deri@chuzzlewit.myzen.co.uk> wrote:

> > I am on the brink of getting the 1.22 source code and install that over
> > the system package of 1.23, which just brings me hate.  
> 
> I have added this line to the man.local file
> (/usr/share/groff/site_tmac on my system) to restore the asciitilde
> behaviour.
> 
> .tr ~\(ti

It would be awesome if I could have the full list of potential
translations and the matching lines in the mandoc file, so I can choose
which ones to keep and which ones to disable.

-- 
H.Merijn Brand  https://tux.nl   Perl Monger   http://amsterdam.pm.org/
using perl5.00307 .. 5.37        porting perl5 on HP-UX, AIX, and Linux
https://tux.nl/email.html http://qa.perl.org https://www.test-smoke.org
                           
