groff
Re: Merging the new gropdf


From: Deri
Subject: Re: Merging the new gropdf
Date: Tue, 07 Nov 2023 17:33:06 +0000

On Monday, 6 November 2023 17:47:13 GMT G. Branden Robinson wrote:
> Hi Deri,
> 
> At 2023-11-04T13:21:21+0000, Deri wrote:
> > I was wondering how the merging of the new gropdf branch was going (I
> > think you very kindly offered to help merging with master). Recently
> > you have written on the list:-
> 
> Slowly.  I landed two small changes this weekend, but they're not things
> most folks are looking for.

Hi Branden,

Thanks for this.

> > "Also, when Deri James's gropdf improvements are merged for groff
> > 1.24, the file size of groff-man-pages.pdf should come _way_ down.".
> > 
> > If you can't help me, I am quite happy to "give it a go", but I
> > probably won't do it as well! Since I announced the new gropdf I have
> > had one bug report (which is fixed), so it would be helpful to get
> > further testing from users who are using master.
> 
> I agree.  I've been trying to separate the changes into functional
> units, albeit not diced as finely as I would for changesets of my own.
> 
> I've been facing a few challenges:
> 
> * The sheer magnitude of changes to gropdf.pl itself, and my own
>   ignorance of details of PDF that the new functionality is
>   implementing.

There are two main areas of change. The first is rectifying my design mistake 
in the original gropdf. It treated the "t" command from groff as the primary 
command: a series of input characters which would be converted to postscript 
glyphs. All other text commands (for example "c") were converted back to their 
input character and treated as a single-character "t" command. I was focussed 
on the groff font rather than the postscript font.

While thinking about font subsetting it became clear it made more sense to 
convert all input to postscript glyph names immediately, and use them as the 
"common currency" rather than focus on words. This particularly makes sense 
when dealing with non-Latin input which has been processed with preconv. It 
also makes font subsetting much more natural. Previously this was not 
necessary because the whole font was embedded by gropdf.
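
A minimal sketch of the "common currency" idea described above, written in Python rather than gropdf's Perl and not taken from gropdf itself. Each text command from groff's intermediate output is turned directly into a list of PostScript glyph names. The two lookup tables and the function name are hypothetical; real tables would be derived from the groff font description files.

```python
# Hypothetical lookup tables for illustration only; in practice they would
# be built from the groff font description file for the current font.
CHAR_TO_GLYPH = {"a": "a", "b": "b", "f": "f", "i": "i", "-": "hyphen"}
GROFF_NAME_TO_GLYPH = {"fi": "fi", "em": "emdash", "hy": "hyphen"}


def to_glyph_names(command):
    """Convert one groff intermediate-output text command to glyph names.

    Handles 't' (a run of characters), 'c' (a single character) and
    'C' (a named groff special character). Anything else yields [].
    Unknown characters or names map to '.notdef'.
    """
    op, arg = command[0], command[1:].strip()
    if op == "t":
        return [CHAR_TO_GLYPH.get(ch, ".notdef") for ch in arg]
    if op == "c":
        return [CHAR_TO_GLYPH.get(arg, ".notdef")]
    if op == "C":
        return [GROFF_NAME_TO_GLYPH.get(arg, ".notdef")]
    return []


# Every command ends up as glyph names, so later stages never need to
# know which kind of text command produced them.
glyphs = []
for cmd in ["tab", "Cfi", "c-", "Cem"]:
    glyphs.extend(to_glyph_names(cmd))
print(glyphs)
```

Because the output is a flat list of glyph names, the set of glyphs a document uses falls out directly, which is what a font subsetter needs.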

The second major change is the addition of a Type 1 font parser and code to 
generate a font which contains only the glyphs required by the document being 
processed. This is the area which needs the most testing. I have tested the 
parser with dozens of fonts and it seems robust enough, but there are 
thousands of fonts out there. It seems to be happy with fonts produced by 
fontforge, which is promising.
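
The selection step of subsetting can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the parser in gropdf. The font is assumed to be already parsed into a dictionary of glyph name to opaque outline data, and the function name and sample data are invented for the example. A real Type 1 subsetter must also keep any subroutines the retained outlines call, and the components of composite glyphs built with the seac operator.

```python
def subset_charstrings(charstrings, used_glyphs):
    """Return only the charstrings needed for the glyphs in used_glyphs.

    charstrings: dict mapping glyph name -> opaque outline data.
    used_glyphs: iterable of glyph names seen in the document.
    '.notdef' is always kept, since a font is expected to define it.
    Also returns a sorted list of requested glyphs the font lacks.
    """
    wanted = set(used_glyphs) | {".notdef"}
    missing = sorted(wanted - set(charstrings))
    subset = {name: data for name, data in charstrings.items() if name in wanted}
    return subset, missing


# Made-up data standing in for a parsed Type 1 CharStrings dictionary.
font = {
    ".notdef": b"\x00",
    "a": b"\x01",
    "b": b"\x02",
    "fi": b"\x03",
    "emdash": b"\x04",
}
subset, missing = subset_charstrings(font, ["a", "fi", "Omega"])
print(sorted(subset))
print(missing)
```

Reporting missing glyphs separately, rather than failing, lets the caller decide whether to warn or fall back to another font.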

> 
> * An uneasiness I feel about some of the solutions you adopted insofar
>   as they have effect outside of gropdf.pl itself.  For instance:
> 
>   1.  Changing the format of font description files to add yet another
>       field, mapping character names to Unicode code points.  In the
>       rest of groff, this is not necessary because we have glyphuni.cpp.
> 
>      
> https://git.savannah.gnu.org/cgit/groff.git/tree/src/libs/libgroff/glyphuni.cpp
> 
>       I'd like to honor the DRY principle here.  What's a good way to
>       achieve that?

I'm sure you have noticed that glyphuni.cpp has 433 mappings from groff 
character names to Unicode code points, and afmtodit has 4089 mappings between 
postscript glyph names and Unicode code points (the latter are also 
algorithmically generated rather than manually hard coded as in glyphuni.cpp). 
The mapping from postscript glyph to Unicode is more appropriate for gropdf; 
the mapping from groff character names is meant for the input side of groff, 
where "glyph_name_to_unicode" is used solely by input.cpp. The use of the word 
"glyph" in the subroutine name is a bit confusing, since it gives the false 
impression that it is concerned with postscript glyphs.

I could add the mapping tables from afmtodit to the gropdf code and drop the 
new column from the groff fonts created by afmtodit, but my understanding of 
the DRY principle is that it is meant to avoid such duplication. Have I got it 
wrong?

When I looked at the code for the rest of groff, adding an extra column had no 
effect on code which processed the groff font files.
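
For readers unfamiliar with mapping PostScript glyph names to Unicode, here is a small Python sketch of one widely used convention. Under the Adobe Glyph List rules, a name of the form uniXXXX, or u followed by four to six hex digits, encodes its own code point. This is an illustration only. It does not describe how afmtodit builds its table, it handles only single-code-point names, and the NAMED table and function name are stand-ins invented for the example.

```python
import re

# Tiny stand-in for a full glyph list such as the Adobe Glyph List.
NAMED = {"A": 0x0041, "hyphen": 0x002D, "emdash": 0x2014, "Euro": 0x20AC}

UNI_RE = re.compile(r"uni([0-9A-F]{4})$")
U_RE = re.compile(r"u([0-9A-F]{4,6})$")


def glyph_to_codepoint(name):
    """Map a PostScript glyph name to one Unicode code point, or None."""
    base = name.split(".", 1)[0]  # drop suffixes such as ".sc" or ".alt"
    if base in NAMED:
        return NAMED[base]
    m = UNI_RE.match(base) or U_RE.match(base)
    if m:
        cp = int(m.group(1), 16)
        # Reject surrogates and values beyond the Unicode range.
        if cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF:
            return cp
    return None


for name in ["emdash", "uni20AC", "u1F600", "emdash.alt", ".notdef"]:
    print(name, glyph_to_codepoint(name))
```

A table-plus-rule lookup like this covers far more names than a hand-maintained list, because any font using the uniXXXX convention is handled without a table entry.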

>   2.  I don't know the provenance of a new font you have proposed for
>       shipping with groff, StandardSymSL.pfb.  We need to make sure it
>       is freely licensed.  If it is mechanically generated from a
>       PostScript Type 1 font that we can expect to find on the system,
>       maybe we should perform that procedure during the build.  (On the
>       other hand, I'm not sure I love the idea of adding a
>       build-dependency on fontforge or similar.)

This is the font about which you said:-

"Looks great! It's not led astray by the superscripts or anything."

In this message to the list:-

https://lists.gnu.org/archive/html/groff/2023-06/msg00114.html

I described the provenance of the font. I don't think it is a good idea to 
generate the font and introduce a dependency on fontforge. I would put it in a 
similar category to the Euro font we currently distribute, i.e. a font to make 
everything work smoothly.

>   3.  The new `stringhex` request you've proposed for troff.  As noted
>       elsewhere, I'd prefer to solve this a different way.
> 
>       https://savannah.gnu.org/bugs/index.php?63074
> 
>       ...but I haven't implemented my idea yet, so I don't object to
>       `stringhex` as a temporary measure.
> 

I'm happy to drop stringhex for a better solution, if it handles the problem 
in this bug and the problems in these bugs:-

https://savannah.gnu.org/bugs/?62264
https://savannah.gnu.org/bugs/?64576

> * There were _tons_ of seemingly unrelated whitespace changes to
>   gropdf.pl, which frustrates code review.  (This has happened before; I
>   don't remember when, but Dave might.)  I went through the file and
>   attempted to impose a consistent style on it, but I'm not sure how
>   you'll feel about it.  More importantly, it would be helpful to get
>   your text editor to do better here.

I have attached an HTML version of how gropdf looks on my system. Tabs are 
spaced at 8-character intervals, indents are 4 characters.

> Also your most recent commit to your branch says that it's starting work
> on a new thing.  Should I omit that from my merge?
> 
> https://git.savannah.gnu.org/cgit/groff.git/commit/?h=deri-gropdf-ng&id=a2b5541142a1571e9f9f5a8321c1e21c721469aa

Yes, please drop this. It is my next project, text decorations. Peter asked 
for underlining a long time ago. Mom postscript has a nifty piece of 
postscript code (courtesy of Tadziu I think) which underlines text. In PDFs 
underline is one of the text decorations, so I'm hoping to expose an API for 
text decoration as the next project.

> I'm attaching a "git log -p --format=fuller" of my staging branch so you
> can see where I am.
> 
> I look forward to hearing your thoughts on next steps.

Keep going. :-) 

Cheers

Deri

> Regards,
> Branden

Attachment: gropdf.html
Description: application/xhtml

