Re: [Groff] Character class query


From: Werner LEMBERG
Subject: Re: [Groff] Character class query
Date: Thu, 05 Mar 2009 08:01:04 +0100 (CET)

> lacking other kinds of ranges, I think we could drop the colons.

Feel free to use any of the suggestions :-)

>> This should probably support fall-back classes too, similar to the
>> current mechanism for ordinary entities.
> 
> I've been trying to figure out a class analogue for this, and not
> getting very far.  [...]

After some thinking I believe that I've written nonsense :-) Right now
we are talking about input character properties, which have nothing to
do with font selection -- whatever font is finally selected to display
a glyph, the glyph inherits the input character properties.  Please
ignore my silly remark.

> I was instead envisaging a system in which a character can be in
> multiple classes.

Yes, probably directly (and mechanically) derived from the Unicode
property tables.  Example: Glyph `CJK full-width ideographic comma'
belongs to the `CJK character', `full-width', and `punctuation'
classes.  If necessary, the user should have a chance to override
properties for a particular character or character class.  Note that
the amount of data can be enormous, so we should take care to use an
efficient memory representation.
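
A minimal sketch of the idea, in Python rather than groff's own code:
class membership is derived mechanically from Unicode properties and
stored as one small bit mask per character, with a user override
table consulted first.  The class names, the CJK_RANGES table, and
the override() helper are hypothetical illustrations, not groff
identifiers; U+3001 (IDEOGRAPHIC COMMA) stands in for the comma
mentioned above.

```python
import unicodedata
from enum import IntFlag


class CharClass(IntFlag):
    # Hypothetical class names for illustration only.
    CJK = 1
    FULL_WIDTH = 2
    PUNCTUATION = 4


# Illustrative (not exhaustive) code point ranges treated as CJK.
CJK_RANGES = [(0x3000, 0x303F), (0x4E00, 0x9FFF), (0xFF00, 0xFFEF)]

# User-supplied overrides: character -> bit mask.
_overrides = {}


def override(ch, classes):
    """Replace the derived classes of a single character."""
    _overrides[ch] = classes


def classes_of(ch):
    """Return the bit mask of classes a character belongs to."""
    if ch in _overrides:
        return _overrides[ch]
    bits = CharClass(0)
    cp = ord(ch)
    if any(first <= cp <= last for first, last in CJK_RANGES):
        bits |= CharClass.CJK
    # East Asian Width 'F' (fullwidth) or 'W' (wide).
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        bits |= CharClass.FULL_WIDTH
    # General categories starting with 'P' are punctuation.
    if unicodedata.category(ch).startswith("P"):
        bits |= CharClass.PUNCTUATION
    return bits
```

A single integer per character keeps the table compact; a real
implementation would presumably store masks per range rather than
per character.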

To avoid delays at groff startup, I can imagine even making the default
classes built-in -- groff then also needs requests to retrieve

  . the assigned properties of a single glyph
  . the list of characters in a given property class
  . a list of the available property classes
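
The three lookups listed above could be served by a two-way index,
sketched here in Python.  The PropertyTable class and its method
names are hypothetical; no such groff requests exist in this form.

```python
from collections import defaultdict


class PropertyTable:
    """Two-way index between glyph names and property classes."""

    def __init__(self):
        self._by_glyph = defaultdict(set)
        self._by_class = defaultdict(set)

    def assign(self, glyph, cls):
        # Record membership in both directions.
        self._by_glyph[glyph].add(cls)
        self._by_class[cls].add(glyph)

    def properties_of(self, glyph):
        """The assigned properties of a single glyph."""
        return sorted(self._by_glyph.get(glyph, ()))

    def members(self, cls):
        """The characters in a given property class."""
        return sorted(self._by_class.get(cls, ()))

    def classes(self):
        """The available property classes."""
        return sorted(self._by_class)


table = PropertyTable()
for cls in ("cjk", "full-width", "punctuation"):
    table.assign("u3001", cls)
table.assign("u4E00", "cjk")
```

Here table.properties_of("u3001") yields all three classes, while
table.members("cjk") yields both glyphs.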

I haven't yet investigated which of the available Unicode classes
would be beneficial; maybe we can use a simpler or more restricted
solution for efficiency reasons.

> I suppose that, on the output side, we need to consider classes
> defined in different font files.

Yes, but here it means something completely different.  Perhaps we
should start using a different word, say, `ranges', to avoid
confusion.  Basically, `ranges' in font files are just notational
abbreviations.  An example is to assign the same glyph width to all
CJK characters of a font: I could do that glyph by glyph, but that
would unnecessarily blow up the size of font files and make groff run
slower.
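
To illustrate why ranges are cheaper than glyph-by-glyph entries,
here is a small Python sketch: two range records cover tens of
thousands of code points, and a binary search finds the width.  The
range bounds and width values are made up for illustration and do
not come from any real font file.

```python
import bisect

# (first, last, width) records, sorted and non-overlapping.
# Bounds and widths are illustrative assumptions.
RANGES = [
    (0x3000, 0x303F, 1000),
    (0x4E00, 0x9FFF, 1000),
]
STARTS = [first for first, _, _ in RANGES]

# Ordinary per-glyph entries still take precedence.
PER_GLYPH = {0x41: 722}


def width(cp):
    """Return the width for a code point, or None if unknown."""
    if cp in PER_GLYPH:
        return PER_GLYPH[cp]
    # Find the last range starting at or before cp.
    i = bisect.bisect_right(STARTS, cp) - 1
    if i >= 0 and cp <= RANGES[i][1]:
        return RANGES[i][2]
    return None
```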

> The one obvious flaw I see in the above is that East Asian
> line-breaking algorithms work the other way round from Western ones:
> lines may break anywhere unless prohibited.  Representing this
> efficiently using the current set of available cflags would mean a
> class for the whole CJK range with .cflags 70, and then a class for
> no-break-before characters with .cflags 68 and one for
> no-break-after characters with .cflags 66.  This may suggest that
> it's better to use the smallest enclosing class for flags as well as
> for other properties.
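
The arithmetic behind the three values, together with the `smallest
enclosing class' rule, can be sketched in Python as follows.  The bit
meanings are the .cflags values as I understand them from the groff
manual; the class bounds are illustrative assumptions (U+3001 and
U+3002 as no-break-before, U+300C as no-break-after), not a proposed
table.

```python
# .cflags bit values, as understood from the groff manual.
CFLAG_BITS = {
    1: "ends sentence",
    2: "break allowed before",
    4: "break allowed after",
    8: "overlaps horizontally",
    16: "overlaps vertically",
    32: "transparent to end-of-sentence",
    64: "ignore surrounding hyphenation codes",
}


def decode(value):
    """List the flag names set in a .cflags value."""
    return [name for bit, name in CFLAG_BITS.items() if value & bit]


# (first, last, cflags) classes; bounds are illustrative assumptions.
CLASSES = [
    (0x3000, 0x9FFF, 70),  # whole CJK range: break before and after
    (0x3001, 0x3002, 68),  # no-break-before: break after only
    (0x300C, 0x300C, 66),  # no-break-after: break before only
]


def cflags_for(cp):
    """Flags of the smallest class enclosing cp, or 0 if none."""
    matches = [(last - first, flags)
               for first, last, flags in CLASSES
               if first <= cp <= last]
    return min(matches)[1] if matches else 0
```

So 70 = 64 + 4 + 2, 68 = 64 + 4, and 66 = 64 + 2; the narrower
classes win over the enclosing CJK class without any ordering among
the .cflags requests.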

BTW, we need a new .cflags class to indicate that stretchable space
(with a default width of zero) should be inserted (automatically by
groff) between characters.  Otherwise we can't do proper
justification.

At least for Japanese typesetting, there are quite a lot of rules about
how much horizontal space should be inserted between various input
character classes (for example, between a Latin character and a CJK
character).  I'm not sure whether we should invest time to actually
support these rules, given the target of extending groff to display
CJK manual pages.  Professional Japanese typesetting is definitely
beyond the scope of groff.


    Werner
