Re: [Groff] grops and Unicode, Unicode in general.


From: Werner LEMBERG
Subject: Re: [Groff] grops and Unicode, Unicode in general.
Date: Sat, 21 May 2005 08:28:23 +0200 (CEST)

> Wartan guesses correctly that the best way would be to start from
> the original URW++ fonts and add the Cyrillic glyphs to them, but
> with some caveats:

Do you mean starting again?  Uh, oh, this is a *large* project, and
normally such projects remain unfinished...  Cf. the `freefont' project,
which hasn't received any contributions for two years.  I'm not saying
that the Cyrillic extensions to the URW fonts are bad in general; I'm
saying that they aren't stable yet.

> 1. The addition *must* be done by hand to guarantee that the
> outline, hint and subroutine programs are not damaged by a graphical
> font editor (believe me, *all* graphical editing be it with a high
> end tool (Fontlab, Asia Font Studio, Ikarus or DTL FontMaster), or
> with a half-baked tool such as Fontforge *will* destroy the original
> digital font program) and each iteration will have lower quality.

I don't believe you. :-) First, PS fonts don't have outline, hint, or
subroutine programs.  You are apparently thinking of TrueType fonts.
Second, PS hints can be autogenerated by tools like FontForge, but I'm
sure that *all* editors allow preservation and manual alteration of
hints.

> 2. The fonts must have different names.

I disagree.  Fonts must have updated version numbers, together with
proper ChangeLogs.  This is one area where the people working on
`improving' the URW fonts have made big mistakes in the past.

> 3. You don't need to have all glyphs in single font container!!!

Hmmm.  There is a large number of glyphs which are shared by, say, the
Latin and Cyrillic scripts: digits, accents, punctuation marks, etc.
Those glyphs must be available in both a Cyrillic and a Latin font --
while creating virtual fontsets is easy, virtual kerning for them is
not, nor is it supported in X, AFAIK.  In other words, having a single
glyph container for Latin and Cyrillic makes sense.
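To make the kerning argument concrete, here is a toy model in Python.  It
is only a sketch under stated assumptions, not how X implements fontsets:
each hypothetical `Font` carries its own kerning table keyed by glyph
pairs, a virtual fontset takes each glyph from the first container that
has it, and a pair can only be kerned if both glyphs come from the same
container.  The font contents and kerning values are invented; `uni0422`
is the AGL-style name of CYRILLIC CAPITAL LETTER TE.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Font:
    """A hypothetical physical font: its glyph names plus its own kerning table."""
    name: str
    glyphs: set[str]
    kerning: dict[tuple[str, str], int] = field(default_factory=dict)


def resolve(fontset: list[Font], glyph: str) -> Font:
    """Return the first font of the virtual fontset that contains `glyph`."""
    for font in fontset:
        if glyph in font.glyphs:
            return font
    raise KeyError(glyph)


def kern(fontset: list[Font], left: str, right: str) -> int:
    """Kerning value for a glyph pair rendered through a virtual fontset.

    Kerning tables live inside physical fonts, so a pair can only be
    kerned if both glyphs are taken from the same font; for a pair that
    straddles two containers no value is available and 0 is returned.
    """
    left_font = resolve(fontset, left)
    right_font = resolve(fontset, right)
    if left_font is not right_font:
        return 0
    return left_font.kerning.get((left, right), 0)


# Invented example data.  Both containers have a period, but the fontset
# always resolves it to the Latin one, so the Cyrillic designer's
# TE + period kerning is never used.
latin = Font("Latin", {"T", "period"}, {("T", "period"): -80})
cyrillic = Font("Cyrillic", {"uni0422", "period"}, {("uni0422", "period"): -80})
virtual = [latin, cyrillic]

# A single container holding both scripts keeps every kerning pair usable.
merged = Font(
    "Latin+Cyrillic",
    latin.glyphs | cyrillic.glyphs,
    {**latin.kerning, **cyrillic.kerning},
)
```

With `virtual`, the pair `T` + `period` is kerned but `uni0422` + `period`
silently gets 0; with `[merged]` both pairs get their kerning value.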

> X is very happy to use virtual fontsets, so you can place your
> Cyrillic glyphs in physically different containers that appear as
> one to the graphics display engine; in fact, that's how CID fonts
> work, sort of.

Indeed, sort of!  CIDs were invented to do exactly the opposite of
what you are suggesting, namely to have big glyph containers (mainly
for CJK fonts).  For non-CJK fonts, non-CID-keyed CFFs are *much*
better since you have proper glyph names.

For the readers who are not familiar with the many acronyms: CFF
(Compact Font Format) is the new default font format from Adobe.  The
glyphs themselves are essentially Type1 outlines, but stored in a more
compact representation (Type 2 charstrings), keyed either by glyph
names or by CIDs.

A CID (Character Identifier) is a number representing a glyph in a
repository.  It is mainly used for CJK fonts where glyph names are not
meaningful.  Adobe has defined various repositories -- which are
extended if needed -- to cover a script together with most of its
typographical features.  Since PS fonts are organized as a collection
of various dictionaries, the term `CID-keyed' is used: the key is the
CID, and the value is the glyph data.

Here are some excerpts from Adobe Technical Note #5094 (file
5094.CJK_CID.pdf).  This is a bit lengthy, but it shows how such glyph
repositories evolve and how large they are.

  4 The Adobe-Japan1 Character Collection

    The purpose of this Japanese character collection is to provide
    support for the JIS X 0201-1997, JIS X 0208:1997, JIS X 0212-1990,
    and JIS X 0213:2004 character set standards, and select corporate
    variations thereof.  Supported encodings include ISO-2022-JP,
    EUC-JP, Shift-JIS, UCS-2, UTF-8, UTF-16, and UTF-32.  [...]

  4.1 Adobe-Japan1-0

    Supplement 0 (zero) of Adobe-Japan1 enumerates 8,284 glyphs (CIDs
    0 through 8283), and provides support for the JIS X 0208-1983
    (JIS83), JIS C 6226-1978 (JIS78), and JIS X 0201-1997 character
    set standards.  Apple®, NEC®, and Fujitsu® corporate character
    sets are also supported.  CIDs 1–230 are proportional-width Latin
    glyphs.  CIDs 231–325, 390, 501–503, and 599–632 are half-width
    Latin glyphs.  [...]

  4.2 Adobe-Japan1-1

    Supplement 1 of Adobe-Japan1 adds 75 glyphs (CIDs 8284 through
    8358), and provides support for the JIS X 0208-1990 (JIS90)
    character set standard.  [...]

  4.3 Adobe-Japan1-2

    Supplement 2 of Adobe-Japan1 adds 361 glyphs (CIDs 8359 through
    8719), and provides support for the Microsoft® Windows® 3.1J
    character set.  [...]

  4.4 Adobe-Japan1-3

    This supplement was designed to add pre-rotated instances of all
    proportional- and half-width glyphs found in earlier supplements.
    Their purpose is to significantly improve the handling of vertical
    glyphs in the context of OpenType.

    Supplement 3 of Adobe-Japan1 adds 634 glyphs (CIDs 8720 through
    9353, all pre-rotated).  CIDs 8720 through 8949 provide additional
    proportional Latin glyphs (230), CIDs 8950 through 9083 additional
    half-width Latin glyphs (134), CIDs 9084 through 9262 additional
    half-width hiragana and katakana (179), CIDs 9263 through 9275
    additional half-width symbols (13), and CIDs 9276 through 9353
    additional line-drawing glyphs (78).  [...]

  4.5 Adobe-Japan1-4

    Supplement 4 of Adobe-Japan1 adds 6,090 glyphs (CIDs 9354 through
    15443).  The purpose of this character collection supplement is to
    provide professional and commercial publishers with most of the
    glyphs that they require.  This includes a complete set of
    proportional-width italic Latin glyphs, annotated glyphs,
    additional punctuation and symbols, third- and quarter-width
    numerals and punctuation, horizontal- and vertically-optimized
    kana glyphs, ruby glyphs, and kanji.  The kanji include additional
    JIS78 (JIS C 6226-1978) variants, JIS83 (JIS X 0208-1983)
    variants, traditional forms, JIS X 0221-1995 Ideographic
    Supplement 1 (918 kanji), K-JIS, and other kanji variants.  [...]

  4.6 Adobe-Japan1-5

    Supplement 5 of Adobe-Japan1 adds 4,873 glyphs (CIDs 15444 through
    20316).  The purpose of this character collection is to support
    the new JIS X 0213:2004 standard (originally published as JIS X
    0213:2000), and to be compatible with the Mac OS X Version 10.2
    fonts.  [...]

  4.7 Adobe-Japan1-6

    Supplement 6 of Adobe-Japan1 adds 2,741 glyphs (CIDs 20317 through
    23057).  The purpose of this character collection is to complete
    the support for the JIS X 0212-1990 standard (thus deprecating the
    Adobe-Japan2-0 character collection), and to support Kyodo News’
    U-PRESS character set.
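Since each supplement only appends new CIDs after the last one, the
collection can be modelled as a sorted list of ranges.  The following
Python sketch uses nothing but the numbers from the excerpts above to
double-check the glyph counts and to find the supplement that introduced
a given CID.  The function names are made up for illustration, and the
"supplement N contains all earlier CIDs" rule only describes the
collection, not which glyphs a particular font actually ships.

```python
from bisect import bisect_right

# (supplement, first CID, last CID) for Adobe-Japan1, copied from the
# excerpts of Adobe Technical Note #5094 quoted above.
SUPPLEMENTS = [
    (0, 0, 8283),
    (1, 8284, 8358),
    (2, 8359, 8719),
    (3, 8720, 9353),
    (4, 9354, 15443),
    (5, 15444, 20316),
    (6, 20317, 23057),
]
FIRST_CIDS = [first for _, first, _ in SUPPLEMENTS]


def check_contiguous() -> None:
    """Each supplement must start right after the previous one ends."""
    for (_, _, prev_last), (sup, first, _) in zip(SUPPLEMENTS, SUPPLEMENTS[1:]):
        assert first == prev_last + 1, f"gap or overlap before supplement {sup}"


def glyphs_added(sup: int) -> int:
    """Number of CIDs a supplement appends to the collection."""
    _, first, last = SUPPLEMENTS[sup]
    return last - first + 1


def supplement_of(cid: int) -> int:
    """Return the supplement that introduced `cid` (binary search on range starts)."""
    i = bisect_right(FIRST_CIDS, cid) - 1
    if i < 0 or cid > SUPPLEMENTS[i][2]:
        raise ValueError(f"CID {cid} is not part of Adobe-Japan1-6")
    return SUPPLEMENTS[i][0]


def in_collection(cid: int, supplement: int) -> bool:
    """True if `cid` is defined in Adobe-Japan1-<supplement>.

    Supplements only ever append, so supplement N contains every CID
    of supplements 0..N.
    """
    try:
        return supplement_of(cid) <= supplement
    except ValueError:
        return False
```

Adding up `glyphs_added` over all seven supplements gives 23,058 CIDs
(0 through 23057) for Adobe-Japan1-6.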

The mapping between character codes and a glyph collection is done by
CMaps (not to be confused with `cmap' tables in OpenType fonts).  Adobe
provides more than 100 such mappings; an example is UniJIS-UTF32-H,
which maps Unicode characters in UTF-32 representation to the CIDs of
the Adobe-Japan1-X collection, for horizontal layout.
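To show what such a mapping looks like, here is a minimal Python sketch
that reads the `begincidrange`/`endcidrange` and
`begincidchar`/`endcidchar` sections of a CMap and looks up the CID for a
character code.  The sample fragment and its CID values are invented and
are *not* taken from UniJIS-UTF32-H; real CMap files also contain
codespace ranges, a CIDSystemInfo dictionary, and other PostScript
boilerplate that this sketch simply ignores.

```python
from __future__ import annotations

import re

# A made-up CMap fragment in Adobe's CMap syntax.  The code ranges and
# CID values are invented for illustration only.
SAMPLE_CMAP = """
2 begincidrange
<00000020> <0000007e> 1
<00003041> <00003093> 1000
endcidrange
1 begincidchar
<000000a5> 500
endcidchar
"""

# <low> <high> firstCID  and  <code> CID
RANGE_RE = re.compile(r"<([0-9a-fA-F]+)>\s*<([0-9a-fA-F]+)>\s*(\d+)")
CHAR_RE = re.compile(r"<([0-9a-fA-F]+)>\s*(\d+)")


def parse_cmap(text: str) -> tuple[list[tuple[int, int, int]], dict[int, int]]:
    """Collect (low, high, first_cid) ranges and single code -> CID entries.

    The entry counts in front of `begincidrange`/`begincidchar` are ignored.
    """
    ranges: list[tuple[int, int, int]] = []
    chars: dict[int, int] = {}
    for body in re.findall(r"begincidrange(.*?)endcidrange", text, re.S):
        for low, high, cid in RANGE_RE.findall(body):
            ranges.append((int(low, 16), int(high, 16), int(cid)))
    for body in re.findall(r"begincidchar(.*?)endcidchar", text, re.S):
        for code, cid in CHAR_RE.findall(body):
            chars[int(code, 16)] = int(cid)
    return ranges, chars


def lookup(ranges: list[tuple[int, int, int]], chars: dict[int, int], code: int) -> int:
    """Map a character code to a CID; single entries are checked first."""
    if code in chars:
        return chars[code]
    for low, high, first_cid in ranges:
        if low <= code <= high:
            # Consecutive codes in a range map to consecutive CIDs.
            return first_cid + (code - low)
    return 0  # CID 0 is .notdef (real CMaps may also declare notdef ranges)
```

For example, with the invented fragment above, U+0041 falls into the
first range and maps to CID 1 + (0x41 - 0x20) = 34.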

The big advantage of such glyph collections is that they are
documented and stable.  But they are not suitable for non-CJK scripts.

> And you can recode them on-the-fly so that the glyphs are delivered
> as Unicode characters to the client applications.

In that case, neither kerning nor other higher-level font features like
OpenType layout support will work.  Virtual fontsets are only useful for
combining scripts which don't overlap, like, say, Thai and Latin.  But
Latin and Cyrillic (and Greek) *do* overlap!

> Therefore you could use the original URW++ fonts untouched and add
> the Cyrillic glyphs from different physical sources.

Now let's come back from discussing the pros and cons of virtual
fontsets and glyph containers to the practical question of how to use
Cyrillic fonts in groff.  Apart from the URW extensions for Cyrillic,
I'm not aware of any other freely available fonts which match the
appearance of the standard PS fonts.  Do you know of any?

> You could lose the ability to search eventual PDF output but only if
> you don't take care to recode the data stream to some Unicode
> encoding recognized by the PDF spec before creating such output.

As long as you use AGL-compliant glyph names (AGL = Adobe Glyph List),
searching in PDF files works without any problems.  BTW, CID-keyed
fonts always need a proper ToUnicode mapping, either implicitly by
using a standard Adobe glyph collection (see above) or by explicitly
supplying a ToUnicode CMap.
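Why AGL-compliant names are enough for searching: a text extractor can
derive Unicode from the glyph name alone.  The following Python sketch
follows the glyph-name rules of the AGL specification as I understand
them: drop everything from the first period, split ligature names at
underscores, then try the AGL itself, `uniXXXX` (groups of four uppercase
hex digits), and `uXXXX` to `uXXXXXX`.  The AGL table is cut down to
three entries, and the special case for ZapfDingbats is left out.

```python
from __future__ import annotations

import re

# A tiny excerpt standing in for the full Adobe Glyph List; a real
# implementation would load the complete list.
AGL_EXCERPT = {
    "A": "\u0041",
    "period": "\u002e",
    "afii10017": "\u0410",  # CYRILLIC CAPITAL LETTER A
}

UNI_RE = re.compile(r"uni((?:[0-9A-F]{4})+)")  # uni + 4n uppercase hex digits
U_RE = re.compile(r"u([0-9A-F]{4,6})")          # u + 4 to 6 uppercase hex digits


def _valid_bmp(value: int) -> bool:
    """BMP scalar value outside the surrogate block."""
    return value <= 0xD7FF or 0xE000 <= value <= 0xFFFF


def component_to_unicode(comp: str) -> str:
    """Map one glyph-name component to a string ('' if unmappable)."""
    if comp in AGL_EXCERPT:
        return AGL_EXCERPT[comp]
    m = UNI_RE.fullmatch(comp)
    if m:
        digits = m.group(1)
        values = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_valid_bmp(v) for v in values):
            return "".join(map(chr, values))
    m = U_RE.fullmatch(comp)
    if m:
        value = int(m.group(1), 16)
        if value <= 0xD7FF or 0xE000 <= value <= 0x10FFFF:
            return chr(value)
    return ""


def glyph_name_to_unicode(name: str) -> str:
    """Derive the Unicode string for a glyph name such as 'f_i' or 'A.sc'."""
    base = name.split(".", 1)[0]  # drop variant suffixes like '.sc' or '.alt'
    return "".join(component_to_unicode(c) for c in base.split("_"))
```

A name like `afii10017` or `uni0410` thus yields CYRILLIC CAPITAL LETTER
A, whereas an arbitrary name such as `glyph123` yields nothing, which is
exactly the case where a ToUnicode CMap becomes necessary.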


    Werner
