Re: [Groff] Collating sequence for sorting words, etc.


From: Werner LEMBERG
Subject: Re: [Groff] Collating sequence for sorting words, etc.
Date: Tue, 21 Aug 2001 15:39:33 +0200 (CEST)

> A query which has just come up on comp.software.international
> reminds me that there is an issue which I have been meaning
> to raise for discussion on groff. It concerns the sorting
> of words into "alphabetical order" for index-entries,
> lists of references, etc.
>
> [...]
>
> There is no such provision in groff. I first hit this years ago when
> noticing that "refer" had its own notion of collation, with no
> provision for changing this. In any case, I subsequently changed to
> using 'makeindex' and 'mkbib', which are more flexible anyway and
> allow the sort of thing I describe above.

AFAIK, groff can't be used to sort anything, so the discussion reduces
to refer, I think (please correct me if I'm wrong).  BTW, it would be
nice if you could write a short introduction on how to use makeindex
and mkbib with groff...

> [...]
>
> Such a "collation file" would need to cope with characters
> specified by groff escape sequences, so that you could
> have something like
> 
>    A a
>    B b
>    C c
>    Ç ç \(C, \(c, \[C-cedilla] \[c-cedilla]

I suggest the following:

  . By default, refer should rely on the collation sequence provided
    by the C library (i.e., relying on LC_COLLATE).  Since groff
    provides a lot of symbols which can't be covered by a single 8-bit
    character set, this feature only makes sense if implemented with
    Unicode.

  . Read a collation file.  Again, I think it best to support Unicode
    only, providing a clean separation between glyph and input
    character names (the latter basically don't exist in groff except
    for the cases where the glyph name is identical to the input
    character name).  The format must be slightly extended for refer
    (<token> = <char> | '<special char>' | "<string>"):

      <uppercase token> <lowercase token>
      <token> /<category>/
      alias <token1> <token2> ...

    Example:

      A a

      # thorn: \[TP], \[Tp]
      'TP' 'Tp'

      # thorn strings as used in ms: \*[Th], \*[th]
      alias 'TP' "Th"
      alias 'Tp' "th"

      # special characters
      - /hyphen/
      alias - 'hy'

      'en' /range-separator/
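
To make the proposed format a bit more concrete, here is a minimal
sketch of a reader for it, written in Python only for illustration
(nothing like this exists in refer).  The names tokenize and
parse_collation, the tuple representation of tokens, and the reading of
`alias' as "first token is canonical, the rest map to it" are my
assumptions, not part of the proposal.  It also assumes that a line
starting with `#' is always a comment.

```python
import re

# One token: 'special char', "string", /category/, or a bare word or character.
TOKEN_RE = re.compile(r"""'([^']*)'|"([^"]*)"|/([^/]*)/|(\S+)""")


def tokenize(line):
    """Split a line into (kind, text) pairs."""
    tokens = []
    for m in TOKEN_RE.finditer(line):
        special, string, category, bare = m.groups()
        if special is not None:
            tokens.append(("special", special))
        elif string is not None:
            tokens.append(("string", string))
        elif category is not None:
            tokens.append(("category", category))
        else:
            tokens.append(("char", bare))
    return tokens


def parse_collation(text):
    """Return (order, aliases) for a collation file in the sketched format."""
    order = []    # entries in the order they appear, i.e. collation order
    aliases = {}  # alias token -> canonical token (assumed semantics)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        tokens = tokenize(line)
        if tokens[0] == ("char", "alias"):
            if len(tokens) < 3:
                raise ValueError(f"alias needs two or more tokens: {raw!r}")
            for other in tokens[2:]:
                aliases[other] = tokens[1]
        elif len(tokens) == 2 and tokens[1][0] == "category":
            order.append({"token": tokens[0], "category": tokens[1][1]})
        elif len(tokens) == 2:
            order.append({"upper": tokens[0], "lower": tokens[1]})
        else:
            raise ValueError(f"cannot parse line: {raw!r}")
    return order, aliases


# The example from above, as a raw string so the backslashes stay literal.
SAMPLE = r"""
A a
# thorn: \[TP], \[Tp]
'TP' 'Tp'
alias 'TP' "Th"
alias 'Tp' "th"
- /hyphen/
alias - 'hy'
'en' /range-separator/
"""

order, aliases = parse_collation(SAMPLE)
```

A real implementation would still have to map the entries onto
Unicode code points and build sort keys from them; the sketch only
covers the parsing step.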

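As for the first point (relying on LC_COLLATE): the following sketch
shows what "collation provided by the C library" amounts to, using
Python's standard locale module, which wraps setlocale() and
strxfrm().  The function name sort_by_locale is hypothetical, and this
says nothing about how refer itself would be changed.

```python
import locale


def sort_by_locale(words, locale_name=""):
    """Sort words with the C library's collation rules.

    An empty locale_name means "take LC_COLLATE from the environment".
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm turns each string into a key that compares
        # according to the active LC_COLLATE rules.
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # restore
```

In the "C" locale this gives plain code-point order (all uppercase
ASCII letters before lowercase ones); the result for other locales
depends on which locale definitions are installed on the system.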

> The only way to cope with this sort of thing is to use proper
> software designed for the job. One example is the "makeindex"
> program that is part of the "TeX" text-formatting suite; this has
> all the flexibility required. And it can be used with groff.

Have you ever tried `xindy', which is much more powerful than
makeindex?


    Werner
