
[Groff] Re: What's missing for Unicode support of groff?


From: Werner LEMBERG
Subject: [Groff] Re: What's missing for Unicode support of groff?
Date: Sun, 11 Dec 2005 08:05:59 +0100 (CET)

> I have failed to find a place in troff code to plug utf decoder in.
> Thus let us do in the UNIX way - since groff already lives off a
> pipeline, one more preprocessor for dealing with encodings will not
> hurt.  [...]

What you suggest is *exactly* the same as what I propose.

> What the patch does: If groff has a -k command line option, a
> converter called gconv is called.

This isn't necessary IMHO.  I plan to make it always run unless we
have `-K none' (to use your command line option names).

> If "-K arg" is present, the "arg" is passed to "gconv".

Well, if we have a single command line option, shall it be `-k' or
`-K'?  I favor `-k' since it's easier to type.  Opinions?

> A sample gconv script is included: UTF8 is converted to \[uXXXX], if
> an optional argument is present - text is iconv'ed to UTF first.

Yes!

> If the sample gconv is too simple, gpreconv may be used instead of
> iconv, or merged with uni2groff to make gconv a binary.

I'll work with gpreconv, looking into your and Bernd Haible's solution
to make a water-proof UTF-8 -> groff entity conversion.

> Test case at http://www.iaas.msu.ru/tmp/encodinstest.tgz

Thanks, will look at it soon.

> I have compiled groff-current on MacOS X 10.4.3 and stepped over the
> following bug: my makeinfo reports its version this way:
> 
>   makeinfo (GNU texinfo) 4.8

My texinfo reports the same.

> configure does not parse the version string correctly and complains
> that version is too old (instead of admitting the error).
> 
> The attached patch solved the problem for me.

This is interesting, since it looks like a problem with sed.  You are
using BSD sed, right?  Please show us what the following expression
yields:

  makeinfo --version | sed 's/^.* \([^ ]\+\)$/\1/;1q'

With GNU sed, I get `4.8'.  Then I have

  echo 4.8 | sed 's/^\([0-9]*\).*$/\1/'

    ==> 4

  echo 4.8 | sed 's/^[^.]\+\(.*\)$/\1/'

    ==> .8

  echo .8 | sed 's/\.\([0-9]*\).*$/\1/'

    ==> 8

The various sed expressions are there to handle various possible
version info strings like `4', `4.8', or `4.8.1'.

Before applying your patch I want to know the reason for the failure,
so that it can be reported to both the BSD sed and autoconf people if
necessary.  Maybe my original regexps are non-POSIX...
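
On the non-POSIX guess: `\+' is a GNU extension in basic regular
expressions; POSIX spells "one or more" as `\{1,\}' or by repeating
the atom (`xx*').  Whether this is what actually trips BSD sed is an
assumption until the output requested above is known.  A minimal
sketch of the same extraction steps using only POSIX constructs
follows; the function names are purely illustrative and are not taken
from groff's configure script.

```shell
# Hypothetical helpers (names are NOT from groff's configure script).
# Each one uses only POSIX basic regular expressions: `xx*' stands in
# for the GNU-only `x\+'.

# Print the last space-separated word of the first input line,
# e.g. `4.8' for `makeinfo (GNU texinfo) 4.8'.
version_word() {
  sed -e 's/^.* \([^ ][^ ]*\)$/\1/' -e 1q
}

# Print the leading run of digits, e.g. `4' for `4.8'.
major_of() {
  sed 's/^\([0-9]*\).*$/\1/'
}

# Print everything from the first dot onwards, e.g. `.8' for `4.8';
# prints an empty line if there is no dot at all.
rest_of() {
  sed 's/^[^.][^.]*\(.*\)$/\1/'
}

# Print the digits following a leading dot, e.g. `8' for `.8' or `.8.1'.
minor_of() {
  sed 's/\.\([0-9]*\).*$/\1/'
}
```

For example, `echo 4.8.1 | rest_of | minor_of' is meant to print `8',
just as the original expressions do with GNU sed.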

> 3 There is an OS called Darwin
> 
> It would be nice to include Darwin in the list of known OS'es:

Which Darwin versions are defined?  Is it really simply `Darwin' and
nothing else?

> PS. Where do the enriched URW fonts live?

Here is the disastrous report from Nelson Beebe:

  http://ghostscript.com/pipermail/gs-devel/2004-October/003102.html

The homepage appears to be

  http://unix.freshmeat.net/projects/urw-fonts-cyrillic/

but there isn't any recent information.  Unfortunately, I wasn't able
to find any information regarding a version of the free URW fonts with
Cyrillic extensions which fixes those problems :-( Maybe you will have
more luck.


    Werner



