Re: [Groff] man file character encoding.


From: Erwin Waterlander
Subject: Re: [Groff] man file character encoding.
Date: Fri, 27 Sep 2013 15:00:45 +0200
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20130801 Thunderbird/17.0.8

On 27-9-2013 11:30, Colin Watson wrote:
On Thu, Sep 26, 2013 at 09:58:04PM +0200, Erwin Waterlander wrote:
I'm curious to know how man-db determines the encoding of the man
page. I cannot find that information. Would you like to explain how
man-db does the encoding detection?
Certainly.  man-db contains a table of the typical legacy encodings for
each of a number of known languages (I'm happy to add to those, but
since new translation efforts tend to start with UTF-8 these days, it's
a closed set and I haven't had to extend it since 2008 when I synced up
with Fedora).  Each language generally has only one.  UTF-8 is a strict
enough encoding that for reasonable volumes of text it is usually
possible to distinguish automatically between it and a legacy encoding,
simply by trying to decode as UTF-8 and falling back to the legacy
encoding if that fails.  manconv does this job; it is more or less like
iconv except that it can take a priority order of possible input
encodings.
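
The try-UTF-8-then-fall-back idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not man-db's or manconv's actual code; the function name `decode_with_fallback` and the choice of KOI8-R as the legacy encoding for Russian are assumptions made for the example.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "koi8-r")):
    """Try each candidate encoding in priority order.

    Returns (text, encoding_used). UTF-8 goes first because it is strict:
    legacy 8-bit text rarely happens to be valid UTF-8, whereas a legacy
    single-byte codec such as KOI8-R accepts any byte sequence.
    """
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next candidate
    raise ValueError("none of the candidate encodings could decode the input")


if __name__ == "__main__":
    legacy_page = "привет".encode("koi8-r")   # hypothetical legacy-encoded text
    modern_page = "привет".encode("utf-8")
    print(decode_with_fallback(legacy_page))
    print(decode_with_fallback(modern_page))
```

The order of the candidates matters: if the permissive legacy encoding were tried first, it would accept UTF-8 input as well and produce garbage.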

There are cases where this system fails, and in such cases you can store
manual pages in directories with an explicit encoding tag attached (e.g.
"/usr/share/man/man1/<ll>_<CC>.<encoding>"), or put an explicit
Emacs-style coding tag at the top of the file.  In practice this is
rarely necessary.
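
As a rough illustration of the coding-tag approach, the Python sketch below looks for an Emacs-style `-*- coding: NAME -*-` declaration inside a roff comment on the first line of a page. The exact tag shape accepted here (a first line starting with `'\"` or `.\"`) is an assumption made for the example, and `declared_encoding` is a hypothetical helper, not part of man-db.

```python
import re

# Emacs-style declaration, e.g.  -*- coding: UTF-8 -*-
_TAG = re.compile(rb"-\*-.*?coding:\s*([A-Za-z0-9._-]+).*?-\*-")


def declared_encoding(page: bytes):
    """Return the encoding named in a first-line coding tag, or None."""
    first_line = page.split(b"\n", 1)[0]
    # Assumption: the tag must sit in a roff comment, i.e. the line
    # starts with '\" or .\" (quote or dot, backslash, double quote).
    if not first_line.startswith((b"'\\\"", b".\\\"")):
        return None
    match = _TAG.search(first_line)
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    page = b"'\\\" -*- coding: UTF-8 -*-\n.TH EXAMPLE 1\n"
    print(declared_encoding(page))
```

A caller would consult such a declaration first and only fall back to guessing when it returns None.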

Hi Colin,

Thanks for the information.

The reason I work with Federico's man is that I often work on Cygwin
when I don't have Linux at hand. Cygwin does not have man-db
available. I will soon receive a Russian translation of my program
(dos2unix), which has made this problem relevant again for me. Three
years ago I saw this problem coming. At that time I also tested on
Fedora 12, which was still using Federico's man. I didn't notice that
Fedora changed to man-db in the meantime.
Ah, yes.  I corresponded at one point with somebody who might be
interested in porting man-db to Cygwin, but it never came to anything.
I would be ecstatic if somebody could help with such a port, as I don't
use Windows myself.

I have experience with packaging for Cygwin. I package libunistring and dos2unix. I was thinking of packaging man-db for Cygwin, because Cygwin has man pages in a mixture of encodings. There is no central build server, like Fedora has, that could transcode all man pages to UTF-8 or add coding tags. Packages are all built by individuals in different ways and then uploaded. So a man system that can handle different encodings out of the box sounds ideal.

I use Gnulib extensively, which deals with a lot of portability
problems, but not everything.  The main effort will be in porting
libpipeline to deal with Windows-style process creation and supervision;
after that I expect that it will just be a matter of various minor fixes
for Unix-specific assumptions I've made.  You don't have to come with a
complete patch; I'd be willing to accept incremental changes that make
the job easier for the next person, or even "this general pattern of
things you're doing is Unix-specific; you need to use this pattern
instead to be portable to Cygwin".


Yesterday I did a quick compile of libpipeline on Cygwin. It compiled out of the box, but I did not run any tests. I will continue with this and see how far I get. My knowledge of Windows internals is limited, and so is my time, so it may take a while. If we are lucky, the Cygwin POSIX layer solves most of the problems.

best regards,

--
Erwin Waterlander
http://waterlan.home.xs4all.nl/



