[Groff] Re: non-ASCII chars and grohtml


From: Gaius Mulley
Subject: [Groff] Re: non-ASCII chars and grohtml
Date: 24 Nov 2004 12:06:41 +0000
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Werner LEMBERG <address@hidden> writes:

> Gaius,
> 
> 
> if I say
> 
>   \X'html:ü'
> 
> I get
> 
>   x X html:ü
> 
> in the intermediate troff output file.  In other words, the \X
> escape passes `ü' unmodified.  This is a problem, since grohtml
> expects ASCII input only.  GNU troff has no way to convert `ü' to
> `\[:u]' in the `mouth' (to use TeX's terminology), so I suggest
> that you add a warning to grohtml, something like this:
> 
>   Charset `US-ASCII' doesn't contain character code 0xFC (`ü')

ok
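
A rough sketch of that check, in Python rather than grohtml's own
source, and purely illustrative: the function name `ascii_warning` is
made up here, and the message format simply copies the wording
suggested above.

```python
def ascii_warning(text, charset="US-ASCII"):
    """Return a warning for the first character >= 0x80 in text, or None.

    Hypothetical helper: it assumes the argument of the `html:' tag has
    already been read as a string with one code point per input byte.
    """
    for ch in text:
        code = ord(ch)
        if code >= 0x80:
            # Wording follows the warning proposed in the quoted mail.
            return ("Charset `%s' doesn't contain character code 0x%02X (`%s')"
                    % (charset, code, ch))
    return None


if __name__ == "__main__":
    print(ascii_warning("html:ü"))
```

Returning only the first offending character keeps the output to one
warning per tag; reporting every such character would be an equally
reasonable choice.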

> Additionally, we need a new tag `html:charset' which sets the
> `charset' attribute in the <meta> element.  Then a string
> `.input-encoding' (the leading dot indicates that this string is
> meant to be read-only) should be added to the latinX.tmac files;
> www.tmac can then use it to set the tag automatically:
> 
>   .tag "html:charset \*[.input-encoding]
> 
> The whole issue is a bit tricky; for example, I suggest allowing at
> most one call to `.tag html:charset...' for simplicity.  Another
> problem is how to determine the valid character ranges -- should
> this be built into grohtml?  Or should my proposed html:charset tag
> look like this:
> 
>   html:charset <name> <start1> <end1> <start2> <end2> ...
> 
> so that grohtml can be dumb, and the latinX.tmac files define the
> proper ranges via \*[.input-encoding]?

This is certainly a good idea.  Grohtml would still have to check each
character against the ranges of legal characters, but that is easy,
just not quite as easy as testing for ch < 0x80  :-)
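
To show how small that check could be, here is a sketch in Python (not
grohtml code).  It assumes the tag argument has the form proposed
above, `html:charset <name> <start1> <end1> ...`, and that the bounds
are written as C-style integer literals such as 0xA0; the thread does
not fix a number format, so that part is an assumption, as are the
names `parse_charset_tag' and `is_legal'.

```python
def parse_charset_tag(arg):
    """Split 'html:charset <name> <start1> <end1> ...' into (name, ranges)."""
    fields = arg.split()
    if len(fields) < 2 or fields[0] != "html:charset":
        raise ValueError("not an html:charset tag: %r" % arg)
    name, bounds = fields[1], fields[2:]
    if len(bounds) % 2 != 0:
        raise ValueError("range bounds must come in start/end pairs")
    # Assumed number format: base 0 accepts literals such as 0xA0 or 160.
    values = [int(b, 0) for b in bounds]
    ranges = list(zip(values[0::2], values[1::2]))
    return name, ranges


def is_legal(code, ranges):
    """True if the character code lies inside any inclusive (start, end) range."""
    return any(start <= code <= end for start, end in ranges)


if __name__ == "__main__":
    # Illustrative ranges only: the graphic characters of ISO-8859-1.
    name, ranges = parse_charset_tag(
        "html:charset ISO-8859-1 0x20 0x7E 0xA0 0xFF")
    print(name, is_legal(0xFC, ranges), is_legal(0x85, ranges))
```

With the ranges supplied by the macro package, the test in grohtml
stays a simple loop over pairs and needs no built-in knowledge of any
particular encoding.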

> Of course, the simplest solution is to disallow characters >= 0x80
> completely in the `html:...' tag, but a user may wonder why she can
> use `ü' everywhere in the document except in .URL and friends (and
> switching to UTF-8 in the future would need additional changes).

Yes, I think your html:charset method outlined above is the way to
go.

Gaius



