
[Groff] non-ASCII chars and grohtml


From: Werner LEMBERG
Subject: [Groff] non-ASCII chars and grohtml
Date: Wed, 24 Nov 2004 09:47:42 +0100 (CET)

Gaius,


if I say

  \X'html:ü'

I get

  x X html:ü

in the intermediate troff output file.  In other words, the \X
escape passes `ü' through unmodified.  This is a problem, since grohtml
expects ASCII input only.  GNU troff has no way to convert `ü' to
`\[:u]' in the `mouth' (to use TeX's terminology), so I suggest that
you add a warning to grohtml, something like this:

  Charset `US-ASCII' doesn't contain character code 0xFC (`ü')
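
To illustrate the check, here is a minimal sketch in Python (grohtml
itself is written in C++, so this only shows the logic).  The function
name `find_non_ascii' is hypothetical, and the sketch assumes that the
intermediate output line arrives as Latin-1 encoded bytes:

```python
def find_non_ascii(line: bytes, charset: str = "US-ASCII"):
    """Return one warning per byte >= 0x80 in an `x X html:` line.

    Hypothetical helper; assumes the input bytes are Latin-1 encoded.
    """
    warnings = []
    # Only device control lines carrying an html: tag are of interest.
    if not line.startswith(b"x X html:"):
        return warnings
    for byte in line:
        if byte >= 0x80:
            # Decode the single byte as Latin-1 to show it in the message.
            glyph = bytes([byte]).decode("latin-1")
            warnings.append(
                f"Charset `{charset}' doesn't contain character code "
                f"0x{byte:02X} (`{glyph}')"
            )
    return warnings
```

Calling it on the Latin-1 bytes of the line `x X html:ü' yields a
single warning of the form shown above.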

Additionally, we need a new tag `html:charset' which sets the
`charset' attribute in the <meta> element.  Then a string
`.input-encoding' (the leading dot indicates that this string is
meant to be read-only) should be added to the latinX.tmac files;
www.tmac can then use it to set the tag automatically:

  .tag "html:charset \*[.input-encoding]

The whole issue is a bit tricky; for example, I suggest allowing at
most one call to `.tag html:charset...' for simplicity.  Another
problem is how to determine the valid character ranges -- should this
be built into grohtml?  Or should my proposed html:charset tag look
like this:

  html:charset <name> <start1> <end1> <start2> <end2> ...

so that grohtml can be dumb, and the latinX.tmac define the proper
ranges via \*[.input-encoding]?
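
A `dumb' consumer of the range form might look like the following
Python sketch.  Everything in it is an assumption for illustration:
the function names `parse_charset_tag' and `is_valid' are
hypothetical, and the proposal above does not say how the range
bounds are written, so the sketch accepts both decimal and
0x-prefixed hexadecimal numbers:

```python
def parse_charset_tag(arg: str):
    """Parse `html:charset <name> <start1> <end1> ...` (proposed syntax).

    Returns the charset name and a list of inclusive (start, end) ranges.
    """
    fields = arg.split()
    if len(fields) < 2 or fields[0] != "html:charset":
        raise ValueError("not an html:charset tag")
    name, bounds = fields[1], fields[2:]
    if len(bounds) % 2 != 0:
        raise ValueError("ranges must come in <start> <end> pairs")
    # Assumption: bounds are decimal or 0x-prefixed hexadecimal numbers.
    numbers = [int(b, 0) for b in bounds]
    ranges = list(zip(numbers[0::2], numbers[1::2]))
    return name, ranges


def is_valid(code: int, ranges) -> bool:
    """True if the character code falls inside any declared range."""
    return any(start <= code <= end for start, end in ranges)
```

For example, given the (illustrative) tag
`html:charset ISO-8859-1 0x20 0x7E 0xA0 0xFF', the code 0xFC would be
accepted and 0x80 rejected, without grohtml knowing anything about
the encoding itself.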

Of course, the simplest solution is to disallow characters >= 0x80
completely in the `html:...' tag, but a user may wonder why she can
use `ü' everywhere in the document except in .URL and friends (and
switching to UTF-8 in the future would need additional changes).


    Werner



