Re: [Groff] Re: groff: radical re-implementation


From: Tomohiro KUBOTA
Subject: Re: [Groff] Re: groff: radical re-implementation
Date: Mon, 23 Oct 2000 11:15:22 +0900
User-agent: Wanderlust/1.1.1 (Purple Rain) EMY/1.13.8 (Tastes differ) FLIM/1.13.2 (Kasanui) APEL/10.2 Emacs/20.7 (i386-debian-linux-gnu) MULE/4.1 (AOI)

Hi,

At Sat, 21 Oct 2000 10:46:51 +0200 (CEST),
Werner LEMBERG <address@hidden> wrote:

> In general.  I want to define terms completely independent on any
> particular program.  We have
> 
>   character set
>   character encoding
>   glyph set
>   glyph encoding

I understand.  Since we are discussing the preprocessor, let's
concentrate on characters, not glyphs.  I think you will now agree to
specify the 'character set/encoding' by a single name such as
'EUC-JP' instead of a pair such as 'JIS-X-0208' and 'EUC'.

BTW, I am implementing the preprocessor.  It currently has these features:
 - input from standard input (stdin)
 - output to standard output (stdout)
 - I18N directive to support locale-sensitive mode
 - hard-coded converters from Latin1, EBCDIC, and UTF-8 to UTF-8
 - locale-sensitive converter from any encoding supported by the OS to
   UTF-8 (note: UTF-8 has to be supported by iconv(3) )
 - the input encoding is determined by a command option or by the default
 - the default is 'latin1' when compiled without I18N, or
   locale-sensitive when compiled with I18N
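
The feature list above can be illustrated with a small sketch.  This is
not the preprocessor's actual source (which is not shown in this message);
it is a Python approximation in which the names HARD_CODED, pick_encoding,
to_utf8 and convert_stream are all hypothetical, and 'cp037' is only one
of several EBCDIC code pages, chosen here as an assumption.

```python
import io
import locale

# Hypothetical table standing in for the "hard-coded" converters.
# EBCDIC has many variants; cp037 is an assumption for illustration.
HARD_CODED = {"latin1": "latin-1", "ebcdic": "cp037", "utf-8": "utf-8"}


def pick_encoding(option=None, i18n=True):
    """Return the input encoding: command option first, then the default."""
    if option:
        return HARD_CODED.get(option.lower(), option)
    if i18n:
        # Locale-sensitive default, roughly what an I18N build would use.
        return locale.getpreferredencoding(False)
    return "latin-1"


def to_utf8(data, encoding):
    """Convert bytes in the given encoding to UTF-8 bytes."""
    return data.decode(encoding).encode("utf-8")


def convert_stream(src, dst, encoding):
    """Read all of src, convert it, write UTF-8 to dst.

    Reads the whole input into memory; efficiency is ignored here.
    """
    dst.write(to_utf8(src.read(), encoding))
```

With binary streams such as sys.stdin.buffer and sys.stdout.buffer this
behaves as a stdin-to-stdout filter, for example
convert_stream(sys.stdin.buffer, sys.stdout.buffer, pick_encoding("EUC-JP")).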
However, I still have to implement:
 - determining the encoding from a '-*- ... -*-' directive in
   the roff source as well
 - (I18N mode) accepting encoding names in both MIME style
   and Emacs style
 - input from files besides stdin
Efficiency of memory and CPU usage has not been considered yet.
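
As a rough illustration of the two naming items above, the following
Python sketch reads a '-*- ... -*-' line and maps a few Emacs-style names
to MIME-style ones.  It assumes the directive follows the Emacs
file-variable syntax ('coding: NAME', fields separated by ';'), which this
message does not specify.  The alias table is a small illustrative sample,
and the function names are hypothetical.

```python
import re

# Matches the text between the two '-*-' markers on a line.
_DIRECTIVE = re.compile(rb"-\*-(.*?)-\*-")

# Illustrative sample of Emacs-style names mapped to MIME-style names.
EMACS_ALIASES = {
    "euc-japan": "euc-jp",
    "japanese-iso-8bit": "euc-jp",
    "iso-latin-1": "iso-8859-1",
    "latin-1": "iso-8859-1",
}


def normalize(name):
    """Lower-case a name, drop an Emacs end-of-line suffix, apply aliases."""
    name = re.sub(r"-(unix|dos|mac)$", "", name.lower())
    return EMACS_ALIASES.get(name, name)


def directive_encoding(first_line):
    """Return the encoding named in a '-*- ... -*-' line, or None."""
    match = _DIRECTIVE.search(first_line)
    if not match:
        return None
    fields = match.group(1).decode("ascii", "replace").split(";")
    for field in fields:
        key, sep, value = field.partition(":")
        if sep and key.strip().lower() == "coding":
            return normalize(value.strip())
    return None
```

A caller would try directive_encoding() on the first line of the roff
source and fall back to the command option or the default when it
returns None.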

I will send the source soon.


> >    roff source in any encoding like '\(co'     (character)
> >           |
> >           |  preprocessor
> >           V
> >    UTF-8 stream like u+00a9                    (character)
> >           |
> >           |  troff
> >           V
> >    glyph expression like 'co'                  (glyph)
> >           |
> >           |  troff (continuing)
> >           V
> 
> Here is missing a step:
> 
>      typeset output                              (glyph)
>             |
>             |  grotty
>             V
> 
> >    UTF-8 stream like u+00a9 or '(C)'           (character)
> >           |
> >           |  postprocessor
> >           V
> >    formatted text in any encoding              (character)
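
The chain in the diagram can be traced for the single example it uses,
'\(co'.  The Python sketch below is only an illustration: the four tables
and the function names are hypothetical, and real troff and grotty do far
more than a dictionary lookup.  It assumes the choice between u+00a9 and
'(C)' depends on whether the final encoding can represent the character.

```python
# Hypothetical one-entry tables for the copyright sign.
ESCAPE_TO_CHAR = {r"\(co": "\u00a9"}   # preprocessor: roff escape -> character
CHAR_TO_GLYPH = {"\u00a9": "co"}       # troff: character -> glyph name
GLYPH_TO_CHAR = {"co": "\u00a9"}       # grotty-like step: glyph -> character
FALLBACK = {"co": "(C)"}               # used when the character cannot be encoded


def glyph_to_text(glyph, out_encoding):
    """Return the character for a glyph, or its fallback string."""
    char = GLYPH_TO_CHAR[glyph]
    try:
        char.encode(out_encoding)
        return char
    except UnicodeEncodeError:
        return FALLBACK[glyph]


def run_pipeline(escape, out_encoding):
    """Trace one roff escape through all stages to encoded output bytes."""
    char = ESCAPE_TO_CHAR[escape]               # character
    glyph = CHAR_TO_GLYPH[char]                 # glyph
    text = glyph_to_text(glyph, out_encoding)   # character again
    return text.encode(out_encoding)            # postprocessor
```

For example, run_pipeline(r"\(co", "ascii") yields the bytes of '(C)',
while a UTF-8 target keeps the copyright sign itself.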


I understand well.  Thank you for your explanation.
BTW, besides TTY output, HTML output will also need postprocessing from
glyphs to characters, as 'grotty' does in tty mode, since HTML is a text
file.  I think the encoding for HTML can always be UTF-8.  We can add the
following line between <HEAD> and </HEAD>:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">

(I found code in grohtml.cc that writes this line without the charset
parameter.)
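
A minimal sketch of this idea in Python follows.  It is not based on
grohtml.cc; the function name html_page and the page layout are
hypothetical.  It only shows the META line being placed inside HEAD and
the whole page being encoded as UTF-8 to match it.

```python
import html

# The META line proposed above, declaring UTF-8 as the page encoding.
META_LINE = (
    '<META HTTP-EQUIV="Content-Type" '
    'CONTENT="text/html; charset=utf-8">'
)


def html_page(title, body_text):
    """Return a tiny HTML page as UTF-8 bytes with the charset declared."""
    lines = [
        "<HTML>",
        "<HEAD>",
        META_LINE,
        "<TITLE>" + html.escape(title) + "</TITLE>",
        "</HEAD>",
        "<BODY>",
        "<P>" + html.escape(body_text) + "</P>",
        "</BODY>",
        "</HTML>",
    ]
    # The declared charset and the actual byte encoding must agree.
    return ("\n".join(lines) + "\n").encode("utf-8")
```

Because the output is always UTF-8, no fallback such as '(C)' is needed
in this case; the copyright sign can be written directly.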

---
Tomohiro KUBOTA <address@hidden>
http://surfchem0.riken.go.jp/~kubota/
