[Groff] Re: coding tags and utf-16

From:	Werner LEMBERG
Subject:	[Groff] Re: coding tags and utf-16
Date:	Wed, 04 Jan 2006 15:58:21 +0100 (CET)

> > There is a serious problem with coding tags and utf-16 encodings
> > of any flavour: Emacs simply can't recognize the tag.  This is a
> > non-trivial problem.
> 
> Sorry for the late reply, but I think a coding tag is useless for a
> file encoded in any of the utf-16 variants.
> 
> If a file has a BOM at the head, the BOM should determine the exact
> encoding, whatever the coding tag specifies.
> 
> If a file is encoded without a BOM, we must use less reliable
> heuristics to guess utf-16be or utf-16le.  If you find a coding-tag
> spec by ignoring all zero bytes at even byte indexes, it means that
> the file is very likely utf-16be, whatever the tag value is.  If you
> find a coding-tag spec by ignoring all zero bytes at odd byte
> indexes, it means that the file is utf-16le, whatever the tag value
> is.
> 
> So, in any case, the tag value itself is useless.  [...]
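
The heuristic quoted above can be sketched roughly as follows.  This
is only an illustration in Python, not code from Emacs or groff; the
names drop_zeros and guess_utf16_order are made up for the example,
and searching for the literal string "coding:" is a simplifying
assumption about what counts as finding a coding-tag spec.

```python
def drop_zeros(data: bytes, parity: int) -> bytes:
    """Remove zero bytes found at even (parity 0) or odd (parity 1) indexes."""
    return bytes(b for i, b in enumerate(data) if b != 0 or i % 2 != parity)


def guess_utf16_order(data: bytes):
    """Guess the byte order of BOM-less UTF-16 data from a visible coding tag.

    Returns "utf-16be", "utf-16le", or None if the test is inconclusive.
    The value of the tag is never looked at, only whether it becomes
    visible after dropping the zero bytes.
    """
    # Hypothetical simplification: a tag is "found" if "coding:" appears.
    found_be = b"coding:" in drop_zeros(data, 0)
    found_le = b"coding:" in drop_zeros(data, 1)
    if found_be and not found_le:
        return "utf-16be"
    if found_le and not found_be:
        return "utf-16le"
    # Both or neither matched (e.g. plain ASCII input): no conclusion.
    return None
```

Note that the result depends only on where the zero bytes are, which
is exactly why the tag value carries no information here.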

I'll do the following for groff's preprocessor, preconv:

  . If the data starts with a BOM, use it, and ignore the coding tag.

  . Otherwise, if there are zero bytes in the first two lines, ignore
    those zero bytes, emit a warning, and use the coding tag, if any.

  . Otherwise, use the default encoding -- this will normally lead to
    a wrong result and make groff explode, but I consider this better
    than applying heuristics, especially if you have to recognize both
    UTF-16 and UTF-32 variants.  This is probably a suboptimal
    solution, but it is quite easy to implement, and the user can
    always explicitly select an encoding on the command line.  Perhaps
    someone will find (and implement) a better way which I can then
    adapt for preconv.
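
A minimal sketch of the three steps above, in Python for brevity.  It
is not preconv's actual code.  The function name guess_encoding, the
"latin-1" default, and the regular expression for the Emacs-style tag
are all assumptions made for the example.  It also assumes that a
coding tag in a file without zero bytes is honoured too, and that the
default is used only when no tag is found in the first two lines.

```python
import codecs
import re

# Longer BOMs first: the UTF-32LE BOM (FF FE 00 00) starts with the
# UTF-16LE BOM (FF FE), so it has to be tested before it.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Emacs-style tag on a single line, e.g. "-*- coding: utf-8 -*-".
CODING_TAG = re.compile(rb"-\*-.*?coding:\s*([^\s;]+).*?-\*-")


def guess_encoding(data: bytes, default: str = "latin-1"):
    """Return (encoding, warnings) for the given input data."""
    warnings = []

    # Step 1: a BOM wins; the coding tag is not consulted at all.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, warnings

    # Step 2: look at the first two lines only, dropping zero bytes
    # so that a tag in UTF-16 or UTF-32 data becomes visible.
    head = b"\n".join(data.split(b"\n", 2)[:2])
    if b"\x00" in head:
        warnings.append("zero bytes in first two lines ignored")
        head = head.replace(b"\x00", b"")
    match = CODING_TAG.search(head)
    if match:
        return match.group(1).decode("ascii", "replace"), warnings

    # Step 3: no BOM and no tag; fall back to the (hypothetical) default.
    return default, warnings
```

A command-line option for selecting the encoding would simply bypass
this function.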


      Werner
