Re: new command @U{nnnn}?

texinfo-devel

From:	Karl Berry
Subject:	Re: new command @U{nnnn}?
Date:	Thu, 4 Dec 2014 00:16:42 GMT

    sure that all the entities we output are consistent with this encoding. 

According to
http://www.w3.org/TR/html4/charset.html#h-5.3.1
it seems that &#d; and &#xH; always refer to Unicode code points,
regardless of the declared charset.  Anyway.  They may or may not be
displayable, but that's a different question.
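For illustration only (Python, not makeinfo code): a parser that follows the spec resolves the hexadecimal and the decimal reference forms to the same code point. No charset is consulted at any step. Whether a given browser and font can then display the character is, as noted, a separate question.

```python
import html

# Hexadecimal (&#xH;) and decimal (&#d;) numeric character references
# both name a Unicode code point, independent of any declared charset.
hex_ref = "&#x263A;"
dec_ref = "&#9786;"   # 0x263A == 9786

for ref in (hex_ref, dec_ref):
    ch = html.unescape(ref)
    print(ref, "->", f"U+{ord(ch):04X}")
```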

I did implement @U in TeX, for what it's worth (not documented yet).

For makeinfo, to summarize our discussion:

HTML and XML and DocBook can output the string "&#xNNNN;", no need to
worry about anything else.  I see nothing to be gained by using actual
binary characters, regardless of ENABLE_ENCODING_USE_ENTITY.
Although, in the one case where @documentencoding is UTF-8 and
ENABLE_ENCODING_USE_ENTITY is false, I suppose it would be fine to
output the literal character.
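A minimal sketch of the HTML/XML/DocBook rule above. It is written in Python, not makeinfo's own Perl, and `html_for_U` and its parameters are hypothetical. `use_entity` stands in for the ENABLE_ENCODING_USE_ENTITY setting. Normalizing the encoding name through `codecs.lookup` is an assumption about how the encoding would be compared.

```python
import codecs
from typing import Optional


def html_for_U(hexdigits: str, encoding: Optional[str], use_entity: bool) -> str:
    """Hypothetical helper: HTML/XML/DocBook output for @U{hexdigits}.

    use_entity stands in for ENABLE_ENCODING_USE_ENTITY.
    """
    cp = int(hexdigits, 16)
    is_utf8 = encoding is not None and codecs.lookup(encoding).name == "utf-8"
    # The one case where the literal character is fine: UTF-8 output
    # with entity use turned off.
    if is_utf8 and not use_entity:
        return chr(cp)
    # Otherwise a hex numeric character reference, which always names a
    # Unicode code point whatever the declared charset.
    return f"&#x{cp:04X};"


print(html_for_U("263a", None, True))  # &#x263A;
```

With `html_for_U("263a", "UTF-8", False)` the same call would return the single literal character U+263A instead of the reference.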

For Info/plaintext:

- If @documentencoding is not set, or is set to US-ASCII, output 7-bit
ASCII: if NNNN > 7f, the literal six-char string "U+NNNN" (U, +, N, N,
N, N).

- If @documentencoding is set to UTF-8, output the actual binary
character, "\x{NNNN}" in Perl.

- If @documentencoding is set to anything else (e.g., Latin 1), then if
you can easily guess at a transliteration, that would be ideal, but
there is no need to go to extraordinary lengths.  Lacking a
transliteration, the only safe thing to do is output the string for
ASCII, as far as I can see.  We must not output literal binary UTF-8 in
an 8-bit encoding, since that is likely to be invalid input.
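The Info/plaintext rules can likewise be sketched in Python. `info_for_U` is a hypothetical helper that returns a string, which the caller then encodes in the document encoding. The "anything else" branch goes beyond the rules above and is an assumption. It first uses the character as-is if the target encoding can represent it. Failing that, it tries a cheap transliteration (NFKD decomposition with combining marks dropped). Otherwise it falls back to the ASCII "U+NNNN" form, so literal UTF-8 bytes never end up in an 8-bit file.

```python
import codecs
import unicodedata
from typing import Optional


def ascii_fallback(cp: int) -> str:
    # The literal six-character string, e.g. "U+263A".
    return f"U+{cp:04X}"


def info_for_U(hexdigits: str, encoding: Optional[str]) -> str:
    """Hypothetical helper: Info/plaintext output for @U{hexdigits}."""
    cp = int(hexdigits, 16)
    ch = chr(cp)
    if cp <= 0x7F:
        return ch  # plain ASCII is safe in every encoding considered here
    name = codecs.lookup(encoding).name if encoding else "ascii"
    if name == "ascii":
        return ascii_fallback(cp)
    if name == "utf-8":
        return ch  # written out as the actual UTF-8 bytes
    # Other (e.g. 8-bit) encodings. Assumption: use the character directly
    # if the target encoding can represent it.
    try:
        ch.encode(name)
        return ch
    except UnicodeEncodeError:
        pass
    # Assumption: cheap transliteration by NFKD-decomposing and
    # dropping combining marks (e.g. U+0151 -> "o").
    base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                   if not unicodedata.combining(c))
    try:
        base.encode(name)
        return base
    except UnicodeEncodeError:
        return ascii_fallback(cp)


print(info_for_U("263A", None))          # U+263A
print(info_for_U("0151", "ISO-8859-1"))  # o
```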

Wdyt?

Thanks,
Karl
