Re: new command @U{nnnn}?
From: Karl Berry
Subject: Re: new command @U{nnnn}?
Date: Thu, 4 Dec 2014 00:16:42 GMT
sure that all the entities we output are consistent with this encoding.
According to
http://www.w3.org/TR/html4/charset.html#h-5.3.1
it seems that &#d; and &#xH; always refer to Unicode code points,
regardless of the declared charset. Anyway. They may or may not be
displayable, but that's a different question.
I did implement @U in TeX, for what it's worth (not documented yet).
For makeinfo, to summarize our discussion:
HTML and XML and Docbook can output the string "&#xNNNN;", no need to
worry about anything else. I see nothing to be gained by using actual
binary characters, regardless of ENABLE_ENCODING_USE_ENTITY.
Although, in the one case where @documentencoding is UTF-8 and
ENABLE_ENCODING_USE_ENTITY is false, I suppose it would be fine to
output the literal character.
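
A minimal sketch of that HTML/XML/Docbook rule, in Python rather than makeinfo's Perl. The function name and its parameters (format_u_markup, document_encoding, use_entity) are hypothetical stand-ins for illustration, not anything in the makeinfo code; use_entity plays the role of ENABLE_ENCODING_USE_ENTITY.

```python
def format_u_markup(hex_digits, document_encoding=None, use_entity=True):
    """Render @U{hex_digits} for HTML, XML, or Docbook output.

    All names here are hypothetical; this only illustrates the rule
    described above.
    """
    code_point = int(hex_digits, 16)

    # The one exception: UTF-8 document and entity output disabled,
    # so the literal character is acceptable.
    is_utf8 = (document_encoding or "").upper() == "UTF-8"
    if is_utf8 and not use_entity:
        return chr(code_point)

    # Otherwise always emit a hexadecimal character reference, which
    # refers to a Unicode code point whatever the declared charset.
    return "&#x{:04X};".format(code_point)


if __name__ == "__main__":
    print(format_u_markup("201C"))
    print(format_u_markup("201C", "UTF-8", use_entity=False))
```

Defaulting to the character reference keeps the output pure ASCII, so it cannot conflict with whatever charset the page declares.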
For Info/plaintext:
- If @documentencoding is not set, or is set to US-ASCII, output 7-bit
ASCII: the literal six-char string (U, +, N, N, N, N) if NNNN > 7f.
- If @documentencoding is set to UTF-8, output the actual binary
character, "\x{NNNN}" in Perl.
- If @documentencoding is set to anything else (e.g., Latin 1), well, if
you can easily guess at a transliteration, then that would be ideal, but
no need to go to extraordinary lengths. Lacking a transliteration, the
only safe thing to do is output the string for ASCII, as far as I can
see. We must not output literal binary UTF-8 in an 8-bit encoding,
since that is likely to be invalid input.
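
The three Info/plaintext cases could be sketched roughly as follows, again in Python rather than Perl. The function name format_u_plaintext and its parameters are hypothetical. The sketch assumes that code points up to 7f are output as the plain ASCII character, and it skips the optional transliteration step, so any encoding other than UTF-8 gets the ASCII fallback string.

```python
def format_u_plaintext(hex_digits, document_encoding=None):
    """Render @U{hex_digits} for Info or plaintext output.

    Hypothetical helper that only illustrates the rules above;
    no transliteration is attempted.
    """
    code_point = int(hex_digits, 16)

    # Assumption: 7-bit code points are safe in every encoding
    # considered here, so output the character itself.
    if code_point <= 0x7F:
        return chr(code_point)

    # UTF-8 documents get the actual character.
    if (document_encoding or "US-ASCII").upper() == "UTF-8":
        return chr(code_point)

    # Unset, US-ASCII, or any other 8-bit encoding: fall back to the
    # ASCII string "U+NNNN" rather than emitting UTF-8 bytes that
    # would be invalid in that encoding.
    return "U+{:04X}".format(code_point)


if __name__ == "__main__":
    for enc in (None, "UTF-8", "ISO-8859-1"):
        print(enc, format_u_plaintext("201C", enc))
```

A transliteration table, if one is easy to come by, would slot in just before the final fallback.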
Wdyt?
Thanks,
Karl