Re: Unicode and Guile
From: Stephen Compall
Subject: Re: Unicode and Guile
Date: 25 Oct 2003 18:08:45 +0100
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2
Andy Wingo <address@hidden> writes:
> If there is no plan, may I suggest that we move our internal
> representation of strings to UTF-8. There's an interesting
> introductory article written on www.joelonsoftware.com, although I
> don't have the link ATM. This has the advantage that ASCII
> characters up to 127 are represented the same.
I think this may be a disadvantage. As you say, UTF-8 strings are
still not ASCII-compatible in general, but because casting their data
blocks to char* still works for pure-ASCII strings, people might be
tempted to simply do that, on the grounds that other languages "don't
matter enough to bother with".
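To illustrate the temptation, here is a minimal sketch in C. The helper
name utf8_codepoints is hypothetical (it is not part of Guile or libc),
and it assumes well-formed, NUL-terminated UTF-8 input. For pure-ASCII
data the byte count from strlen and the code point count agree, so
char*-based code appears to work; they only diverge once a non-ASCII
character shows up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
   Counts code points in a NUL-terminated string that is ASSUMED to be
   well-formed UTF-8, by counting every byte that is not a continuation
   byte (bit pattern 10xxxxxx). */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

With the ASCII string "abc", strlen and utf8_codepoints return the same
value, which is exactly why the shortcut is tempting. With "naïve"
encoded as UTF-8, strlen counts one byte more than utf8_codepoints,
because "ï" occupies two bytes.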
> Of course, above that characters might take up to eight bytes, which
> means that all code that processes user-input strings has to be
> changed. Painful, eh? But if we hope to write apps that deal with
> all languages of the world, that's the only way.
>
> So, reactions on that would be appreciated.
By contrast, UCS-4 strings have the advantage of breaking code that
merely tries to interpret the data block as char*. UCS-4 is also what
glibc uses for wchar_t. I'd debate the virtues of treating all code
points equally, versus their variable-length status in UTF-8, but I'm
sure that's better done (and has been done) in another forum. UCS-2
shouldn't even be considered an option, and UTF-16 seems to offer the
worst of both worlds.
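A rough sketch of what "breaking" means here, again in C. Both function
names are hypothetical, and uint32_t stands in for a UCS-4 code unit so
that the example does not depend on the platform's wchar_t width. Each
ASCII code point stored in 32 bits contains zero bytes, so strlen on
the raw data stops almost immediately, and the mistake is visible on
the very first test string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: length of a zero-terminated UCS-4 string,
   measured in code points. */
size_t ucs4_length(const uint32_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: what naive char*-based code would see.
   strlen stops at the first zero byte, which for ASCII-range code
   points is inside the first 32-bit unit (after one byte on
   little-endian machines, immediately on big-endian ones). */
size_t naive_char_length(const uint32_t *s)
{
    assert(s != NULL);
    return strlen((const char *)s);
}
```

For the three-character string { 'a', 'b', 'c', 0 }, ucs4_length
returns 3, while naive_char_length returns at most 1 whatever the byte
order, so code that ignores the representation fails at once instead of
only on non-ASCII input.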
As for the semantics, I submit the way Emacs does it: node (elisp)Text
Representations, or
http://www.gnu.org/manual/elisp-manual-21-2.8/html_node/elisp_542.html
--
Stephen Compall or s11 or sirian
I think your opinions are reasonable, except for the one about my mental
instability.
-- Psychology Professor, Fairfield University
Etacs Becker quarter Albright csim Delta Force defense information
warfare Perl-RSA CDC condor undercover SAFE analyzer ASPIC USCODE