Re: UTF8 string storage and retrieval in XML
From: Mark H Weaver
Subject: Re: UTF8 string storage and retrieval in XML
Date: Mon, 01 Feb 2016 13:16:38 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)
Richard Shann <address@hidden> writes:
> Can anyone explain what is going on when you try to store strings with
> non-ASCII characters? Here is an example:
>
> guile> (define another-data "Čć")
> guile> another-data
> "Ä\x8cÄ\x87"
> guile> (display another-data)
> Čćguile>
I guess this is Guile 1.x, where strings are merely byte sequences.
Your terminal is using the UTF-8 encoding, where "Čć" is represented as
the byte sequence:
0xC4 0x8C 0xC4 0x87
When printing this using 'write' (which is how values are printed at the
REPL), Guile 1.x treats this byte sequence as Latin-1. The byte 0xC4 is
the Latin-1 representation of the character "Ä", but 0x8C and 0x87 are
not printable characters in Latin-1 and so are escaped as "\x8c" and
"\x87".
When printing using 'display', Guile simply writes the bytes out
unescaped, and your terminal interprets them as UTF-8.
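To make the byte-level story concrete, here is a rough sketch in Python
rather than Guile (chosen only because Python keeps bytes and text apart,
which makes the two interpretations easy to see). The helpers write_like
and display_like are hypothetical stand-ins that mimic the behaviour
described above; they are not Guile's actual printer code.

```python
# Illustration only: mimics the behaviour described above, not Guile itself.

text = "Čć"
raw = text.encode("utf-8")      # the bytes a UTF-8 terminal hands over
print([hex(b) for b in raw])    # ['0xc4', '0x8c', '0xc4', '0x87']


def write_like(data: bytes) -> str:
    """Hypothetical stand-in for 'write': read each byte as Latin-1 and
    escape every byte that does not map to a printable character."""
    out = []
    for b in data:
        ch = bytes([b]).decode("latin-1")
        out.append(ch if ch.isprintable() else "\\x%02x" % b)
    return '"' + "".join(out) + '"'


def display_like(data: bytes) -> str:
    """Hypothetical stand-in for 'display': the bytes pass through
    untouched, and the UTF-8 terminal does the decoding."""
    return data.decode("utf-8")


print(write_like(raw))      # "Ä\x8cÄ\x87"
print(display_like(raw))    # Čć
```

The same four bytes give two different pictures purely because of which
encoding the reader assumes.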
Obviously this is terrible, which is why Guile 2.0+ strings are
sequences of Unicode code points. Can you switch to Guile 2.0?
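As a loose analogy (an assumption on my part, shown in Python because its
str type is also a sequence of code points), the difference between
counting code points and counting UTF-8 bytes looks like this:

```python
# Analogy only: Python's str stands in for a code-point string type.

s = "Čć"
code_points = [hex(ord(c)) for c in s]   # U+010C and U+0107
utf8_bytes = s.encode("utf-8")           # external byte representation

print(len(s), code_points)               # 2 ['0x10c', '0x107']
print(len(utf8_bytes))                   # 4
```

With a code-point string, the encoding only matters at the I/O boundary,
so the length and the printed form no longer depend on how the terminal
happens to encode its input.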
Mark