Re: [Chicken-users] string-translate and utf-8
From: Alex Shinn
Subject: Re: [Chicken-users] string-translate and utf-8
Date: Sat, 08 Nov 2008 11:10:43 +0900
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/22.3 (darwin)
Hi, sorry for the late reply.
Sunnan <address@hidden> writes:
> I'm updating old code that used to work:
>
> (require-extension syntax-case utf8 srfi-1 utf8-srfi-13 miscmacros)
>
> ;(import utf8)
> ;(import utf8-srfi-13) ;(commented out since they're not needed anymore?)
> (use utf8-srfi-13) ;(I've tried with and without this line)
>
> (string-translate " i " "ö " "o_") ;; this should eval to "_i_"
>
>
> but i get Error: (vector-ref) out of range, I guess because it reads the
> multi-byte characters (i.e. #\ö) as multiple entries in the vector.
I can't reproduce this. The utf8 extension (not
utf8-srfi-13) does provide a STRING-TRANSLATE replacement
which handles multi-byte characters (verified on a Chinese
example in the test suite).
The only thing I can think of that might be going wrong is the
normalization form. If the ö you typed is not the
single Unicode character U+00F6 (LATIN SMALL LETTER O WITH
DIAERESIS), but rather U+006F (LATIN SMALL LETTER O)
followed by U+0308 (COMBINING DIAERESIS), then you have not
only multi-byte characters, but multi-*codepoint*
characters. STRING-TRANSLATE, as with all Unicode
utilities, works at the codepoint level, not the extended
grapheme level. Thus the first vector (o, U+0308, space) has
3 elements to the second vector's 2 elements, and the range
error occurs.
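To make the codepoint mismatch concrete, here is a minimal sketch. It
assumes CHICKEN 3/4 with the utf8 egg installed (on CHICKEN 5 the load
form would be (import utf8) instead), and it relies on CHICKEN's \uXXXX
string escapes to spell out both forms unambiguously. The names
precomposed and decomposed are just for illustration.

```scheme
;; Assumption: the utf8 egg is installed; it rebinds string-length
;; so that it counts codepoints rather than bytes.
(use utf8)

(define precomposed "\u00f6")    ; U+00F6, a single codepoint
(define decomposed  "o\u0308")   ; U+006F followed by U+0308

(print (string-length precomposed))                      ; precomposed ö
(print (string-length decomposed))                       ; decomposed ö
(print (string-length (string-append decomposed " ")))   ; the "from" string
(print (string-length "o_"))                             ; the "to" string
```

If the "from" string comes out one element longer than the "to"
string, the input is in decomposed form and the two translation
tables no longer line up.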
Modifying STRING-TRANSLATE to work at the extended grapheme
level rather than the codepoint level would be a lot of
work, and possibly not what people expect.
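As a workaround, if you have no control over the
normalization forms, you can always use STRING-TRANSLATE*,
which replaces substrings from an association list instead of
matching characters position by position.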
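A rough sketch of that workaround follows. STRING-TRANSLATE* takes an
alist of (MATCH . REPLACEMENT) strings, so both spellings of ö can be
listed side by side. The helper name clean is hypothetical, and which
unit provides STRING-TRANSLATE* depends on the CHICKEN version (it is
available by default in csi on CHICKEN 3/4).

```scheme
;; Hypothetical helper: map either form of ö to "o" and spaces to "_".
;; Matching is on substrings, so a multi-codepoint MATCH is fine.
(define (clean s)
  (string-translate* s
    '(("\u00f6"  . "o")     ; precomposed ö (U+00F6)
      ("o\u0308" . "o")     ; decomposed ö (U+006F U+0308)
      (" "       . "_"))))

(print (clean " i "))       ; prints _i_
```

Listing both forms avoids having to normalize the input first, at the
cost of maintaining two entries per accented character.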
--
Alex