Re: [Chicken-users] string-translate and utf-8

chicken-users

From:	Alex Shinn
Subject:	Re: [Chicken-users] string-translate and utf-8
Date:	Sat, 08 Nov 2008 11:10:43 +0900
User-agent:	Gnus/5.11 (Gnus v5.11) Emacs/22.3 (darwin)

Hi, sorry for the late reply.

Sunnan <address@hidden> writes:

> I'm updating old code that used to work:
>
> (require-extension syntax-case utf8 srfi-1 utf8-srfi-13 miscmacros)
>
> ;(import utf8)
> ;(import utf8-srfi-13) ;(commented out since they're not needed anymore?)
> (use utf8-srfi-13)  ;(I've tried with and without this line)
>
> (string-translate " i " "ö " "o_") ;; this should eval to "_i_"
>
>
> but i get Error: (vector-ref) out of range, I guess because it reads the
> multi-byte characters (i.e. #\ö) as multiple entries in the vector.

I can't reproduce this.  The utf8 extension (not
utf8-srfi-13) does provide a STRING-TRANSLATE replacement
which handles multi-byte characters (verified on a Chinese
example in the test suite).

The only thing I can think of that might be going wrong is
the normalization form.  If the ö you input is not the
single Unicode character U+00F6 (LATIN SMALL LETTER O WITH
DIAERESIS, the NFC form), but is rather U+006F (LATIN SMALL
LETTER O) followed by U+0308 (COMBINING DIAERESIS, the NFD
form), then you have not only multi-byte characters, but
multi-*codepoint* characters.  STRING-TRANSLATE, like other
Unicode utilities, works at the codepoint level, not the
extended grapheme level.  Thus the FROM string "ö " has 3
codepoints while the TO string "o_" has only 2, the space in
third position has no counterpart, and the range error
occurs.
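
To make the codepoint counts concrete, here is a rough sketch
in Python, using its standard unicodedata module purely for
illustration (this is not Chicken's API).  The function
string_translate is hypothetical: it only mimics the
positional mapping, where the i-th codepoint of FROM maps to
the i-th codepoint of TO.  With the precomposed ö the call
yields "_i_"; with the decomposed ö the space sits at index 2
of FROM, and indexing the 2-codepoint TO fails, analogous to
the vector-ref error.

```python
import unicodedata

def string_translate(s, from_chars, to_chars):
    # Hypothetical codepoint-level translate: the i-th codepoint of
    # from_chars is replaced by the i-th codepoint of to_chars.
    out = []
    for ch in s:
        i = from_chars.find(ch)
        if i == -1:
            out.append(ch)            # not in FROM: copy unchanged
        else:
            out.append(to_chars[i])   # IndexError if TO is too short
    return "".join(out)

composed = "\u00f6 "      # U+00F6 then a space: 2 codepoints (NFC)
decomposed = "o\u0308 "   # U+006F, U+0308, then a space: 3 codepoints (NFD)

print(len(composed), len(decomposed))                       # 2 3
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
print(string_translate(" i ", composed, "o_"))               # _i_

try:
    string_translate(" i ", decomposed, "o_")
except IndexError as e:
    print("IndexError:", e)   # IndexError: string index out of range
```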

Modifying STRING-TRANSLATE to work at the extended grapheme
level rather than the codepoint level would be a lot of
work, and possibly not what people expect.

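As a workaround, if you have no control over the
normalization forms, you can always use STRING-TRANSLATE*,
which replaces whole substrings rather than single
characters, so both spellings of ö can be listed as
separate keys.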
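
A minimal Python sketch of that substring-based approach,
assuming semantics like Chicken's STRING-TRANSLATE* (an
association list of (MATCH . REPLACEMENT) strings, first
match at each position wins).  The name string_translate_star
and the smap contents are hypothetical and only illustrate
listing both normalization forms as keys.

```python
def string_translate_star(s, smap):
    # Hypothetical substring translate: at each position, try each
    # (match, replacement) pair in order; the first match wins.
    out = []
    i = 0
    while i < len(s):
        for match, repl in smap:
            if match and s.startswith(match, i):
                out.append(repl)
                i += len(match)
                break
        else:
            out.append(s[i])   # no key matched: copy one codepoint
            i += 1
    return "".join(out)

smap = [
    ("o\u0308", "o"),   # decomposed ö: o + COMBINING DIAERESIS
    ("\u00f6", "o"),    # precomposed ö: U+00F6
    (" ", "_"),
]

print(string_translate_star(" i ", smap))                # _i_
print(string_translate_star("s\u00f6t o\u0308l", smap))  # sot_ol
```

Because each key is a whole string, the two-codepoint
sequence is matched as a unit and no positional pairing of
FROM and TO codepoints is needed.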

-- 
Alex
