guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Using libunistring for string comparisons et al


From: Ludovic Courtès
Subject: Re: Using libunistring for string comparisons et al
Date: Sun, 13 Mar 2011 22:30:48 +0100
User-agent: Gnus/5.110013 (No Gnus v0.13) Emacs/23.3 (gnu/linux)

Hi Mark,

Mark H Weaver <address@hidden> writes:

> Unfortunately, the alternatives are not pleasant.  We have a bunch of
> bugs in our string handling functions.  Currently, our case-insensitive
> string comparisons and case conversions are not correct for several
> languages including German, according to the R6RS among other things.
>
> We could easily fix these problems by using libunistring, which provides
> the operations we need, but only if we use a single string
> representation, and one that is supported by libunistring (UTF-8,
> UTF-16, or UTF-32).

I don’t think so.  For instance, you could “upgrade” narrow strings to
UTF-32 and then use libunistring on that.  That would fix case-folding
for “Straße”, I guess.

> So, our options appear to be:
>
>   * Use only wide strings internally.
>
>   * Reimplement several complex functions from libunistring within guile
>     (string comparisons and case conversions).
>
>   * Convert strings to a libunistring-supported representation, and
>     possibly back again, on each operation.  For example, this will be
>     needed when comparing two narrow strings, when comparing a narrow
>     string to a wide string, or when applying a case conversion to a
>     narrow string.
>
> Our use of two different internal string representations is another
> problem.  Right now, our string comparisons are painfully inefficient.

Inefficient in the (unlikely) case that you’re comparing a narrow and a
wide string of different lengths.

So yes, the current implementation has bugs, but I think most if not all
can be fixed with minimal changes.  Would you like to look into it
for 2.0.x?

Using UTF-8 internally has problems of its own, as Mike explained, which
is why it was rejected in the first place.

Thanks,
Ludo’.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]