guile-devel

Re: string-map arg order


From: Dirk Herrmann
Subject: Re: string-map arg order
Date: Wed, 5 Sep 2001 22:10:54 +0200 (MEST)

On 4 Sep 2001, Alex Shinn wrote:

> >>>>> "Dirk" == Dirk Herrmann <address@hidden> writes:
> 
>     Dirk> Further, you would not start by making everything utf-32.
>     Dirk> Rather, you would start with a 1-byte width and only
>     Dirk> increase width as necessary, which is at most 2 times:
>     Dirk> 1->2->4.  With a variable width encoding, you may have to
>     Dirk> increase the size (n * (m-1)) times, n being the string
>     Dirk> length, m being the maximum character width.  Further,
> 
> In the context of multi-threading, I'm not sure resizing is even an
> option.  For whatever API we choose, ultimately external C library
> functions will be given a pointer to the characters (char or wchar) of
> a string.  If we reallocate the string from another thread, that
> pointer will then be invalid.
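The widening scheme quoted above (start at 1 byte per character and widen 1->2->4 only when a wider character arrives) can be sketched in C as follows. This is a minimal illustration under stated assumptions, not Guile's implementation: the `wstr` type and all function names are hypothetical, width 1 is assumed to cover code points below 0x100 and width 2 those below 0x10000, and error handling for `malloc`/`realloc` is omitted. The `widenings` counter shows that a string is re-encoded at most twice, however many characters are appended.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical fixed-width string: every character uses `width` bytes (1, 2 or 4). */
typedef struct {
    size_t len;          /* number of characters */
    size_t cap;          /* capacity, in characters */
    int width;           /* bytes per character: 1, 2 or 4 */
    int widenings;       /* how many times the buffer was re-encoded */
    unsigned char *data;
} wstr;

/* Assumed width classes: Latin-1 -> 1, BMP -> 2, everything else -> 4. */
static int width_for(uint32_t cp) {
    return cp < 0x100 ? 1 : cp < 0x10000 ? 2 : 4;
}

/* O(1) indexing: character i always starts at byte i * width. */
static uint32_t wstr_ref(const wstr *s, size_t i) {
    switch (s->width) {
    case 1: return s->data[i];
    case 2: return ((const uint16_t *)s->data)[i];
    default: return ((const uint32_t *)s->data)[i];
    }
}

static void wstr_set_raw(unsigned char *data, int width, size_t i, uint32_t cp) {
    assert(width_for(cp) <= width);   /* caller must have widened first */
    switch (width) {
    case 1: data[i] = (unsigned char)cp; break;
    case 2: ((uint16_t *)data)[i] = (uint16_t)cp; break;
    default: ((uint32_t *)data)[i] = cp; break;
    }
}

/* Re-encode every existing character at the new, larger width. */
static void wstr_widen(wstr *s, int new_width) {
    unsigned char *nd = malloc(s->cap * (size_t)new_width);  /* no NULL check */
    for (size_t i = 0; i < s->len; i++)
        wstr_set_raw(nd, new_width, i, wstr_ref(s, i));
    free(s->data);
    s->data = nd;
    s->width = new_width;
    s->widenings++;
}

static void wstr_append(wstr *s, uint32_t cp) {
    int w = width_for(cp);
    if (w > s->width)                 /* happens at most twice per string */
        wstr_widen(s, w);
    if (s->len == s->cap) {           /* ordinary capacity doubling */
        s->cap = s->cap ? s->cap * 2 : 8;
        s->data = realloc(s->data, s->cap * (size_t)s->width);  /* no NULL check */
    }
    wstr_set_raw(s->data, s->width, s->len++, cp);
}

static wstr wstr_new(void) { wstr s = {0, 0, 1, 0, NULL}; return s; }
static void wstr_free(wstr *s) { free(s->data); s->data = NULL; }
```

Appending `'a'`, then U+03B1, then U+1F600 moves the width from 1 to 2 to 4; any further appends only grow the capacity. A variable-width encoding, by contrast, may have to shift the tail of the buffer each time a single character grows.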

Resizing is an option if you make sure that no memory region that is still
in use gets freed.  That is why I introduced the separation between a
memory-region object and the string objects that use it.  See
http://mail.gnu.org/pipermail/guile-devel/2000-November/000586.html
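The idea of keeping a region alive while anything still uses it can be sketched with a reference count. This is only an illustration under assumptions: the linked proposal may track liveness differently (Guile itself relies on its garbage collector), and `region`, `string`, `region_acquire`, `region_release` and `string_resize` are hypothetical names. The sketch needs C11 (`<stdatomic.h>`, flexible array members); swapping `s->reg` safely between threads would additionally need a lock, which is omitted, as is `malloc` error handling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical memory-region object: owns the character bytes. */
typedef struct {
    atomic_size_t refs;   /* number of holders (string objects or C callers) */
    size_t size;
    char bytes[];         /* flexible array member */
} region;

static region *region_new(size_t size) {
    region *r = malloc(sizeof *r + size);   /* no NULL check */
    atomic_init(&r->refs, 1);
    r->size = size;
    return r;
}

static region *region_acquire(region *r) {
    atomic_fetch_add(&r->refs, 1);
    return r;
}

/* Drops one reference; returns 1 if this call freed the region. */
static int region_release(region *r) {
    if (atomic_fetch_sub(&r->refs, 1) == 1) {
        free(r);
        return 1;
    }
    return 0;
}

/* Hypothetical string object: a view onto a region. */
typedef struct {
    region *reg;
    size_t len;
} string;

static string string_from(const char *src) {
    size_t n = strlen(src);
    string s = { region_new(n + 1), n };
    memcpy(s.reg->bytes, src, n + 1);
    return s;
}

/* Resizing allocates a fresh region; the old one is merely released, so a
   holder that acquired it earlier keeps a valid pointer. */
static void string_resize(string *s, size_t new_len, char fill) {
    region *nr = region_new(new_len + 1);
    size_t keep = s->len < new_len ? s->len : new_len;
    memcpy(nr->bytes, s->reg->bytes, keep);
    memset(nr->bytes + keep, fill, new_len - keep);
    nr->bytes[new_len] = '\0';
    region *old = s->reg;
    s->reg = nr;
    s->len = new_len;
    region_release(old);
}
```

A C library that called `region_acquire` before the resize can keep reading the old bytes; the old region is freed only when that library releases it.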

> An alternative implementation is to always allocate 4 bytes per
> character (with *either* fixed- or variable-byte), and expand in place
> as needed.  Why work with single-byte strings in 4x the space?  So
> that you don't have to convert when passing to C functions.  What
> steered me away most from fixed-width encodings is coming up with a
> decent API.  The rest of the world (other languages, GTK, FreeType,
> Linux itself) are moving to utf8 - if we choose another encoding,
> we'll have to convert data types back and forth constantly.  And the
> possibility of different string types or wide strings means all
> current extensions would have to update to the new API right away -
> with utf8 they're safe so long as they stick to ASCII, and could
> upgrade at their leisure.
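The ASCII-compatibility claim in the quoted paragraph follows from how UTF-8 is defined: every code point below 0x80 is encoded as a single byte with the same value, and all other code points use only bytes of 0x80 or above. A minimal encoder (the name `utf8_encode` is hypothetical; it assumes a valid scalar value, i.e. at most 0x10FFFF and not a surrogate) makes this visible. Note that a 1-byte fixed-width string of ASCII text is byte-for-byte identical as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes one code point (<= 0x10FFFF, not a surrogate) as UTF-8.
   Returns the number of bytes written to out (1 to 4). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {                  /* ASCII: one byte, same value */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                 /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {               /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, U+00E9 encodes as `C3 A9` and U+20AC as `E2 82 AC`, while `'h'` stays the single byte `0x68`. An extension that passes plain ASCII `char *` data therefore sees no difference.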

I don't understand this argument:  As long as you stick to ASCII, the
fixed-width strings (one byte per character) would also remain exactly as
they always were.

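However, if the rest of the world actually uses utf8, then that is an
argument in favour of using it.  Still, I would expect major performance
drawbacks; for example, finding the n-th character of a utf8 string
requires scanning it from the start.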
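One concrete source of the expected slowdown is character indexing. In a fixed-width string the byte offset of character k is simply `k * width`, but in UTF-8 it can only be found by walking the buffer and skipping continuation bytes (those of the form `10xxxxxx`). The sketch below contrasts the two; `utf8_offset` and `fixed_offset` are hypothetical helper names, and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of the k-th character (0-based) in a valid UTF-8 buffer,
   or len if k is past the end.  Must scan from the start: O(len). */
static size_t utf8_offset(const unsigned char *s, size_t len, size_t k) {
    size_t i = 0;
    while (i < len && k > 0) {
        i++;                                      /* step past the lead byte */
        while (i < len && (s[i] & 0xC0) == 0x80)  /* skip continuation bytes */
            i++;
        k--;
    }
    return i;
}

/* Fixed-width counterpart: O(1). */
static size_t fixed_offset(size_t width, size_t k) {
    return k * width;
}
```

Implementations can soften this with cached positions or index tables, but those add memory and complexity that a fixed-width representation does not need.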

Best regards
Dirk Herrmann

