guile-devel
Re: string-map arg order


From: Alex Shinn
Subject: Re: string-map arg order
Date: 05 Sep 2001 20:13:21 -0400
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/21.0.104

>>>>> "Dirk" == Dirk Herrmann <address@hidden> writes:

    Dirk> Resizing is an option if you make sure that no memory region
    Dirk> that is being used gets freed.  That's the reason why I
    Dirk> introduced the separation of a memory-region object and the
    Dirk> string objects that use it.  See
    Dirk> http://mail.gnu.org/pipermail/guile-devel/2000-November/000586.html

Hmmm.... this is a pretty big change in the way people will have to
handle strings.  We could deprecate SCM_STRING_CHARS and redefine it
as something like

#define SCM_STRING_CHARS(x) SCM_STRING_CONTENT_CHARS(SCM_STRING_CONTENT(x))

but this wouldn't always work in threaded situations.  As your code
points out, every function that accepts a string will have to save the
content and use scm_remember_upto_here().  On the other hand, this is
only going to be a problem if people want to use multi-byte
characters, string mutation, and threading all in the same app.
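
To make the "save the content" idea concrete, here is a rough,
self-contained sketch of a string object that points at a separate
content region.  None of these names come from Guile or from Dirk's
patch: string_obj, content, content_retain() and friends are all
hypothetical, and a plain (non-atomic, not thread-safe) reference count
stands in for what the GC plus scm_remember_upto_here() would do in the
real thing.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical memory region holding the actual characters. */
typedef struct content {
    size_t refs;   /* stand-in for GC liveness; NOT thread-safe */
    size_t len;
    char *chars;
} content;

/* Hypothetical string object: just a pointer to its current region. */
typedef struct string_obj {
    content *content;
} string_obj;

/* Allocate a zero-filled region of len bytes and copy in up to len
   bytes from src. */
content *content_new(const char *src, size_t src_len, size_t len)
{
    content *c = malloc(sizeof *c);
    assert(c != NULL);
    c->chars = calloc(len + 1, 1);
    assert(c->chars != NULL);
    memcpy(c->chars, src, src_len < len ? src_len : len);
    c->len = len;
    c->refs = 1;
    return c;
}

/* A function working on the characters "saves" the region first. */
content *content_retain(content *c)
{
    c->refs++;
    return c;
}

/* Drop one reference; free the region when nobody uses it any more. */
void content_release(content *c)
{
    if (--c->refs == 0) {
        free(c->chars);
        free(c);
    }
}

/* Resize by swapping in a new region.  The old region is only freed
   once every saved reference to it has been released. */
void string_resize(string_obj *s, size_t new_len)
{
    content *old = s->content;
    s->content = content_new(old->chars, old->len, new_len);
    content_release(old);
}
```

A caller that retained the old region before string_resize() ran can
keep reading it safely and releases it when done; it just won't see the
resized data.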

Note that we can also use this same approach when resizing utf8
strings, so we can get back the O(n) time on string-for-each and kin.
Thanks for pointing that out :)

    Dirk> I don't understand this argument: As long as you stick to
    Dirk> ASCII, the fixed-width strings would also remain as they
    Dirk> ever were.

I should have clarified... as long as the source code and all
token/delimiter/special characters you refer to are plain ASCII, you
won't have problems even if a user of your program inputs utf8 data.
The same does not hold for utf{16,32} which can have ASCII values like
'/' and '\n' turn up in the high bytes.
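
A small sketch of what I mean, with hand-rolled encoders (the helper
names utf8_encode, utf16be_encode and contains_byte are made up for
this example, not part of any Guile API).  U+2F00 and U+0A05 are just
two code points I picked because their high bytes are 0x2F ('/') and
0x0A ('\n'); the UTF-16 encoder only handles the BMP and assumes
big-endian order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one code point (up to U+10FFFF) as UTF-8.
   Returns the number of bytes written to out. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode one BMP code point as big-endian UTF-16 (no surrogates). */
size_t utf16be_encode(uint32_t cp, unsigned char out[2])
{
    assert(cp < 0x10000);
    out[0] = (unsigned char)(cp >> 8);
    out[1] = (unsigned char)(cp & 0xFF);
    return 2;
}

/* Naive byte-wise scan, the way ASCII-only code looks for delimiters. */
int contains_byte(const unsigned char *buf, size_t len, unsigned char b)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] == b)
            return 1;
    return 0;
}
```

Scanning the UTF-16 encodings of U+2F00 or U+0A05 byte-wise finds a
spurious '/' or '\n', while every byte of their UTF-8 encodings has the
high bit set, so an ASCII delimiter search never matches inside them.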

Put more simply (and this has been pointed out in the earlier
discussion), utf8 is almost backwards compatible, and people could
start using the new Guile without having to switch to a multi-byte API
right away.

    Dirk> However, if the rest of the world actually uses utf8, then
    Dirk> this is an argument in favour of using it.  Still, I assume
    Dirk> major performance drawbacks.

I wouldn't consider "everyone else is doing it" a valid reason, but we
should take into consideration the performance drawbacks of the
conversions that would be required in addition to the inherent
performance drawbacks of variable-width characters.  Not an easy thing
to measure, and it might even be worth implementing both and comparing
them.  I'd like to get this right.

And since the company I worked for (a long time dying .com) finally
went under today, it looks like I'll have time to work on this :)

-- 
Alex Shinn <address@hidden>


