guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: string-map arg order


From: Alex Shinn
Subject: Re: string-map arg order
Date: 03 Sep 2001 18:55:24 -0400
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/21.0.104

>>>>> "Dirk" == Dirk Herrmann <address@hidden> writes:

Ah, I was asking if I should keep hacking or if we are back to
discussion... looks like the debate is revived :)

    Dirk> The proposal is nice as long as you only have one thread.
    Dirk> In such a context, Jim's idea about the scm_mb_cache type
    Dirk> are nice.  However, when it comes to multithreading, you
    Dirk> cannot guarantee that your cache is valid, except you use a
    Dirk> mutex for the whole lifetime of a scm_mb_cache.

The same applies to the idea of multiple fixed-width string
types... you may have to upgrade a 1-byte wide string to a 2 or 4 byte
wide string.  The only way you can avoid this is by making everything
utf-32, but then all strings and symbols would use 4 times as much
space.  Another huge disadvantage of utf-32 is that almost nothing
else uses it, so you'd have to continuously convert strings passed to
or returned from external library functions.  All C wrappers would
have to change their current interface to use special accessors to
Guile strings, which would be terribly complicated if you want to
write a mutator.  utf-8 does not avoid all these problems (some
current applications may be using latin-1 chars), and as you point out
may have performance concerns, but since one of the main goals of
Guile is strong interoperability with C, utf-8 seems like the only
viable solution.

    Dirk> One of the arguments for a variable width encoding has been,
    Dirk> that mutations are rare and that these copy operations
    Dirk> (which are necessary if you replace a character with one
    Dirk> that has a longer encoding) would therefore be seldom.
    Dirk> However, please note that still for accessor functions like
    Dirk> string-ref and substring you are working with integer
    Dirk> indices.  In a multi thread environment, the mapping between
    Dirk> character index and memory address of that character may
    Dirk> change at any moment.  Even if such mutations are rare, you
    Dirk> don't know _when_ they actually happen in a parallel thread.
    Dirk> That is, _every_ call to string-ref has to recompute the
    Dirk> actual character position.  The same with the character
    Dirk> indices given to substring.

There's still some room for optimizations.  If you're not running
multi-threaded, or the string is not shared, then there's no problem.
Even if you are, you can set a dirty bit when a string is mutated.
Alternately, you make shared memory strings copy-on-write into
non-shared memory (unless they've been set up with explicit mutex).
Or make string mutexes implicit.

IMO, we should not be optimizing for string mutations, and should in
fact try to offload as much of the work involved in our solution into
the mutators.  One of the observations in the previous discussion was
that the current string primitives encourage treating strings as
random access, and we would be better off with higher-level primitives
that treated strings in a more opaque, functional manner.  Since then
we've gotten srfi-13 with procedures such as string-map and
string-fold which do just that, and could be modified to handle
caching and threading issues implicitly.

    Dirk> IMO, given a variable-width encoding, effective string
    Dirk> handling in a multi-threaded environment is impossible on
    Dirk> the scheme level.  A simple loop like

    Dirk> (do (i 0 (+ i 1)))
    Dirk>     ((= i (length s)))
    Dirk>   (if (eqv? (string-ref s i) #\a)
    Dirk>       (display "foo\n")))

    Dirk> would work in O(n*n) time complexity where n is the length
    Dirk> of s.

True.  And since implicit caching at the string level is probably
impractical, this would be expected to be slow regardless of
threading.  But in a variable-width encoding scheme, it would simply
be a given that random access for strings is slow, and people should
avoid string-ref.  It would have been much cleaner to have written

(string-for-each (lambda (c) (if (eqv? c #\a) (display "foo\n"))) s)

to begin with.

-- 
Alex Shinn <address@hidden>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]