guile-devel


Re: string-map arg order


From: Dirk Herrmann
Subject: Re: string-map arg order
Date: Mon, 3 Sep 2001 23:17:13 +0200 (MEST)

On 3 Sep 2001, Alex Shinn wrote:

> >>>>> "Dirk" == Dirk Herrmann <address@hidden> writes:
> 
>     Dirk> However, I fully agree with you about the problems of
>     Dirk> multiple-width encodings: In case of multiple threads, every
>     Dirk> access to one of a string's characters needs to recompute
>     Dirk> the memory location of that character, because some other
>     Dirk> thread might have changed the string and even replaced some
>     Dirk> characters of different encoding widths.
> 
> That's funny, I've reversed my initial opinions of variable-width
> encodings, and in fact just a few minutes ago got a utf-8 based
> version of Guile to pass the test-suite :)
> 
> After taking Thi's advice and looking through the archives, I found a
> discussion from last November.  This referenced a proposal Jim Blandy
> made in August 1999 (no archives on this), still available in
> doc/mbapi.texi.  The proposal was utf-8 based, and there ensued a lot
> of arguing about the efficiency of variable-width encodings, but no
> conclusions and eventually the thread died out.  But the proposal
> seemed more than reasonable, and after thinking about it also seems to
> be the only realistic option if we want to be compatible with existing
> Guile projects/modules as well as continue to provide easy integration
> with C libraries.

The proposal is nice as long as you only have one thread.  In such a
context, Jim's idea of the scm_mb_cache type is nice.  However, when
it comes to multithreading, you cannot guarantee that your cache is valid
unless you hold a mutex for the whole lifetime of a scm_mb_cache.
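
To make the stale-cache problem concrete, here is a minimal C sketch.  It
is only an illustration under assumptions: struct pos_cache, scan_offset
and widen_first_char are hypothetical names, not the scm_mb_cache layout
or API from doc/mbapi.texi.  It runs in a single thread and simply shows
that a cached (character index, byte offset) pair becomes wrong as soon
as an earlier character is replaced by one with a wider encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a position cache.  This is NOT the real
   scm_mb_cache layout, just an illustration of the idea. */
struct pos_cache {
    size_t char_index;   /* character index that was last looked up */
    size_t byte_offset;  /* byte offset that index had at lookup time */
};

/* Byte offset of character `index` in a UTF-8 buffer, found by walking
   forward from the start and skipping continuation bytes (10xxxxxx). */
static size_t scan_offset(const unsigned char *s, size_t len, size_t index)
{
    size_t off = 0;
    while (index > 0 && off < len) {
        off++;
        while (off < len && (s[off] & 0xC0) == 0x80)
            off++;
        index--;
    }
    return off;
}

/* Replace the one-byte first character with the two-byte UTF-8 sequence
   for U+00E9.  The buffer must have room for len + 1 bytes.  Returns the
   new length in bytes. */
static size_t widen_first_char(unsigned char *buf, size_t len)
{
    assert(len >= 1 && buf[0] < 0x80);
    memmove(buf + 2, buf + 1, len - 1);  /* shift the tail right by one */
    buf[0] = 0xC3;
    buf[1] = 0xA9;
    return len + 1;
}
```

If a cache is filled for character index 2 of "abc" and the first
character is then widened, the cached byte offset still points at 'b'
while the real offset of character 2 has moved by one byte.  With threads,
that mutation can happen between filling the cache and using it, unless a
lock covers both steps.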

One of the arguments for a variable-width encoding has been that
mutations are rare and that the copy operations (which are necessary if
you replace a character with one that has a longer encoding) would
therefore seldom be needed.  However, please note that accessor
functions like string-ref and substring still work with integer
indices.  In a multi-threaded environment, the mapping between a character
index and the memory address of that character may change at any moment.
Even if such mutations are rare, you don't know _when_ they actually happen
in a parallel thread.  That is, _every_ call to string-ref has to recompute
the actual character position.  The same holds for the character indices
given to substring.

IMO, given a variable-width encoding, efficient string handling in a
multi-threaded environment is impossible at the Scheme level.  A simple
loop like

(do ((i 0 (+ i 1)))
    ((= i (string-length s)))
  (if (eqv? (string-ref s i) #\a)
      (display "foo\n")))

would run in O(n*n) time, where n is the length of s, because each
string-ref has to scan from the start of the string to find character i.
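
The quadratic cost can be illustrated with a small C sketch of what a
string-ref without a trustworthy cache would have to do on a UTF-8
buffer.  The function names here are hypothetical and are not taken from
Guile or from the proposal; the step counter exists only to make the cost
visible.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with `lead`. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;  /* invalid lead byte: treat it as a single byte */
}

/* Map a character index to a byte offset by scanning from the start.
   Each character skipped adds one to *steps. */
static size_t utf8_index_to_offset(const char *s, size_t len,
                                   size_t index, size_t *steps)
{
    size_t off = 0;
    assert(s != NULL && steps != NULL);
    while (index > 0 && off < len) {
        off += utf8_seq_len((unsigned char)s[off]);
        index--;
        (*steps)++;
    }
    return off;
}

/* Number of characters in a UTF-8 buffer of `len` bytes. */
static size_t utf8_char_count(const char *s, size_t len)
{
    size_t off = 0, n = 0;
    while (off < len) {
        off += utf8_seq_len((unsigned char)s[off]);
        n++;
    }
    return n;
}

/* Total scan steps when every character is fetched by index, as the
   Scheme loop does with string-ref. */
static size_t total_scan_steps(const char *s, size_t len)
{
    size_t steps = 0;
    size_t n = utf8_char_count(s, len);
    for (size_t i = 0; i < n; i++)
        (void)utf8_index_to_offset(s, len, i, &steps);
    return steps;
}
```

For the 5-character, 6-byte string "h\xc3\xa9llo" this counts
0 + 1 + 2 + 3 + 4 = 10 steps, i.e. n*(n-1)/2.  The sketch ignores the cost
of recomputing the string length on each iteration, which would add
further linear scans in the same setting.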

Best regards
Dirk Herrmann



