Re: string-map arg order
From: Dirk Herrmann
Subject: Re: string-map arg order
Date: Mon, 3 Sep 2001 23:17:13 +0200 (MEST)

On 3 Sep 2001, Alex Shinn wrote:
> >>>>> "Dirk" == Dirk Herrmann <address@hidden> writes:
>
> Dirk> However, I fully agree with you about the problems of
> Dirk> multiple-width encodings: In case of multiple threads, every
> Dirk> access to one of a string's characters needs to recompute
> Dirk> the memory location of that character, because some other
> Dirk> thread might have changed the string and even replaced some
> Dirk> characters of different encoding widths.
>
> That's funny, I've reversed my initial opinions of variable-width
> encodings, and in fact just a few minutes ago got a utf-8 based
> version of Guile to pass the test-suite :)
>
> After taking Thi's advice and looking through the archives, I found a
> discussion from last November. This referenced a proposal Jim Blandy
> made in August 1999 (no archives on this), still available in
> doc/mbapi.texi. The proposal was utf-8 based, and there ensued a lot
> of arguing about the efficiency of variable-width encodings, but no
> conclusions and eventually the thread died out. But the proposal
> seemed more than reasonable, and after thinking about it also seems to
> be the only realistic option if we want to be compatible with existing
> Guile projects/modules as well as continue to provide easy integration
> with C libraries.

The proposal is nice as long as you only have one thread. In such a
context, Jim's idea of the scm_mb_cache type is nice. However, when
it comes to multithreading, you cannot guarantee that your cache is valid
unless you hold a mutex for the whole lifetime of a scm_mb_cache.

One of the arguments for a variable-width encoding has been that
mutations are rare, so the copy operations (which are necessary if
you replace a character with one that has a longer encoding) would
be seldom. However, please note that accessor functions like
string-ref and substring still work with integer indices. In a
multi-threaded environment, the mapping between a character index and
the memory address of that character may change at any moment. Even
if such mutations are rare, you don't know _when_ they actually happen in
a parallel thread. That is, _every_ call to string-ref has to recompute
the actual character position. The same holds for the character indices
given to substring.
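
To make the cost of "recompute the actual character position" concrete, here is a minimal sketch in C of what a single index lookup involves when the string is stored as UTF-8. The function name utf8_offset_of_index is hypothetical; it is not part of Guile or of the mbapi proposal, and it assumes the buffer holds well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for illustration only (not a Guile function).
   Returns the byte offset of the character at character index `index`
   in the UTF-8 buffer `s` of `nbytes` bytes, or (size_t)-1 if the
   string has `index` or fewer characters.
   Assumption: `s` contains well-formed UTF-8. */
size_t utf8_offset_of_index(const unsigned char *s, size_t nbytes,
                            size_t index)
{
    size_t offset = 0;

    assert(s != NULL);
    while (offset < nbytes) {
        if (index == 0)
            return offset;
        /* Step over the lead byte of the current character ... */
        offset++;
        /* ... and over its continuation bytes, which look like 10xxxxxx. */
        while (offset < nbytes && (s[offset] & 0xC0) == 0x80)
            offset++;
        index--;
    }
    return (size_t)-1;
}
```

The lookup walks over every character in front of the requested one, so its cost grows with the index. If no cached offset can be trusted across threads, a scan of this kind is needed on each access.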

IMO, given a variable-width encoding, efficient string handling in a
multi-threaded environment is impossible at the Scheme level. A simple
loop like

  (do ((i 0 (+ i 1)))
      ((= i (string-length s)))
    (if (eqv? (string-ref s i) #\a)
        (display "foo\n")))

would run in O(n*n) time, where n is the length of s, because each call
to string-ref has to scan the string from its start again.
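
The quadratic growth can be checked with a small sketch in C that mimics the loop above on a UTF-8 buffer and counts how many characters are stepped over in total. The function count_a_rescanning and its steps counter are hypothetical names used only for this illustration, and the code assumes well-formed UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only (not Guile code): counts occurrences of 'a' the way
   the indexed loop does, recomputing the byte position of character i
   from the start of the buffer on every iteration.  `*steps` receives
   the total number of characters stepped over.
   Assumption: `s` contains well-formed UTF-8. */
size_t count_a_rescanning(const unsigned char *s, size_t nbytes,
                          size_t *steps)
{
    size_t found = 0;

    assert(s != NULL && steps != NULL);
    *steps = 0;
    for (size_t i = 0; ; i++) {
        size_t offset = 0, remaining = i;

        /* Recompute the position of character i from scratch. */
        while (offset < nbytes && remaining > 0) {
            offset++;
            while (offset < nbytes && (s[offset] & 0xC0) == 0x80)
                offset++;
            remaining--;
            (*steps)++;
        }
        if (offset >= nbytes)
            break;              /* index i is past the last character */
        if (s[offset] == 'a')
            found++;
    }
    return found;
}
```

For a string of n characters the counter ends at 0 + 1 + ... + n = n(n+1)/2 (the last term comes from the probe that detects the end of the string), which is the O(n*n) behaviour described above.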
Best regards
Dirk Herrmann