guile-devel

Re: O(1) accessors for UTF-8 backed strings


From: Mark H Weaver
Subject: Re: O(1) accessors for UTF-8 backed strings
Date: Tue, 15 Mar 2011 11:46:07 -0400
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.3 (gnu/linux)

Alex Shinn <address@hidden> wrote:
> On Sun, Mar 13, 2011 at 1:05 PM, Mark H Weaver <address@hidden> wrote:
>> I just realized that it is possible to implement O(1) accessors for
>> UTF-8 backed strings.
>
> It's possible with several approaches, but not necessarily worth it:
>
> http://trac.sacrideo.us/wg/wiki/StringRepresentations

Alex, can you please clarify your position?  I fear that readers of your
message might assume that you are against my proposal to store strings
internally in UTF-8.  Having read the text that you referenced above, I
suspect that you are in favor of using UTF-8 with O(n) string accessors.

For those who may not be familiar with the special properties of UTF-8,
please read at least the section on "Common Algorithms and Usage
Patterns" near the end of the text Alex referenced.  In summary, many
operations on UTF-8 such as substring searches, regexp searches, and
parsing can be done one byte at a time, using the same inner loop that
would be used for ASCII or Latin-1.  Also, although it is not mentioned
there, even simple string comparisons (done lexicographically by code
point) can be done byte-wise on UTF-8.
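
To make the two claims above concrete, here is a minimal sketch in Python
(not Guile code; the function names utf8_compare, codepoint_compare and
utf8_find are hypothetical and chosen only for illustration).  It compares
strings by their raw UTF-8 bytes, compares them by code point, and runs a
naive byte-at-a-time substring search.  It assumes both inputs are valid
UTF-8 with no lone surrogates.

```python
def utf8_compare(a: str, b: str) -> int:
    """Compare two strings by their raw UTF-8 bytes: returns -1, 0, or 1."""
    ab, bb = a.encode("utf-8"), b.encode("utf-8")
    return (ab > bb) - (ab < bb)


def codepoint_compare(a: str, b: str) -> int:
    """Compare two strings lexicographically by code point: -1, 0, or 1."""
    ac, bc = [ord(c) for c in a], [ord(c) for c in b]
    return (ac > bc) - (ac < bc)


def utf8_find(haystack: bytes, needle: bytes) -> int:
    """Naive byte-wise substring search over UTF-8 data.

    Returns the byte offset of the first match, or -1 if there is none.
    The loop never decodes a character; it is the same loop one would
    write for ASCII or Latin-1.
    """
    n, m = len(haystack), len(needle)
    for i in range(n - m + 1):
        if haystack[i:i + m] == needle:
            return i
    return -1


if __name__ == "__main__":
    samples = ["", "a", "ab", "z", "é", "日本", "𝄞"]
    # Byte order and code point order should agree for every pair.
    agree = all(utf8_compare(x, y) == codepoint_compare(x, y)
                for x in samples for y in samples)
    print("byte-wise order matches code point order:", agree)
    print("byte offset of match:",
          utf8_find("naïve café".encode("utf-8"), "café".encode("utf-8")))
```

Note that utf8_find returns a byte offset, not a character index.  Because
UTF-8 lead bytes and continuation bytes occupy disjoint ranges, a match for
a valid UTF-8 needle cannot begin in the middle of a character.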

I'd also like to point out that the R6RS is the only relevant standard
that mandates O(1) string accessors.  The R5RS did not require this, and
WG1 for the R7RS has already voted against this requirement.

  http://trac.sacrideo.us/wg/ticket/27

I'll write more on this later.

    Mark


