guile-devel

strings rationale


From: Tom Lord
Subject: strings rationale
Date: Mon, 6 Aug 2001 01:32:37 -0700 (PDT)

Here is the rationale (written long ago) for the string lattice
in Systas:




WARNING: This section is not quite up to date. 

The Systas Scheme string lattice attempts to balance competing
requirements concerning the representation of strings, and
opportunities for serendipitous generalization of procedures that
operate on strings.

In their internal representation, Systas Scheme symbols and strings
are quite similar. This is not a coincidence. The differences between
symbols and strings are more in how they are traditionally used than
in what they are. Symbols must be constructed using a hash-table
lookup or some similar device to enforce the needed equality (eq?)
between same-named symbols. Strings, on the other hand, are each
constructed individually. Strings can be modified, while symbol names
are fixed for the lifetime of the object. Aside from those
differences, the objects are quite similar. The introduction of a
read-only string type simply formalizes the similarities and gives
non-mutating string primitives a suitable abstraction for operating on
both symbols and strings.
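
As an illustration of the hash-table lookup mentioned above, here is a
minimal C sketch of symbol interning. It is not the Systas
implementation: the names struct symbol, intern, hash_bytes and
N_BUCKETS are hypothetical, and the table is a fixed-size chained hash
table chosen only to keep the example short. The point it shows is
that two requests for the same name return the same object, so
equality can be decided by comparing pointers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol object: an immutable, 0-terminated name. */
struct symbol {
    char *name;
    size_t len;
    struct symbol *next;   /* chain within one hash bucket */
};

#define N_BUCKETS 64
static struct symbol *buckets[N_BUCKETS];

/* Simple string hash over an explicit length (an assumption; any
   reasonable hash would do). */
static size_t hash_bytes(const char *s, size_t len)
{
    size_t h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + (unsigned char)s[i];
    return h % N_BUCKETS;
}

/* Return the unique symbol for this name, creating it on first use.
   Returns NULL only if allocation fails. */
struct symbol *intern(const char *name, size_t len)
{
    assert(name != NULL);
    size_t b = hash_bytes(name, len);

    for (struct symbol *s = buckets[b]; s != NULL; s = s->next)
        if (s->len == len && memcmp(s->name, name, len) == 0)
            return s;                 /* same name, same object */

    struct symbol *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->name = malloc(len + 1);
    if (s->name == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->name, name, len);
    s->name[len] = '\0';              /* keep the name 0-terminated */
    s->len = len;
    s->next = buckets[b];
    buckets[b] = s;
    return s;
}
```

A string constructor, by contrast, would allocate a fresh object on
every call and never consult such a table.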

But what about the distinction between writable strings and other
kinds of strings? The object created by list->string is an
scm_is_basic_string and an scm_is_string; the object created by
make-shared-substring, if the string argument is a writable string, is
an scm_is_string, but not an scm_is_basic_string.

The introduction of writable strings, that is, the distinction between
substrings and ordinary strings, is a concession to C.

Systas Scheme is designed to operate smoothly with C programs. For
that reason, the internal representation of ordinary strings is as an
array of characters, terminated by a final `'\0'' character. The final
null is not counted in the string-length of the string, but is there
in case the string is passed to a C function that relies on its
presence. Shared substrings cannot guarantee the presence of the null
character. Many string functions are able to operate on strings that
are missing that null, but not all. The name string is given to
mutable string-like objects that include a final null character, and
symbol names are immutable, string-like, 0-terminated objects. The name
writable string is given to mutable string-like objects that may or
may not include a final null (and that may or may not be
substrings). The name read-only string is given to immutable (actually,
not necessarily mutable) string-like objects that are not necessarily
0-terminated.
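
To make the representation concrete, here is a small C sketch of an
ordinary string and a shared substring. The struct and function names
(struct strobj, make_ordinary_string, make_shared_substring_sketch)
are assumptions made for this example and are not taken from the
Systas sources. It shows why a shared substring cannot promise a
final null: it points into the parent's buffer, so the byte after it
is whatever the parent happens to hold there.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string-like object: bytes plus an explicit length. */
struct strobj {
    char  *data;            /* first character */
    size_t len;             /* string-length; excludes any final '\0' */
    int    zero_terminated; /* nonzero if data[len] is known to be '\0' */
};

/* Build an ordinary string: a private buffer with a final '\0'.
   On allocation failure the result has data == NULL. */
struct strobj make_ordinary_string(const char *chars, size_t len)
{
    struct strobj s = { NULL, 0, 0 };
    assert(chars != NULL);
    s.data = malloc(len + 1);
    if (s.data == NULL)
        return s;
    memcpy(s.data, chars, len);
    s.data[len] = '\0';     /* not counted in len */
    s.len = len;
    s.zero_terminated = 1;
    return s;
}

/* Build a shared substring covering [start, end) of parent.
   No bytes are copied, so the byte after the substring is whatever
   follows it in the parent's buffer. */
struct strobj make_shared_substring_sketch(struct strobj parent,
                                           size_t start, size_t end)
{
    struct strobj sub;
    assert(start <= end && end <= parent.len);
    sub.data = parent.data + start;
    sub.len = end - start;
    /* Only a substring that ends where a 0-terminated parent ends
       happens to be followed by the null. */
    sub.zero_terminated = parent.zero_terminated && end == parent.len;
    return sub;
}
```

With a parent of "hello world", the substring covering the first five
characters is followed by a space, not a null, so passing its data
pointer to a function such as strlen would read past the substring.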

Most functions accept substrings, but some functions operate by
building 0-terminated strings from substrings they are passed as
arguments. If passed a string or symbol, which are already
0-terminated, that step is skipped.
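
The copy step described in this paragraph might look roughly like the
following C sketch. The names struct strarg and as_c_string are
hypothetical, and the ownership flag is one possible convention among
several. The function copies only when the argument is not known to
be 0-terminated; otherwise it hands back the original bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical view of a string-like argument. */
struct strarg {
    const char *data;
    size_t      len;
    int         zero_terminated;  /* nonzero if data[len] == '\0' */
};

/* Return a 0-terminated pointer suitable for C functions.
   *owned is set to 1 when the result is a fresh copy that the caller
   must free, and to 0 when the argument could be used directly.
   Returns NULL if a needed copy cannot be allocated. */
const char *as_c_string(struct strarg a, int *owned)
{
    assert(a.data != NULL && owned != NULL);

    if (a.zero_terminated) {
        *owned = 0;               /* already usable: skip the copy */
        return a.data;
    }

    char *copy = malloc(a.len + 1);
    *owned = (copy != NULL);
    if (copy == NULL)
        return NULL;
    memcpy(copy, a.data, a.len);
    copy[a.len] = '\0';
    return copy;
}
```

The cost of this convention is that every caller has to check the
flag and free the copy when one was made.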

The shared substring type was added to the language to simplify
programs that otherwise have to pass around lots of string indexes as
parameters.


------ end of rationale ------


In retrospect, if there is a mistake in that design, it is the
"concession to C".  Built-ins that operate on mutable strings
should not rely at all on 0-termination, and should be robust
when 0 occurs in the middle of a string.
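
A short C sketch of what that robustness means in practice, assuming
a representation that stores an explicit length (the helper name
counted_equal is made up for this example): comparison is driven by
the stored length and memcmp, not by scanning for a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare two counted byte sequences. A 0 byte in the middle is
   treated like any other character. */
int counted_equal(const char *a, size_t alen,
                  const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

For the five-character sequences a, b, 0, c, d and a, b, 0, x, y,
strncmp reports a match because it stops at the embedded 0, while
counted_equal reports that they differ.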

But the idea of a read-only-string is safe, reasonable, useful,
and otherwise good.

-t



