guile-devel

Re: The empty string and other empty strings


From: David Kastrup
Subject: Re: The empty string and other empty strings
Date: Fri, 13 Jan 2012 18:36:24 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.0.92 (gnu/linux)

Mark H Weaver <address@hidden> writes:

> David Kastrup <address@hidden> writes:
>
>> address@hidden (Ludovic Courtès) writes:
>>
>>> Hi Mark,
>>>
>>> Mark H Weaver <address@hidden> skribis:
>>>
>>>> What do other people think?
>>>
>>> As you said, R5RS makes it clear that there can be several (in the sense
>>> of eq?) empty strings, so I think what you did is the right thing.
>>
>> Since it uses the same verbiage with regard to '(), could you please
>> point out _where_ R5RS states that "freshly allocated" means "not
>> eq?"?
>
> Section 3.4 (Storage model) of the R5RS states:
>
>   Whenever this report speaks of storage being allocated for a variable
>   or object, what is meant is that an appropriate number of locations
>   are chosen from the set of locations that are not in use, and the
>   chosen locations are marked to indicate that they are now in use
>   before the variable or object is made to denote them.

And that's perfectly fine for the characters of a string.  However,
(string) has no characters, just as (list) has no list members.  (list)
does not need _any_ allocation, and neither would (string).  For me it
makes sense to make the fundamental building block of a type a
self-contained value.  For multi-value non-composite types (like
numerical types) that is not necessarily feasible.  For composite types
with a single elementary non-composite value, it makes sense for me to
make this value a basic cell value.

Since empty strings are valid substrings of both mutable and non-mutable
strings, I don't see that it makes sense to apply either property to
them since it is impossible to change any character through them.  So
there are a number of operations which should for consistency's sake be
able to check for this special value efficiently.  Reserving a cell
value for it seems like the straightforward thing to do, and that is
what is done with lists also.

>> For me it means "does not contain any component in common with
>> previously allocated material".  The fixed constant '() or (list)
>> (the neutral element with regard to list concatenation) not
>> containing any allocated pairs meets that description, and the fixed
>> constant "" or (string) (the neutral element with regard to string
>> concatenation) not containing any allocated characters meets that
>> description.
>
> I think this is a very reasonable interpretation, but this is not in
> accordance with the standard.

Are you saying that (eq? (list) (list)) is not in accordance with the
standard since the standard specifies that a freshly allocated list is
to be returned?
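
The list case can be checked at any REPL.  A minimal sketch, assuming
an R5RS-conforming implementation; the comments give the results the
report requires for lists, and nothing is claimed here for (string),
whose behavior is implementation-dependent:

```scheme
;; (list) with no arguments is simply the empty list: no pairs exist.
(eq? (list) (list))        ; => #t
(eq? (list) '())           ; => #t

;; A non-empty list is built from newly allocated pairs.
(eq? (list 'a) (list 'a))  ; => #f
```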

>> So why treat them differently?  What does it buy us except trouble?
>
> I don't see how our current behavior buys us _any_ trouble.  We've
> voluntarily opted-out of a (marginal) optimization opportunity, and
> that's all.
>
> In your proposed behavior: in _almost_ all cases, `scm_from_stringn'
> (et al) would return an object that is not `eq?' to any other existing
> object.  However, in a single edge case, you'd have it return
> something that _is_ `eq?' to other existing objects.  This is the kind
> of behavior that could easily buy us trouble.

Why?  You can't change any other value _through_ it.  Do you want to use
(string) as a not-eq-to-anything sentinel like Lisp people do with (list
nil) sometimes?  It is known that (list) will not do for that purpose
(in spite of the standard saying that list will return a freshly
allocated list), so do you really think people will expect (string) to
do so?
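
For illustration, the sentinel idiom mentioned above might look like
this in Scheme.  This is a sketch only; the name `sentinel' is
hypothetical and not taken from any Guile code:

```scheme
;; A one-element list is a freshly allocated pair, so it is eq? only
;; to itself and can serve as a unique marker.
(define sentinel (list #f))

(eq? sentinel sentinel)    ; => #t
(eq? sentinel (list #f))   ; => #f, a different fresh pair

;; (list) is just the empty list, so it cannot act as a unique marker.
(eq? (list) '())           ; => #t
```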

> To my mind, if the optimization is insignificant (and I suspect that
> it is), then it is safer to treat the edge cases the same as the
> common case, for the sake of simplifying the semantics.

You'll find yourself checking for "" more often in connection with
strings than for 0 in connection with numbers, because "" is special in
that it contains no characters or other members.

So for me "" is a prime candidate for a single-cell constant.  We can
live with other objects like 0 not being eq to equal values, so we
certainly can with this one.

> However, my mind is not set in stone on this.  Does anyone else here
> agree with David?  Should we defend the legitimacy of this
> optimization, and ask the R7RS working group to include explicit
> language specifying that empty strings/vectors need not be freshly
> allocated?

They don't specify that empty lists need not be freshly allocated,
either, so it would be strange to make a difference here.

I think it makes more sense to define "freshly allocated" instead, as
"no pre-existing object can be modified through any operation on it".
That means that any single-cell constant is by definition "freshly
allocated".  And indeed, its _cell_ is freshly allocated even though
that cell _value_ may be eq? to that of other cells.
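
A rough sketch of what that definition is meant to capture, using only
R5RS string procedures (the variable names are illustrative
assumptions):

```scheme
;; Mutating a freshly allocated string must not affect any
;; pre-existing object.
(define old (string #\a #\b))
(define copy (string-copy old))   ; newly allocated copy
(string-set! copy 0 #\x)

old                               ; => "ab", untouched
copy                              ; => "xb"

;; An empty string has no valid index for string-set!, so nothing
;; can be modified through it, whatever its identity.
(string-length (string))          ; => 0
```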

-- 
David Kastrup



