Re: Another alternative string representation proposal 1.3

guile-devel

From:	Dirk Herrmann
Subject:	Re: Another alternative string representation proposal 1.3
Date:	Mon, 9 Oct 2000 11:22:53 +0200 (MEST)

On 6 Oct 2000, Keisuke Nishida wrote:

> >  * I generalized the idea of a char-field to a memory-field.  The idea is,
> >    that in principle its not only strings and symbols that may share
> >    memory regions, but one could also think of sharable uniform arrays
> >    etc.  Even if there is currently little use for it, it's something that
> >    does not seem too far out of scope.
> 
> I think this is a good generalization.  But why don't you call it
> `memory-region'?  Isn't it more ordinary?

If people prefer memory-region I will use that name.

> > A memory-field has the following attributes:
> >  * base is a void* pointing to the base address of the memory-field.
> >  * length is a scm_sizet denoting the size of the memory-field.
> >  * owner_p is a boolean value that indicates whether the memory-field is
> >    actually the owner of the memory-region, i. e. whether the memory region
> >    should be freed if the memory-field is garbage collected.
> > 
> > Possible double cell layout:
> >    <memory-field type tag/owner_p, length, base, <type-dependent-data>>
> 
> You always want to include read-only information in attributes, don't you?

Yes, you are right.  That should be added to the list of common
attributes.
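To make the attribute list concrete, here is a minimal sketch of the common attributes as a plain C struct, with read-only added.  It is only an illustration: the proposal packs these into a double cell together with the type tag, and the names memory_field_make and memory_field_collect are hypothetical, not existing guile functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical flat layout of the common memory-field attributes. */
typedef struct memory_field {
  void *base;                /* base address of the memory region */
  size_t length;             /* size of the region (scm_sizet in the proposal) */
  unsigned owner_p : 1;      /* free the region when the field is collected? */
  unsigned read_only_p : 1;  /* region must not be written to */
  void *type_dependent_data; /* interpreted by the client type only */
} memory_field;

/* Create a memory-field describing an existing region. */
memory_field *
memory_field_make (void *base, size_t length, int owner_p, int read_only_p)
{
  memory_field *mf = malloc (sizeof *mf);
  if (mf == NULL)
    abort ();
  mf->base = base;
  mf->length = length;
  mf->owner_p = owner_p ? 1 : 0;
  mf->read_only_p = read_only_p ? 1 : 0;
  mf->type_dependent_data = NULL;
  return mf;
}

/* What the collector would do when the memory-field becomes garbage:
   the region itself is released only if the field owns it. */
void
memory_field_collect (memory_field *mf)
{
  if (mf->owner_p)
    free (mf->base);
  free (mf);
}
```

A field created with owner_p unset can describe static or foreign memory, which is left alone when the field is collected.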

> If type-dependent-data can be a Scheme object, you need to add a bit for
> that.  [...]

Could be done this way, but I have a different usage pattern in mind than
you do:  Objects of type memory-field can belong to different client types.
For a given client type it is _known_ whether type-dependent-data is a
Scheme object or not.  For example, a memory-field that belongs to a
string will never have a Scheme object as type-dependent-data.  You can
think of the type-dependent-data entry as belonging to the client
type's layout, as if the client type had an additional cell entry
available (which, however, is shared among all objects that share
the same memory region).

Thus, if the type-dependent-data is a Scheme object, the client type is
responsible for marking that object.  The same is true if the region
belonging to the memory-field contains Scheme objects:  Only the client
type knows whether this is the case.  If, for example, you implemented
Scheme vectors using memory-fields, the mark function of the Scheme
vector object would have to obtain the base address from its memory-field
and then perform the marking.
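The vector example could look roughly like the following sketch.  Everything in it is a stand-in: toy_object, toy_mark and toy_vector are hypothetical names, and the mark bit is a plain int rather than guile's real gc machinery.  The point is only that the client's mark function, not the memory-field, walks the region.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a heap object with a gc mark bit (hypothetical). */
typedef struct toy_object { int marked; } toy_object;

/* Only the attributes needed here; the field itself knows nothing
   about what its region contains. */
typedef struct memory_field {
  void *base;
  size_t length;   /* region size in bytes */
} memory_field;

/* Client type: a vector whose elements live in a memory-field. */
typedef struct toy_vector {
  memory_field *field;
  size_t n_elements;
} toy_vector;

void
toy_mark (toy_object *obj)
{
  if (obj != NULL)
    obj->marked = 1;
}

/* Mark function of the client type: fetch the base address from the
   memory-field, then mark every element stored in the region. */
void
toy_vector_mark (const toy_vector *v)
{
  toy_object **elements = v->field->base;
  size_t i;

  assert (v->n_elements * sizeof *elements <= v->field->length);
  for (i = 0; i < v->n_elements; i++)
    toy_mark (elements[i]);
}
```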

However, this has the consequence that Scheme objects in the memory-field's
memory region, as well as in the type-dependent-data, are not gc-protected
until both the memory-field _and_ the client object are initialized.

If this is too confusing, we should distinguish between different kinds of
memory fields:  Those that point to a field of Scheme objects, and those
that don't.  However, we would lose some flexibility, such as
memory fields with mixed contents.

> [...]  But I think allowing type-dependent-data to be customizable is over
> generalization; it creates a confusion.  I think a reference count is enough.
> Instead, you could allow more sharing patterns, like "shared mutable", by
> some encoding:
> 
>   -2  shared read-only
>   -1  shared mutable
>    0  no user
>    1  one user, mutable
>  >=2  several users, immutable
> 
> This allows shared mutable substrings or subarrays. (Could be useful..)

Well, I agree that shared memory regions without a copy-on-write policy may
be useful.  However, I prefer a client-type dependent interpretation of
type-dependent-data over putting too many things into
the type-dependent-data field.

We can create confusion in two different ways (-:
* My preferred way of confusion:  Make type-dependent-data dependent on
  the client type.  Thus, for memory-fields with a copy-on-write policy,
  there is no need for a shared-mutable attribute.  In other words, you
  never have to check for that situation as long as you are dealing with
  strings or symbols, for example.  And, if you'd like to have some type
  with shared mutable memory (like guile's current shared substrings),
  the type-dependent-data field would use a different encoding for this
  type.  This, however, means that you would not be able to share
  memory-fields between ordinary strings/symbols and client types with a
  shared-mutable policy.  But, this is a good thing IMO.  In short:  Only
  client types that use the same interpretation of the type-dependent-data
  field may share the same memory-field objects among each other.
* Your preferred way of confusion:  Use a fixed encoding of the
  type-dependent-data field among _all_ client types of memory-fields.
  This encoding, however, has to be a superset of all possible uses that
  client types may have for the type-dependent-data field.  But, strings
  and symbols with the copy-on-write policy will (hopefully) never point
  to a memory-field with the shared-mutable flag set.  Thus, when dealing
  with strings/symbols you will either perform redundant checks (to make
  sure that the shared-mutable flag is _really_ not set), or you will in
  fact also use a client type specific interpretation of the
  type-dependent-data field, by omitting checks for attributes that are
  known not to be used for the current client type.
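As an illustration of the first alternative, here is a rough sketch in which a copy-on-write string client reads type-dependent-data as a plain count of users and never checks for a shared-mutable state.  The count-of-users reading is an assumption made for this example, and cow_string and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct memory_field {
  char *base;
  size_t length;
  size_t users;  /* type-dependent-data as read by string clients (assumed) */
} memory_field;

typedef struct cow_string { memory_field *field; } cow_string;

/* Allocate a new field holding a private copy of the characters. */
static memory_field *
field_copy_of (const char *chars, size_t length)
{
  memory_field *mf = malloc (sizeof *mf);
  if (mf == NULL)
    abort ();
  mf->base = malloc (length ? length : 1);
  if (mf->base == NULL)
    abort ();
  memcpy (mf->base, chars, length);
  mf->length = length;
  mf->users = 1;
  return mf;
}

cow_string
cow_string_make (const char *text)
{
  cow_string s = { field_copy_of (text, strlen (text)) };
  return s;
}

/* A second string that shares the same memory-field. */
cow_string
cow_string_share (cow_string s)
{
  s.field->users++;
  return s;
}

/* Write one character; copy first if the field has other users.
   No shared-mutable check is needed for this client type. */
void
cow_string_set (cow_string *s, size_t i, char c)
{
  assert (i < s->field->length);
  if (s->field->users > 1)
    {
      memory_field *copy = field_copy_of (s->field->base, s->field->length);
      s->field->users--;
      s->field = copy;
    }
  s->field->base[i] = c;
}

void
cow_string_release (cow_string *s)
{
  if (--s->field->users == 0)
    {
      free (s->field->base);
      free (s->field);
    }
  s->field = NULL;
}
```

A client type with a shared-mutable policy would define its own reading of the same field, and would simply never be handed a memory-field that a cow_string uses.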

> > Strings in guile are represented by shared copy-on-write memory-fields.
> > Thus, a string object in guile is defined by the following attributes:
> >  * a memory-field object
> >  * an unsigned integer denoting the offset from the memory-field's base
> >    address where the characters of the string start
> >  * an unsigned integer denoting the string's length.
> 
> Why don't you use the pointer to the start address rather than the offset?

You're right, this is much better.  It will save two memory accesses and
an add operation every time the string's base address is requested.
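A small sketch of the two layouts shows where the savings come from.  The struct and function names are hypothetical and only the relevant attributes are included.

```c
#include <assert.h>
#include <stddef.h>

typedef struct memory_field {
  char *base;
  size_t length;
} memory_field;

/* Layout from the original proposal: offset into the memory-field. */
typedef struct string_by_offset {
  memory_field *field;
  size_t offset;
  size_t length;
} string_by_offset;

/* Suggested layout: pointer to the first character. */
typedef struct string_by_pointer {
  memory_field *field;
  char *start;
  size_t length;
} string_by_pointer;

/* Loads field, then base, then offset, and adds them. */
char *
string_by_offset_chars (const string_by_offset *s)
{
  return s->field->base + s->offset;
}

/* A single load: two memory accesses and the add are gone. */
char *
string_by_pointer_chars (const string_by_pointer *s)
{
  return s->start;
}
```

The memory-field pointer is kept in both layouts, since it is still needed for sharing and for the copy-on-write check.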

Thanks a lot,
Dirk

Current Thread

Another alternative string representation proposal 1.3, Dirk Herrmann, 2000/10/06
- Re: Another alternative string representation proposal 1.3, Keisuke Nishida, 2000/10/06
  - Re: Another alternative string representation proposal 1.3, Dirk Herrmann <=
