Re: redoing SCM representation in 2.2

guile-devel

From:	Ken Raeburn
Subject:	Re: redoing SCM representation in 2.2
Date:	Sun, 15 May 2011 05:00:07 -0400

On May 12, 2011, at 06:17, Andy Wingo wrote:
> I'm looking at new SCM representation and tagging possibilities in 2.2.
> Read the whole mail please, as it's a little complicated.

Innnnteresting....

> I would like to revisit the SCM representation and tagging scheme in
> 2.2.  In particular, I would like to look at NaN-boxing.  I explain the
> plan a bit below, but if you like to get your information depth-first,
> check out:

So... Guile 2.2 won't work on the VAXstation in my basement, which doesn't do 
IEEE math? :-(
(Not that I've powered it up in some time...)
Guess I hadn't thought about that before; we've got code that refers to IEEE 
floating point already, so does that mean we require IEEE floating point 
already?

On 64-bit SPARC and perhaps some other architectures, we'd be dependent on the 
OS effectively using only 48 bits' worth of address space even if the hardware 
supports more.  I'd be surprised if we encountered a program that needs more 
storage than that, and I expect most current OSes will tend to have a couple of 
regions growing toward each other rather than scattering stuff all over the 
address space.  But I could imagine a particularly naïve or aggressive form of 
address space layout randomization trying to take advantage of all 64 bits by 
scattering mapped memory throughout all 2**64 addresses (minus whatever the 
kernel uses), for libraries or heap allocation or both.
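
To make the constraint concrete, here is a minimal sketch in C of the check an 
address would have to pass.  The helper names are made up for illustration and 
are not Guile API, and it assumes "fits in 48 bits" means the top 16 bits of 
the address are all zero:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the address uses only the low 48 bits. */
bool fits_in_48_bits(uint64_t addr)
{
    return (addr >> 48) == 0;
}

/* Hypothetical helper: turn a pointer into a 48-bit payload, aborting
   if the OS handed us an address outside that range. */
uint64_t checked_payload(const void *p)
{
    uint64_t addr = (uint64_t)(uintptr_t) p;
    assert(fits_in_48_bits(addr));
    return addr;
}
```

An allocator or mmap wrapper could run a check like this on every region it 
gets back, but it can't change where the kernel chooses to map things.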

> Basically I think it's OK to restrict the Scheme heap to be within a
> 48-bit space, at least for the next decade or so.  But given that the
> total address space is more than 48 bits on many architectures,
> arbitrary immediate foreign pointers may not be possible on
> e.g. Sparc64.

Nor the full range of (u)int64_t values we might get from a library.

Though, I'll just throw these out there:

If the 64-bit SCM type isn't required to represent the full range of integer 
values the machine can support as immediate values, does it really have to 
encompass the full range of "double" values?  Is that really what we should be 
optimizing the encoding for?  Maybe for really large or really tiny values it 
would be okay to use heap storage as we do for bignums, and steal an exponent 
bit to use as a tag?  Or if you steal a few mantissa bits, you lose a little 
precision but keep all the exponent bits.  So you don't need to waste 13 bits 
on saying "this is not a floating point value" all the time, and you can widen 
the range permissible for immediate integer and pointer values.
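
A rough sketch of the "steal a few mantissa bits" variant, in C.  Everything 
here is an assumption for illustration: the number of stolen bits (3), the tag 
value, and the function names are all hypothetical, and it truncates the low 
mantissa bits rather than rounding:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "expects a 64-bit double");

#define STOLEN_BITS 3                                 /* hypothetical */
#define TAG_MASK    ((UINT64_C(1) << STOLEN_BITS) - 1)
#define TAG_FLONUM  UINT64_C(0x1)                     /* hypothetical */

/* Overwrite the low mantissa bits with a tag; the exponent is untouched. */
uint64_t pack_flonum(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (bits & ~TAG_MASK) | TAG_FLONUM;
}

bool is_flonum(uint64_t w)
{
    return (w & TAG_MASK) == TAG_FLONUM;
}

/* Clear the tag bits again; the stolen mantissa bits come back as zero. */
double unpack_flonum(uint64_t w)
{
    double d;
    assert(is_flonum(w));
    w &= ~TAG_MASK;
    memcpy(&d, &w, sizeof d);
    return d;
}
```

Values like 1.0, whose low mantissa bits are already zero, round-trip exactly; 
something like 0.1 comes back slightly smaller.  The full exponent range 
survives, so very large and very small magnitudes stay immediate.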

How much range and precision do we need in floating point values, anyways?  Is 
there a reason to use "double" and not "float" or "long double"?  If "float" is 
acceptable (which I assume it's probably not; I'm just exploring the idea "out 
loud" as it were), we could just encode an intact "float" and a bunch of tag 
bits together in a 64-bit value, on any machine where "float" is 32 bits, and 
it'd probably have the range needed for a lot of everyday use.  Or, combine the 
ideas -- on a 32-bit machine, use a 32-bit type, one bit indicates "this is a 
'float' with one exponent bit stolen", otherwise more tag bits indicate other 
immediate or non-immediate types, and one of the non-immediate ones encodes a 
full "double" when the wider range is needed.
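
For the first idea (an intact "float" plus tag bits in a 64-bit word), a 
minimal C sketch might look like the following.  The layout (tag in the upper 
32 bits, float bits in the lower 32) and the tag value are assumptions made up 
for illustration, not anything proposed in the thread or present in Guile:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "expects a 32-bit float");

#define TAG_FLOAT32 UINT64_C(0x1)   /* hypothetical tag value */

/* Upper 32 bits hold the tag, lower 32 bits hold the float unchanged. */
uint64_t box_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (TAG_FLOAT32 << 32) | bits;
}

bool is_boxed_float(uint64_t w)
{
    return (w >> 32) == TAG_FLOAT32;
}

float unbox_float(uint64_t w)
{
    uint32_t bits = (uint32_t) w;   /* keep only the low 32 bits */
    float f;
    assert(is_boxed_float(w));
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Nothing is lost relative to a plain "float" here, and the remaining 2**32 - 1 
tag values are free for other immediate or non-immediate types.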

> I think we need to do the JSC way, as it appears to be the only way to
> work with the BDW GC, currently anyway.  We will need some integration
> with the GC to ensure the 48-bit space, but that should be doable.

Don't we have some objects now which can be initialized statically by the 
compiler, and for which the addresses get encoded directly into the resulting 
SCM objects?  That means the mapping of executable and library images would 
have to fit in the 48-bit address space, and that's generally up to the OS 
kernel; having BDW-GC do some magic at allocation time wouldn't be enough.

Ken
