Re: redoing SCM representation in 2.2
From: Ken Raeburn
Subject: Re: redoing SCM representation in 2.2
Date: Sun, 15 May 2011 05:00:07 -0400
On May 12, 2011, at 06:17, Andy Wingo wrote:
> I'm looking at new SCM representation and tagging possibilities in 2.2.
> Read the whole mail please, as it's a little complicated.
Innnnteresting....
> I would like to revisit the SCM representation and tagging scheme in
> 2.2. In particular, I would like to look at NaN-boxing. I explain the
> plan a bit below, but if you like to get your information depth-first,
> check out:
So... Guile 2.2 won't work on the VAXstation in my basement, which doesn't do
IEEE math? :-(
(Not that I've powered it up in some time...)
Guess I hadn't thought about that before; we've got code that refers to IEEE
floating point already, so does that mean we require IEEE floating point
already?
On 64-bit SPARC and perhaps some other architectures, we'd be depending on the
OS effectively using only 48 bits' worth of address space even if the hardware
supports more. I'd be surprised if we encounter a program that needs more
storage than that, and I expect most current OSes will tend to have a couple of
regions growing toward each other rather than scatter stuff all over the
address space. But I could imagine a particularly naïve or aggressive form of
address space layout randomization trying to take advantage of all 64 bits by
scattering mapped memory throughout all 2**64 addresses (minus whatever the
kernel uses), for libraries or heap allocation or both.
> Basically I think it's OK to restrict the Scheme heap to be within a
> 48-bit space, at least for the next decade or so. But given that the
> total address space is more than 48 bits on many architectures,
> arbitrary immediate foreign pointers may not be possible on
> e.g. Sparc64.
Nor the full range of (u)int64_t values we might get from a library.
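To make the range concern concrete, here is a minimal sketch in C of the checks a
boxing routine would need before storing a foreign pointer or a 64-bit integer as
an immediate. The 48-bit payload width is the figure discussed above; the macro
and function names are hypothetical and not part of Guile, and using the same
48-bit field for integers is an assumption made only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed payload width: the 48 bits mentioned in the discussion. */
#define PAYLOAD_BITS 48

/* True if the address has no bits set at or above bit 48, i.e. it could be
   stored in a 48-bit pointer field without losing information. */
static inline bool pointer_fits_payload(const void *p)
{
    uint64_t addr = (uint64_t)(uintptr_t)p;
    return (addr >> PAYLOAD_BITS) == 0;
}

/* True if an unsigned value fits in a 48-bit unsigned field. */
static inline bool uint64_fits_payload(uint64_t v)
{
    return (v >> PAYLOAD_BITS) == 0;
}

/* True if a signed value fits in a 48-bit two's-complement field. */
static inline bool int64_fits_payload(int64_t v)
{
    const int64_t lim = INT64_C(1) << (PAYLOAD_BITS - 1);
    return v >= -lim && v < lim;
}
```

Anything that fails these checks would have to go to the heap (or be rejected),
which is the cost being pointed out for arbitrary foreign pointers and full-range
(u)int64_t values.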
Though, I'll just throw these out there:
If the 64-bit SCM type isn't required to represent the full range of integer
values the machine can support as immediate values, does it really have to
encompass the full range of "double" values? Is that really what we should be
optimizing the encoding for? Maybe for really large or really tiny values it
would be okay to use heap storage as we do for bignums, and steal an exponent
bit to use as a tag? Or if you steal a few mantissa bits, you lose a little
precision but keep all the exponent bits. So you don't need to waste 13 bits
on saying "this is not a floating point value" all the time, and you can widen
the range permissible for immediate integer and pointer values.
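As a rough illustration of the "steal a few mantissa bits" variant, the sketch
below overwrites the low three bits of a double's bit pattern with a tag. It
assumes a 64-bit IEEE-style "double"; the tag width, the tag value, and all
names are hypothetical choices for this example, not anything proposed in the
thread or present in Guile.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "sketch assumes 64-bit double");

#define MANT_TAG_BITS 3                                  /* assumed tag width */
#define MANT_TAG_MASK ((UINT64_C(1) << MANT_TAG_BITS) - 1)
#define MANT_TAG_FLONUM UINT64_C(1)                      /* hypothetical tag value */

/* Box a double: drop the low mantissa bits and put the tag in their place.
   Precision falls from 52 to 49 explicit mantissa bits; the exponent is intact. */
static inline uint64_t box_double(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (bits & ~MANT_TAG_MASK) | MANT_TAG_FLONUM;
}

static inline int is_boxed_double(uint64_t w)
{
    return (w & MANT_TAG_MASK) == MANT_TAG_FLONUM;
}

/* Unbox: clear the tag bits, so the value is truncated toward zero. */
static inline double unbox_double(uint64_t w)
{
    uint64_t bits = w & ~MANT_TAG_MASK;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

Values whose low mantissa bits are already zero (1.0, -2.5, small integers)
round-trip exactly; something like 0.1 comes back slightly smaller in magnitude.
A NaN whose only set mantissa bits are among the stolen ones would turn into an
infinity, so a real scheme would need to handle that case separately.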
How much range and precision do we need in floating point values, anyways? Is
there a reason to use "double" and not "float" or "long double"? If "float" is
acceptable (which I assume it's probably not; I'm just exploring the idea "out
loud" as it were), we could just encode an intact "float" and a bunch of tag
bits together in a 64-bit value, on any machine where "float" is 32 bits, and
it'd probably have the range needed for a lot of everyday use. Or, combine the
ideas -- on a 32-bit machine, use a 32-bit type, one bit indicates "this is a
'float' with one exponent bit stolen", otherwise more tag bits indicate other
immediate or non-immediate types, and one of the non-immediate ones encodes a
full "double" when the wider range is needed.
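The "intact float plus tag bits" idea might look something like the following
sketch, assuming "float" is 32 bits. The layout (tag in the upper 32 bits, float
bits in the lower 32) and the names are hypothetical, picked only to show that
nothing in the float needs to be sacrificed in a 64-bit word.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

#define FLT_TAG_SHIFT 32                 /* assumed layout: tag in the upper half */
#define FLT_TAG_FLOAT UINT64_C(0x1)      /* hypothetical tag value */

/* Box a float: the 32 float bits are stored unchanged in the low half. */
static inline uint64_t box_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (FLT_TAG_FLOAT << FLT_TAG_SHIFT) | (uint64_t)bits;
}

static inline int is_boxed_float(uint64_t w)
{
    return (w >> FLT_TAG_SHIFT) == FLT_TAG_FLOAT;
}

/* Unbox: take the low 32 bits back as a float, bit for bit. */
static inline float unbox_float(uint64_t w)
{
    uint32_t bits = (uint32_t)w;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

With this layout every float round-trips exactly, and the remaining 32 bits
leave plenty of room for other tags; the price is float's smaller range and
precision for immediates, with a heap-allocated "double" as the fallback.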
> I think we need to do the JSC way, as it appears to be the only way to
> work with the BDW GC, currently anyway. We will need some integration
> with the GC to ensure the 48-bit space, but that should be doable.
Don't we have some objects now which can be initialized statically by the
compiler, and for which the addresses get encoded directly into the resulting
SCM objects? That means the mapping of executable and library images would
have to fit in the 48-bit address space, and that's generally up to the OS
kernel; having BDW-GC do some magic at allocation time wouldn't be enough.
Ken