Re: The Road to 2.2

On Fri, May 17, 2013 at 5:49 PM, Andy Wingo <address@hidden> wrote:

Friends! Schemers! Gentle Guilefolk! The time has come to begin in
earnest on the road to Guile 2.2!

I know all of you are wondering, "did that paragraph really need four
exclamation marks?" Well the answer is yes, yes it did, and the rest of
this mail lays out the reasons.

So, what does 2.2 mean, why is it appropriate now? 2.2 means many
things on many levels, but on its most basic level, it will result in a
new series of Guile deployments. This series will be installed in
parallel with Guile 2.0 [0]. It shouldn't disrupt people that develop
or distribute C extensions to Guile or programs written in Guile. True,
the binary named "guile" on your system might be 2.0 or 2.2, depending
on what you have installed, though one can get around that with passing
--program-suffix to ./configure. In total, the disruptive impact of a
new series is low.

[0] http://www.gnu.org/software/guile/manual/html_node/Parallel-Installations.html

On the other hand, creating a new development series does have a cost in
splitting our development effort away from the stable branch, and so we
have to justify that. This is the second part of the question, "why
now?" The answer is that we have important improvements to make to the
Guile implementation that can't be made in a strictly binary-compatible
way. We think these things are so exciting as to be worth the
inconvenience of a new stable series. We might still make a few more
2.0 maintenance releases, but when I describe what we have on tap for
2.2 you will see why we're excited to move to the next phase.

RTL VM
======

Guile 2.0 added a compiler and a virtual machine to what was previously
a pure-interpreter implementation of Scheme. I know we all are pretty
stoked about that, but some time has passed and it's time to look at
what we have and see how we can do better.

The Guile 2.0 VM is a stack machine. That means that its instructions
usually take their values from the stack, and produce values (if
appropriate) by pushing values onto the stack.

The problem with stack machines is that they penalize named values. If
I realize that a computation is happening twice and I factor it out to a
variable, that means in practice that I allocate a stack frame slot to
the value. So far so good. However, to use the value, I have to emit
an instruction to fetch the value for use by some other instruction; and
to store it, I likewise have to have another instruction to do that.

For example, in a stack machine, compiling "z = x + y" might mean:

(local-ref 0) ; x
(local-ref 1) ; y
(add)
(local-set! 2) ; z

The cost of interpreting a bytecode is largely dispatch cost, which is
linear in the number of instructions executed. So factoring out a
computation to a local variable can sometimes actually cause code to run
slower, depending on the relative cost of the computation versus
accessing its operands. Pushing and popping values can also be
expensive as we are updating a stack pointer all the time, possibly
checking for overflow as well.

The solution to this problem is to use a "register machine". I use
scare quotes because in fact this is a virtual machine, so unlike a CPU,
the number of "registers" is unlimited, and in fact they are just stack
slots accessed by index.

So in a register machine, "z = x + y" might be:

(add 2 0 1) ; register 2 = register 0 + register 1;

Of course the code for a register VM has to be bigger: instead of _byte_
code that doesn't need to encode its operands because they are expected
on the stack, register VM code has to encode the "register names", so it
ends up being better to implement it as _word_ code. But this cost is
much less significant than the dispatch cost.

Anyway, that's the back-story. For Guile 2.2, I have implemented a new
register-based virtual machine. For no good reason I called it the
"RTL" VM, for "register transfer language". RTL is a name for a
particular kind of intermediate representation in compilers, but perhaps
it isn't the most appropriate denomination for what we're doing. In any
case the name seems to have stuck. Perhaps we need a marketing guru
here to find some other compelling name :)

Given that we're changing the bytecode...
=========================================

So RTL is a new bytecode, a new VM, and will ultimately need a new
compiler. I'll talk a bit more about the compiler in a minute, but I
want to mention another aspect of RTL.

Besides being faster than the Guile 2.0 VM, RTL opens up other
interesting possibilities. One of the most exciting is the ability to
statically allocate all kinds of data. In Guile 2.0, when a .go file
loads from disk, it allocates all of the constants that it needs on the
heap. But that's sub-optimal in many ways. For example, a pair of
immediates like '(1 . 2) doesn't actually need to be allocated at all --
it can just be part of the loaded image. Likewise a pair with
non-immediates like ("foo" . "bar") can also be allocated in the image,
but it needs fixing up at runtime so that its fields point to the "foo"
and "bar" values. In this particular case the "foo" and "bar" would be
values also in the image.

What I mean to say is that we can, at compile time, allocate and lay out
the memory for the constants that a program will need, compute the
minimal set of initializations that those constants will need, and make
the right links from the compiled code to the constants. This is
exactly like what the C compiler, linker, and dynamic loader do -- so we
take pointers from them and do exactly the same thing. The RTL branch
writes out compiled data to ELF files. When a .go file is loaded, it
runs a thunk to do the run-time initializations, then returns the "main"
thunk for the .go (which the caller can run or not).

Note that we use ELF for all platforms. Since we don't rely on the
platform's linker or dynamic loader, there's no sense in using the
format of each individual platform. That said, the files that Guile
produces are readable with readelf, and it should be possible to operate
on them with standard ELF tools.

Guile's linker and loader are careful to separate read-only data like
bytecode and writable data like those pairs that need runtime
initialization. In this way we can maximize sharing between processes.
Also, Guile puts a bunch of data that's not needed in normal execution
at the end of the file, in the read-only section -- like docstrings, for
example. If a program never asks for procedure-documentation of a
procedure, those docstrings will never be paged in by the kernel; but if
a program does do a lot of documentation grovelling, they are readily
accessible.

Separating the different metadata into different sections also makes it
possible to strip metadata. For example when deploying to systems with
not much disk space, you might strip docstrings or line number
information. You can use standard ELF tools to do this, or do it using
Guile's ELF tools.

Finally, using a well-structured format like ELF brings in the
possibility of linking together a number of modules in one file, and
possibly even an entire application. We don't have this working yet,
and it's not in the immediate plans, but it is a great possibility to
keep in mind.

What about the compiler?
========================

So we have a new VM, assembler, disassembler, and all the things! It's
pretty close to finished: it's just missing the part that lets a
debugger know about line numbers and local variables. But how do we
actually produce RTL assembly from Scheme?

Well, there we don't have a finished story. Noah Lavine made a branch
to compile tree-il, our high-level intermediate language, to a
"continuation-passing-style" (CPS) form. This is a useful
transformation for many reasons, but among them it is convenient for
giving a "name" to all values -- a property that a register VM prizes.

His CPS compiler isn't done yet, though when I last looked at it, it was
looking great. So the path goes Scheme -> Tree-IL (something we already
have), then -> CPS (something Noah did), then -> RTL Assembly (again,
Noah), then -> bytecode (the piece I did). It doesn't yet work with all
of Guile but it's getting there.

How to get there?
=================

However, I think we've done all we can in branches. I think we should
bless this RTL experiment as the way to do Guile 2.2. (Thoughts or
objections welcome.) To that end, I think we need to start merging
wip-rtl into master.

What I propose is that, if we agree, I start merging in trivial stuff.
When I get to something interesting that's not just a refactoring of
existing things, I'll submit that patch to the list. We have a few days
to look at it, we go through some review, and then we figure out how to
move forward -- usually merging some version of that patch. Then we
repeat.

Once the RTL branch is all merged in, we can start doing the same with
Noah's wip-rtl-cps branch.

Then eventually some glorious day comes and we start using the CPS/RTL
toolchain, everything is working great and fast, and we start deleting
the old code.

What do you all think?

I love it when a plan comes together!

Cheers,

Andy
--
http://wingolog.org/

From:	Noah Lavine
Subject:	Re: The Road to 2.2
Date:	Sat, 18 May 2013 09:44:08 -0400