a new run-time for Scheme implementations
From: Tom Lord
Subject: a new run-time for Scheme implementations
Date: Thu, 26 Jul 2001 17:50:55 -0700 (PDT)
[[[
The project proposed below might be a good opportunity to
build abstractions around the tagging system, for example.
-t
]]]
I would like to gauge whether a particular project I have in mind is
perceived as useful to other projects, whether there are hackers who
would like to work on the project, and whether we might find
financial support for the project.
The project I have in mind is to build a general purpose C library,
one that could either replace or augment most of libc, depending on
how it is used, and that includes a run-time system for high-level
languages such as Scheme, Java, and various functional languages.
Some of the high-level requirements that I have in mind are summarized
here, and spelled out in detail at the end of this message:
* Scheme-like Types, Cleanly Available to C Programs
* Java-like Types, Cleanly Available to C Programs
* Thread Support
* Unicode Support
* Fancy Text Support
* A Replacement for Stdio
* An Exception Mechanism
* Robust Garbage Collection
* Developed With Continuous Testing and Documentation
While building such a run-time system, we could simultaneously work on
tightly integrated implementations of Scheme, Java and similar
languages. That would help to validate the design of the run-time
system. (In case it isn't obvious, I have a strong hunch that there
is a tight, practical, clean language design that has, as subsets,
Scheme, a subset of Java (or some other statically typed imperative language
with GC-based allocation), and some functional languages, both eager
and lazy.)
So: who's interested in such a library; who's interested in
contributing time, money, or other resources; and why are you
interested? Please reply with your answers. If there is sufficient
interest, I'll set up a mailing list.
Here are my answers:
WHAT I'M INTERESTED IN DOING
I'm particularly interested in working on these subsystems:
I/O
low-level Unicode support
strings, including editable, attributed text
the representations of Scheme-like and Java-like data types
the interfaces and interactions between various subsystems
at least one of the GC implementations
a Scheme implementation built on top of the library
I'm interested in coordinating the project and gatekeeping patches.
If I can be paid for it, I'm interested in working on this nearly full
time, at least until the run-time system is reasonably "done" and
"stable".
I'd be happy to act as the "design czar" for the project, with the
goals: (1) decide in a coherent way the large number of arbitrary
questions that are likely to arise; (2) decide the large number of
objective issues on the basis of consensus among the best qualified
contributors and customers.
As gatekeeper, I'd insist that new components include thorough
development tests (that run when you type "make test") and good
documentation before they're checked in to the primary development
branch.
I prefer that sets of related changes be submitted as complete
patch sets, so they can be more easily reviewed and otherwise
manipulated.
I have a server that could be used to host the project.
I have a C library that I think would make a good starting point for
the project.
I have considerable experience implementing Scheme, and a small amount
of experience with Java implementations. I have a reading knowledge
of implementation techniques for functional languages. I am quite
good with many of the relevant data structures. I have a reading
knowledge of GC techniques and considerable experience with one (alas,
conservative) GC implementation. I have exquisite taste in C coding
style.
WHY I WANT TO DO THIS
Well, ego gratification, of course. But also:
I've wanted a run-time system like this for several years. Primarily,
I want to use it as a foundation component for a Scheme-based
application framework. Additionally, I'm quite sick of the
limitations of libc.
In the past, I've tried to make progress on this thing in the context
of other projects. I've found that approach to be hopelessly
inefficient.
I think that several Free Software language implementations are in
trouble, in part because they depend on a broken GC. In the name of
helping to further the success of Free Software, I'd like to help fix
these problems.
This project is an essential part of what I most want to hack on, so
I'd like to get paid for it.
Regards,
-t
* Scheme-like Types, Cleanly Available to C Programs
The run-time system should include Scheme-like data types,
making these available to C, independent of any complete
Scheme implementation. For example, the library would
provide garbage collected cons pairs (lisp lists), vectors
(lisp arrays), strings, symbols, and arbitrary precision
numbers.
The run-time system should contain support for reading and
writing these data structures in a variety of formats
(ordinary lisp notation, pretty-printed lisp notation, and
fast binary representations).
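
As a rough illustration of what "Scheme-like types available to C" could look like, here is a minimal sketch of a tagged value with cons pairs and a writer for ordinary lisp notation. Every name in it (`value`, `cons`, `write_value`, and so on) is hypothetical and not taken from any existing library, and it uses `malloc` where a real run-time would allocate from a garbage-collected heap.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tagged value; all names here are illustrative only. */
typedef enum { T_NIL, T_INT, T_SYMBOL, T_PAIR } tag_t;

typedef struct value value;
struct value {
    tag_t tag;
    union {
        long integer;                       /* T_INT */
        const char *symbol;                 /* T_SYMBOL (not interned here) */
        struct { value *car, *cdr; } pair;  /* T_PAIR */
    } u;
};

static value nil_value = { T_NIL, { 0 } };  /* the empty list */

static value *alloc_value(tag_t tag) {
    /* Assumption: plain malloc stands in for a GC-managed allocator. */
    value *v = malloc(sizeof *v);
    if (!v) abort();
    v->tag = tag;
    return v;
}

value *make_int(long n) {
    value *v = alloc_value(T_INT);
    v->u.integer = n;
    return v;
}

value *make_symbol(const char *name) {
    value *v = alloc_value(T_SYMBOL);
    v->u.symbol = name;
    return v;
}

value *cons(value *car, value *cdr) {
    value *v = alloc_value(T_PAIR);
    v->u.pair.car = car;
    v->u.pair.cdr = cdr;
    return v;
}

/* Append s to the NUL-terminated string in buf, truncating at cap. */
static void emit(char *buf, size_t cap, const char *s) {
    size_t used = strlen(buf);
    if (used + 1 < cap) strncat(buf, s, cap - used - 1);
}

/* Append the ordinary lisp notation for v to buf. */
void write_value(const value *v, char *buf, size_t cap) {
    char num[32];
    assert(v != NULL);
    switch (v->tag) {
    case T_NIL:
        emit(buf, cap, "()");
        break;
    case T_INT:
        snprintf(num, sizeof num, "%ld", v->u.integer);
        emit(buf, cap, num);
        break;
    case T_SYMBOL:
        emit(buf, cap, v->u.symbol);
        break;
    case T_PAIR:
        emit(buf, cap, "(");
        for (;;) {
            write_value(v->u.pair.car, buf, cap);
            v = v->u.pair.cdr;
            if (v->tag == T_PAIR) { emit(buf, cap, " "); continue; }
            if (v->tag != T_NIL) {          /* improper list: dotted tail */
                emit(buf, cap, " . ");
                write_value(v, buf, cap);
            }
            break;
        }
        emit(buf, cap, ")");
        break;
    }
}
```

A caller would build a list with nested `cons` calls ending in `&nil_value`, start with an empty string in `buf`, and call `write_value` to obtain text such as `(a 1 2)`. Pretty-printed and binary formats would be separate writers over the same representation.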
Scheme-like data structures, by virtue of their generality
and small number, facilitate many concise coding idioms and
promote good code re-use. I think they are intuitive (it is
easy to picture how they work and to predict performance
characteristics). Such data types should be used more often
than they are. For example, an efficient implementation of
these types, and a clean syntax for writing them, would make
a welcome addition to Java.
This feature does, unfortunately, present a challenge to
debuggers.
* Java-like Types, Cleanly Available to C Programs
Similarly, the run-time system should include Java-like
objects, or better, a simpler object model in which
Java-like objects can be implemented.
I am aware that some work has been done unifying C++ and
Java-like objects. I'm not overly familiar with this work,
but my initial impression was that it is a narrowly focused
solution and that I want something a bit cleaner and more
general. If the existing C++ work is good, then perhaps
it would be compatible with our run-time system.
* Thread Support
The run-time system should be able to take good advantage
of threading on multi-processor machines.
* Unicode Support
The run-time system should have good support for
all of Unicode and presumed future extensions to Unicode.
This doesn't necessarily mean using ICU. ICU puts a lot of
emphasis on compatibility with older libraries and on
transcoding, neither of which is especially important to
this project. ICU seems too large and complicated for our
purposes. I have the foundation of a Unicode library
that I think is a better fit.
* Fancy Text Support
The run-time system should have good support for editable,
attributed text. This support should facilitate integration
with Pango or other libraries dedicated to rendering
attributed text.
Typically, such a facility is built into a GUI toolkit,
rather than a C run-time system. I think it makes more
sense to provide this facility at a lower level, tying it to
generic data types rather than widgets, facilitating greater
code re-use, encouraging use in non-graphical applications,
efficiently integrating with the I/O subsystem, etc.
* A Replacement for Stdio
Stdio persists solely because it has momentum, not because
it is a good design. Andrew Hume's I/O library pointed to a
better way to manage buffers (so as to avoid needless
copying of data). My I/O library builds on that idea,
adding support for stackable I/O protocols, and making all
of the interfaces descriptor based rather than file object
based. Having used my library for a few years now (even
though it is not quite finished), I am convinced that there
is no better approach currently available.
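
To make the two ideas above concrete (descriptor-based interfaces, and handing the caller a view of the library's own buffer instead of copying into the caller's), here is a small sketch for POSIX systems. It is not the interface of the library described above or of Hume's library; `xio_peek`, `xio_advance`, and the fixed-size table are invented for illustration. The sketch leaves out stackable protocols, `EINTR` handling, and resetting state when a descriptor is closed and reused.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define MAX_FDS  64
#define BUF_SIZE 4096

/* Per-descriptor buffer state, looked up by fd. Callers hold only the
   integer descriptor, never a FILE-like object. */
struct fd_buffer {
    char   data[BUF_SIZE];
    size_t start;   /* index of the first unconsumed byte */
    size_t end;     /* one past the last buffered byte */
};

static struct fd_buffer buffers[MAX_FDS];

/* Hypothetical call: make up to `want` bytes available and return a pointer
   directly into the internal buffer. *got receives the number of bytes
   available at that pointer (0 at end of file). Returns NULL on a read
   error or an out-of-range descriptor. The pointer is valid only until
   the next call for the same descriptor. */
const char *xio_peek(int fd, size_t want, size_t *got) {
    struct fd_buffer *b;
    size_t avail;

    assert(got != NULL);
    if (fd < 0 || fd >= MAX_FDS) return NULL;
    b = &buffers[fd];
    if (want > BUF_SIZE) want = BUF_SIZE;

    if (b->end - b->start < want) {
        /* Slide any leftover bytes to the front, then refill. */
        memmove(b->data, b->data + b->start, b->end - b->start);
        b->end -= b->start;
        b->start = 0;
        while (b->end < want) {
            ssize_t n = read(fd, b->data + b->end, BUF_SIZE - b->end);
            if (n < 0) return NULL;
            if (n == 0) break;              /* end of file */
            b->end += (size_t)n;
        }
    }
    avail = b->end - b->start;
    *got = avail < want ? avail : want;
    return b->data + b->start;
}

/* Hypothetical call: mark n previously peeked bytes as consumed. */
void xio_advance(int fd, size_t n) {
    struct fd_buffer *b;
    if (fd < 0 || fd >= MAX_FDS) return;
    b = &buffers[fd];
    if (n > b->end - b->start) n = b->end - b->start;
    b->start += n;
}
```

With an interface of this shape, a parser can inspect input in place with `xio_peek` and call `xio_advance` only for the bytes it accepts, so data moves once from the kernel into the buffer and is not copied again into a caller-supplied array.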
* An Exception Mechanism
The run-time system should have high-level support for an
exception mechanism.
* Robust Garbage Collection
The run-time system should include a robust garbage
collector. Nearly every popular, free implementation of
Scheme and Java that I have seen uses a collector that is at
least partially conservative. These collectors count as GC
roots any value on the C stack that "looks like" a pointer
to valid data.
Such collectors have a serious problem: they leak storage.
It is not difficult to create situations where the collector
wrongly treats some value on the C stack as a GC root --
either because the value looks like a pointer, but isn't, or
because the value was once a pointer, but is now a dead
value that was never overwritten.
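
The failure mode can be shown with a toy model. The sketch below does not scan a real C stack; it scans an array of words standing in for one, and marks any heap object that a word appears to point into. The heap layout and all names (`heap`, `conservative_mark`, and so on) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_OBJECTS 8

struct object { long payload[2]; };

static struct object heap[HEAP_OBJECTS];   /* toy heap of fixed-size objects */
static int marked[HEAP_OBJECTS];

/* Conservative test: does this word, viewed as an address, land inside
   the heap? The collector cannot tell a real pointer from an integer or
   a stale value with the same bit pattern. */
static int looks_like_heap_pointer(uintptr_t word, size_t *index) {
    uintptr_t lo = (uintptr_t)&heap[0];
    uintptr_t hi = lo + sizeof heap;
    if (word < lo || word >= hi) return 0;
    *index = (size_t)((word - lo) / sizeof(struct object));
    return 1;
}

/* Scan n words as if they were a C stack, mark every object a word seems
   to reference, and return the number of objects retained. */
size_t conservative_mark(const uintptr_t *words, size_t n) {
    size_t i, idx, count = 0;

    assert(words != NULL || n == 0);
    for (i = 0; i < HEAP_OBJECTS; i++) marked[i] = 0;
    for (i = 0; i < n; i++) {
        if (looks_like_heap_pointer(words[i], &idx) && !marked[idx]) {
            marked[idx] = 1;
            count++;
        }
    }
    return count;
}
```

If the scanned words hold one genuine pointer to `heap[1]`, one leftover word whose bits happen to equal the address of `heap[5]`, and the integer 42, the scan retains two objects even though only one is live. The second object, and anything reachable from it, is the leak.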
Such storage leaks can cause obvious problems, like
processes that are too large, and less obvious problems,
like processes that leak file descriptors. In either case,
long-running programs or life critical programs are poor
candidates for these collectors. I admit that the bugs
actually occur infrequently, and many people use such
collectors happily, but then people do lots of crazy things.
One way to fix such collectors is to modify a C compiler,
such as GCC. It would, perhaps, be nice if GNU C had
support for precise scans of the C stack, and presumably
something along these lines will eventually be implemented
as part of a C# compiler.
Nevertheless, in my opinion, it is a poor idea to tie your
run-time system to a particular compiler.
Another way to fix the problem might be to look for a clever
application of C++ features, especially automatic
destructors. Another way would be to use C#. Still another
way would be to use C, but require explicit memory
management by C programs.
My current prejudice is to use C, require explicit
memory management, and to also build a C++ interface,
if one can be built that simplifies programming.
I think that C is a fine language, when used properly,
and that robust GC support need not be too hard to use.
I have heard Emacs hackers complain that explicit GC
management is difficult and error prone. I think this
is easily fixed by a lint-like tool. Some years ago,
I built a lint-like tool that verified some GC-related
invariants in a C implementation of Scheme. Although my
tool was a prototype, it usefully discovered several bugs
that were easily fixed. This approach could be portably
and robustly applied. For convenience and performance,
if there were demand for it, a version of this tool could
be built into GCC.
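
For readers unfamiliar with what explicit GC management in C involves, here is one possible shape for it: a shadow stack on which C code registers the addresses of local variables that hold heap references. This is only a sketch under assumed names (`gc_protect`, `gc_unprotect`, `gc_for_each_root`); it is not the interface of any existing collector, and no collector is included.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROOTS 256

typedef struct object object;
struct object { int dummy; };             /* placeholder heap object */

static object **root_stack[MAX_ROOTS];    /* addresses of protected locals */
static size_t root_top = 0;

/* Register a local variable as a GC root. */
void gc_protect(object **slot) {
    assert(root_top < MAX_ROOTS);
    root_stack[root_top++] = slot;
}

/* Unregister the `count` most recently protected locals. */
void gc_unprotect(size_t count) {
    assert(count <= root_top);
    root_top -= count;
}

size_t gc_root_count(void) { return root_top; }

/* A precise collector would visit exactly these slots and nothing else,
   so no stray stack word can be mistaken for a root. */
void gc_for_each_root(void (*visit)(object **slot, void *ctx), void *ctx) {
    size_t i;
    for (i = 0; i < root_top; i++) visit(root_stack[i], ctx);
}

static void count_nonnull(object **slot, void *ctx) {
    if (*slot) ++*(size_t *)ctx;
}

/* Example of the discipline: protect on entry, unprotect before return. */
size_t example(object *a) {
    object *tmp = a;
    size_t live = 0;

    gc_protect(&tmp);
    /* An allocation here could trigger a collection; tmp is registered,
       so the object it refers to would survive. */
    gc_for_each_root(count_nonnull, &live);
    gc_unprotect(1);
    return live;
}
```

The invariants a lint-like tool might check are visible in `example`: every local holding a heap reference across a possible allocation is protected, and every `gc_protect` is balanced by a `gc_unprotect` on each path out of the function.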
I'm not yet familiar enough with C# to be comfortable
recommending that, and I'm dubious about the nature of its
origin and the economic politics that surrounds it. These
are very weak objections, of course, so I am open to
persuasion.
* Developed With Continuous Testing and Documentation
The development process should be characterized by
continuous testing, and continuous documentation.
I don't think there's any other sane way to undertake a project
of this scope and complexity.