Scheme as a Common Intermediate Language

guile-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Scheme as a Common Intermediate Language

From:	Keisuke Nishida
Subject:	Scheme as a Common Intermediate Language
Date:	23 Oct 2000 13:56:39 -0400
User-agent:	T-gnus/6.14.4 (based on Gnus v5.8.6) (revision 02) SEMI/1.13.7 (Awazu) Chao/1.14.0 (Momoyama) Emacs/20.7 (i686-pc-linux-gnu) MULE/4.1 (AOI)
Hello,

I think I'm proposing the idea of iScheme for my graduate study next
year...  I'm not sure if this works, and I don't know if I will have
enough time to work on this (maybe not), but I'll try anyway...

Kei

%%% ischeme.tex --- Scheme as a Common Intermediate Language
%% Copyright (C) 2000 Keisuke Nishida <address@hidden>

\documentclass[twocolumn]{article}

\title{{\large Research Proposal}\\Scheme as a Common Intermediate Language}
\author{Keisuke Nishida\\\normalsize\texttt{<address@hidden>}}

\setlength{\paperheight}{11in}
\setlength{\paperwidth}{8.5in}
\setlength{\textheight}{9in}
\setlength{\textwidth}{6.5in}
\setlength{\oddsidemargin}{0in}
\setlength{\evensidemargin}{0in}
\setlength{\topmargin}{0in}

\begin{document}
\maketitle

\subsection*{Objective}

To design a common intermediate language into which any high-level
language can be translated.  To write an optimizing compiler and a
run-time engine that efficiently execute the translated code.

\section{Background}

There are numerous programming languages in the world, most of which
have their own compilers and run-time environments.  On the other hand,
the differences between languages are mainly a matter of syntax and
taste of programming paradigm.  Most run-time systems and library
routines, such as a garbage collector, numerical libraries, and I/O
routines, can be shared between languages.

So far the C language has been the common platform of various languages.
Many library routines have been implemented in C, and other language
implementations, such as GCL (GNU Common Lisp) and Perl (a famous
scripting language), have utilized them.  GCC (GNU Compiler Collection)
is a compiler that compiles several languages, including C, C++,
Fortran, and Java, using a common internal parse tree and some shared
run-time libraries (I guess).

Unfortunately, C code requires being compiled.  The C language cannot be
used in the situation where run-time evaluation is desirable, as is done
with \texttt{eval} in Lisp.  There is no common platform for such
high-level languages right now.  Many languages are implementing their
own interpreter, compiler, run-time environments, and library routines.

Today, interactive, scripting, and dynamic languages are gaining in more
and more popularity.  There are Lisp, ML, Perl, Python, Java script,
etc.  Many software applications of today have extensibility by using
one of such languages.  The Emacs text editor, which provides Emacs
Lisp, is a classic example of such an application.

Having a scripting language is especially useful when we use a special
purpose application.  MatLab (a mathematics package), for example, has
its own language in it.  GIMP (a graphic editor) provides a language
that manipulates graphics.  Many scientific/engineering tools come with
specialized languages of their own.  They are all different.

Although there are valid reasons to provide different languages for
different purposes, there are many disadvantages in this situation:
\begin{itemize}
\item It is hard for the user to learn completely different languages
  every time she changes applications.
\item A library routines written in one language cannot be used in
  another language.
\item Implementing the same functionality for each programming language
  is just waste of human resources.
\end{itemize}
This is the reason why there is a demand of having a common platform of
high-level languages.

Creating a single common programming language is not a realistic idea.
What we can share is the internal representation of languages.  A
high-level language can be translated into a common intermediate
language and then executed in a shared run-time environment.  That way
we can share the same resources between languages.

So what platform should it be?  Java can be a candidate of it, as it
comes with the definition of bytecode and a virtual machine.  A language
could be compiled into Java bytecode and share resources.  However, Java
is not designed for such a purpose.  There is a certain limitation in
efficiency.  Moreover, every language is required to be compiled into
bytecode instead of an intermediate code that can be compiled by a
shared compiler.  We want more abstract intermediate representation.

Scheme can be another candidate, as it is a universal language in the
sense that any programming constructs can be implemented in terms of
Scheme primitives.  A language could be translated into Scheme and then
evaluated.  However, implementing programming constructs on the top of
Scheme is too inefficient when we have a more efficient way to do it.
A program translated into the common representation should be executed
at least as efficient as that by the native interpreter of the language.
We want more efficiently executable representation.

Thus, the representation of our goal must be as expressive as Scheme,
while more efficient than Scheme for typical operations.  That way a
high-level language can be satisfactorily translated into the language
of a common platform.

\section[Introduction]{Introduction\footnote
{After this section, I'm writing as if my project is already exist, but
it isn't (except the virtual machine).  I want to design something that
I wrote in this paper.  Many ideas are very rough and to be completed.}}

iScheme is an intermediate language for the sake of inter-language
communication.  Any high-level language can be translated into iScheme
and executed efficiently.  iScheme provides a common platform of
interactive, scripting, and dynamic languages.  Languages transfered
into iScheme may share an optimizing compiler, run-time environment, and
library routines.

iScheme is an intermediate language, not a programming language; that
is, iScheme is not to be used by humans but by compilers.  A program
written in one language is first translated into iScheme, and then
executed by an iScheme implementation.  iScheme can be considered to be
a parse tree shared among languages.  There are several reasons why this
is desirable:
\begin{itemize}
\item Writing a translator from a target language into iScheme is easier
than writing a whole compiler and a run-time environment for that language.

\item Every language can share the same resources, including an
optimizing compiler, run-time environment, and library routines.  By
sharing the core system, inter-language operability is also feasible
without loosing efficiency.

\item iScheme can be executed in a variety of ways.  It can be evaluated
by an iScheme interpreter.  It can be translated into another language,
such as C or Scheme.  It can be compiled into bytecode and ran on a
virtual machine, such as Java VM or Guile VM.

\item A translated code can be easily modified and tested, since iScheme
is still a complete programming language.  This helps debugging.
\end{itemize}

As the name implies, iScheme is based on the Scheme language.  iScheme
is designed to be implemented on the top of the standard Scheme.
However, iScheme is neither a subset nor a superset of Scheme.  iScheme
has a different set of primitives and data types than Scheme and hence
is a different language.  Like Scheme, iScheme provides all essential
programming constructs, including closures and continuations.  Unlike
Scheme, iScheme provides numerous programming constructs, including
non-local exits and dynamic bindings.

Every iScheme primitives are implemented in terms of Scheme primitives.
An iScheme interpreter can be easily implemented in Scheme.  iScheme
primitives, however, need not be implemented that way.  An iScheme
compiler may compile iScheme primitives in the most efficient way.  This
way many of typical programming languages can be translated into iScheme
without loosing efficiency.

Because of the nature of iScheme as an intermediate language, iScheme
does not provide any syntax extension features, such as read macro and
macro expansion.  Those features must be implemented by translators from
target languages to iScheme.  Thus, Scheme is also to be translated into
iScheme.  More likely, iScheme code is compiled into bytecode and
executed by a virtual machine.

The target languages of iScheme are interactive, scripting, and dynamic
languages.  Although it is possible to compile iScheme code into machine
code, it is not the main scope of iScheme.  iScheme focuses on run-time
environment.  iScheme code can be dynamically generated in an arbitrary
run-time environment and executed immediately.  By specializing the
objective this way, iScheme executes such a language most efficiently.

One of the most distinguishing features of iScheme is inter-language
operability.  iScheme allows different languages to exchange objects
under a certain consensus if they want.  A separate documentation
``iScheme Language Standard'' describes how to design a new language,
and ``iScheme Library Standard'' explains what library functions are
expected to be implemented.  By following those standards, programs gain
maximum inter-language operability and code sharing.

\section{Overview of iScheme}

iScheme tries to allow any programming language to be translated.
iScheme provides both lexical scoping and dynamic scoping.  iScheme
allows both statically typed and dynamically typed variables.  iScheme
accommodates functional programming, imperative programming,
object-oriented programming, and so forth.

Too much functionality makes things \emph{bad}.  A language designer has
a responsibility of keeping her language clean.  iScheme has a
responsibility of keeping its organization well established.  Therefore,
all iScheme primitives are implemented in terms of Scheme so that a good
consistency is accomplished.  iScheme provides a standard interpreter
written in Scheme.  This way iScheme tries to keep its soundness,
correctness, and portability.

In practice, however, it is not so easy or too inefficient to implement
all of iScheme's features in Scheme.  Full objective-oriented
programming (OOP) support is such an example.  Instead of implementing
all of them in Scheme, iScheme divides itself into several parts: Core
iScheme, which is the basis of iScheme, and iScheme extensions, which
includes OOP support and multithreading.  Only Core iScheme is actually
implemented in Scheme, and others are implementation dependent.

This section gives an overview of iScheme compared to the standard
Scheme.  Section \ref{sec:general} describes the general topics about
iScheme.  Section \ref{sec:core} explains Core iScheme.  Section
\ref{sec:oop} and \ref{sec:mt} give introductions to iScheme's OOP and
multithreading features, respectively.

\subsection{General Topics}
\label{sec:general}

\subsubsection{Syntax and semantics}

The syntax of iScheme is exactly the same as that of Scheme.  An iScheme
program can be read by Scheme's \texttt{read} procedure.  On the other
hand, its semantics is somehow different.  iScheme provides a completely
different set of primitives than Scheme, many of which include
non-standard Scheme features, such as dynamic binding.

\subsubsection{Notation}

Although iScheme uses the same identifiers as Scheme, the meaning of
them and how to evaluate them can be different.  iScheme gives special
meanings to some identifiers.

All iScheme primitives begins with a letter address@hidden'.  Prefix
`\texttt{:}' is used for keywords, while prefix `\texttt{<}' is reserved
for type names.  Other names are variables that a target language may
use.  Table \ref{tab:notation} gives some examples.

\begin{table}[htbp]
  \begin{center}
    \begin{tabular}{l|l}
      Notation & Meaning \\
      \hline
      address@hidden & Primitive \\
      \texttt{:foo} & Keyword \\
      \texttt{<foo>} & Type \\
      \texttt{foo} & Variable \\
    \end{tabular}
    \caption{Notations and meanings of identifiers}
    \label{tab:notation}
  \end{center}
\end{table}

Since various languages can be translated into iScheme, name conflict
between those languages and iScheme must be avoided.  Target languages
are free to use those identifiers that are not preserved by iScheme.  If
a language wants to use the same notation for different meaning, it must
be somehow encoded and decoded by the translator.

\subsubsection{Primitive and library}

Many primitives of Scheme are included in iScheme, but not everything.
If a language wants to use some library functions that are not part of
iScheme primitives, the translator of the language needs to include
those functions in its own name space with some preferred names.

\subsection{Core iScheme}
\label{sec:core}

Core iScheme is the heart of the iScheme language.  It provides the
following features:
\begin{itemize}
\item Generalized operations
\item Function parameters
\item First-class type
\item Static/dynamic typing
\item Lexical/dynamic binding
\item Flow control primitives
\item Module system primitives
\item Multiple top-level environments
\end{itemize}

\subsubsection{Generalized operations}

iScheme supports generalized set operator and object application.  For
example, consider the following Scheme code:
\begin{quote}
\begin{verbatim}
(vector-ref v 0)
(vector-set! v 0 1)
\end{verbatim}
\end{quote}
This can be written in iScheme as follows:
\begin{quote}
\begin{verbatim}
(v 0)           ;; object application
(@set! (v 0) 1) ;; generalized set
\end{verbatim}
\end{quote}

It is obvious that the latter is more straightforward as a translation
of the following C-like code:
\begin{quote}
\begin{verbatim}
v[0];
v[0] = 1;
\end{verbatim}
\end{quote}

The above operations can be implemented in terms of Scheme by applying
the following transformation:
\begin{quote}
\begin{verbatim}
(v 0)
    ->  ((@getter (@type v)) v 0)
(@set! (v 0) 1)
    ->  ((@setter (@type v)) v 0 1)
\end{verbatim}
\end{quote}

Primitive address@hidden returns the type of an object, as described
later.  Primitive address@hidden and address@hidden return the
getter and setter procedures of that type, respectively.  Thus,
\texttt{(@getter (@type v))} returns the procedure \texttt{vector-ref},
and \texttt{(@setter (@type v))} returns the procedure
\texttt{vector-set!}, as we expect.

Although the above transformation costs very much, these operations can
be efficiently implemented with native supports by the iScheme
implementation.  So these are part of the iScheme primitives.

\subsubsection{Function parameters}

iScheme allows Common Lisp-like function parameters.  Three keywords
\texttt{:optional}, \texttt{:rest}, and \texttt{:key} can be used for
this:
\begin{quote}
\begin{verbatim}
(@define (foo arg :optional opt
                  :rest rest
                  :key key)
  (list arg opt rest key))

(foo 1 2 3 :key 4)
    => (1 2 (3 :key 4) 4)
\end{verbatim}
\end{quote}

Unlike Common Lisp, the default value of parameter variables cannot be
specified syntactically.  Instead, iScheme provides a primitive
address@hidden, which can be used as follows:
\begin{quote}
\begin{verbatim}
(@define (foo arg :optional opt)
  (@set-default! opt 2)
  (list arg opt))

(foo 1)                 => (1 2)
(foo 1 3)               => (1 3)
\end{verbatim}
\end{quote}

Unlike Scheme, iScheme does not allow dot-notation for rest arguments.
A \texttt{:rest} parameter must be explicitly specified.

\subsubsection{First-class type}

In iScheme, types are first-class objects.  Primitive address@hidden
returns the type of an object:
\begin{quote}
\begin{verbatim}
(@type 1)               => integer type
\end{verbatim}
\end{quote}

Most types have bindings with some variables.  The integer type above is
bound to variable \texttt{<integer>}.

A type is applicable.  If a type is called with an object, it coerces
the object to be of that type, if possible:
\begin{quote}
\begin{verbatim}
(<integer> "1")         => 1
\end{verbatim}
\end{quote}

Primitive address@hidden constructs an object of a certain type.  A new
type can be created as an object of \texttt{<type>} type:
\begin{quote}
\begin{verbatim}
(@define <foo> (@make <type>))
\end{verbatim}
\end{quote}

As we have seen before, a type can coerce an object to be of that type.
Also, an object of that type can be applicable and settable.  Procedures
for those operations can be passed as keyword arguments:
\begin{quote}
\begin{verbatim}
(@define (coerce-foo x) 'foo)
(@define (get-foo o) 'foo)
(@define (set-foo o v) 'foo)

(@define <foo>
  (@make <type> :coercer coerce-foo
                :getter get-foo
                :setter set-foo))

(<foo> 1)               => foo

(@define foo (@make <foo>))
(foo)                   => foo
(@set! (foo) 1)         => foo
\end{verbatim}
\end{quote}

\subsubsection{Structure data type}

A simple structure can be created in terms of type creation.  Each type
may have object slots, specified by keyword \texttt{:slots}.  Each slot
can be accessed by primitive address@hidden  The following example
illustrates how to create a simple counter type:
\begin{quote}
\begin{verbatim}
(@define (counter-ref c)
  (@slot c 'count))

(@define (counter-set! c n)
  (@set! (@slot c 'count) n))

(@define (counter-count c)
  (@let ((n (@1+ (counter-ref c))))
    (counter-set! c n)
    n))

(@define <counter>
  (@make <type> :slots '(count)
                :getter counter-count
                :setter counter-set!))

(@define counter
  (@make <counter> :count 0))

(counter-ref counter)   => 0
(@set! (counter) 2)
(counter)               => 3
(counter)               => 4
\end{verbatim}
\end{quote}

More advanced ways of defining a type, or class, is provided by
iScheme/OOP extension (section \ref{sec:oop}).

\subsubsection{Static/dynamic typing}

iScheme supports both statically typed languages and dynamically typed
languages.  iScheme is based on a version of typed lambda calculus that
allows free type.  By not specifying any types, iScheme can be
dynamically typed.  Types can be specified in the following way:
\begin{quote}
\begin{verbatim}
(@lambda <char> ((<string> s))
  (s 0))
\end{verbatim}
\end{quote}
This creates a function that takes a string argument and returns a
character.

An iScheme compiler may compile typed code efficiently.  In the above
code, for example, since the variable \texttt{s} is already known to be
a string, the code \texttt{(s 0)} can be converted into
\texttt{(string-ref s 0)} at compile time.

If a variable is type-free, as is often the case, its type is checked at
run time.  However, once a function is known to be called with a certain
type of object, the function can be dynamically recompiled for that
particular type.  iScheme/OOP provides the basis of such an advanced
type checking and dynamic type dispatch.

\subsubsection{Lexical/dynamic binding}

iScheme allows both lexical binding and dynamic binding.  The standard
mode is lexical binding.  iScheme requires a special syntax for dynamic
binding.

A dynamic binding is created by primitive address@hidden  The
value of a dynamic variable can be referred as usual in a lexical scope:
\begin{quote}
\begin{verbatim}
(@dynamic-let ((foo 1))
  foo)                  => 1
\end{verbatim}
\end{quote}

Primitive address@hidden can be used in place where dynamic reference
is needed:
\begin{quote}
\begin{verbatim}
(@define (bar)
  (@dynamic 'foo))

(@dynamic-let ((foo 1))
  (bar))                => 1
\end{verbatim}
\end{quote}

Alternatively, iScheme also provides address@hidden, which can be
used to emulate dynamic binding:
\begin{quote}
\begin{verbatim}
(@define foo #f)
(@define (bar) foo)

(@fluid-let ((foo 1))
  (bar))                => 1
\end{verbatim}
\end{quote}

[ I guess \texttt{fluid-let} is not thread safe.  Is that correct? ]

\subsubsection{Flow control primitives}

iScheme provides a number of flow control primitives, such as
address@hidden, address@hidden, address@hidden, address@hidden,
address@hidden/cc}, and exception handling.  I have not thought out about
this yet, but it should be done.

\subsubsection{Module system primitives}

iScheme does not provide a single module (or package, etc.) system, but
instead, provides basic components upon which any module systems can be
constructed.

A date type \texttt{<name-space>} provides a single name-space which
keeps bindings from symbols to locations.  A data type \texttt{<location>}
provides a location where an object is stored.  Both are applicable and
settable:
\begin{quote}
\begin{verbatim}
(@define n (@make <name-space>))
(@define l (@make <location>))

(@set! (l) "Hello")     ;; set the value
(@set! (n 'foo) l)      ;; create binding

(n 'foo)                => location `l'
((n 'foo))              => "Hello"

(@set! ((n 'foo)) "World")
((n 'foo))              => "World"
\end{verbatim}
\end{quote}

By combining several name-spaces and defining how to solve bindings, a
module can be created.  The following example illustrates how to create
a simple module, which has only a single name-space:
\begin{quote}
\begin{verbatim}
(@define (simple-module-ref m s)
  (@let* ((nss (@slot m 'name-spaces))
          (ns (@car nss))
          (loc (ns s)))
    (@if loc
         (loc)
         (@error "Unbound: ~A" s))))

(@define (simple-module-set! m s v)
  (@let* ((nss (@slot m 'name-spaces))
          (ns (@car nss))
          (loc (ns s)))
    (@unless loc
      (@set! loc (@make <location>))
      (@set! (ns s) loc))
    (@set! (loc) v)))

(@define (make-simple-module)
  (@let ((ns (@make <name-space>)))
    (@make <module>
           :name-spaces (@list ns)
           :getter simple-module-ref
           :setter simple-module-set!)))
\end{verbatim}
\end{quote}

Now, this module is also applicable and settable:
\begin{quote}
\begin{verbatim}
(@define module (make-simple-module))
(@set! (module 'foo) 1)
(module 'foo)           => 1
(module 'bar)           ;; ERROR
\end{verbatim}
\end{quote}

Although it is not required to use, iScheme provides a standard module
system, which is described in ``iScheme Language Standard'' in detail.

\subsubsection{Module hierarchy}

Modules can be placed in hierarchy.  The standard way of doing this is
to use primitive address@hidden, which returns a module under
which every other module is expected to be placed.  The following
example creates three levels of modules:
\begin{quote}
\begin{verbatim}
(@let ((top (@top-module))
       (parent (make-simple-module))
       (child (make-simple-module)))
  (@set! (top 'parent) parent)
  (@set! (parent 'child) child))
\end{verbatim}
\end{quote}

Now, a variable \texttt{foo} defined in the child module can be accessed
as follows:
\begin{quote}
\begin{verbatim}
((((@top-module) 'parent) 'child) 'foo)
\end{verbatim}
\end{quote}

To make this simpler, primitive address@hidden allows the following
notation:
\begin{quote}
\begin{verbatim}
(@module '(parent child foo))
\end{verbatim}
\end{quote}

This primitive assumes module names are placed in the same name space
as variable names.  If a language prefers not to do that way, the
translator must construct its own module accessor.

As always, how modules should be organized in a standard way is
described in ``iScheme Language Standard.''

\subsubsection{Multiple top-level environments}

In iScheme, the top-level environment is a single module object.  In
other words, multiple top-level environments can be realized by creating
several module objects.

Primitive address@hidden provides the access to the current
top-level environment:
\begin{quote}
\begin{verbatim}
(@set! (@module '(m1))
       (make-simple-module))
(@set! (@module '(m2))
       (make-simple-module))

(@set! (@current-module) (@module '(m1)))
(@define foo 1)

(@set! (@current-module) (@module '(m2)))
(@define foo 2)

(@set! (@current-module) (@module '(m1)))
foo                     => 1

(@set! (@current-module) (@module '(m2)))
foo                     => 2
\end{verbatim}
\end{quote}

\subsection{iScheme/OOP}
\label{sec:oop}

iScheme/OOP defines an object-oriented programming extension to Core
iScheme.  It provides CLOS-like OOP primitives based on the Metaobject
protocol.  Every type in Core iScheme is redefined in terms of classes.
iScheme/OOP provides the following features in addition to Core iScheme:
\begin{itemize}
\item Classes, generic functions, methods
\item Dynamic type dispatch
\item Multiple inheritance
\end{itemize}

\subsection{iScheme/MT}
\label{sec:mt}

iScheme/MT defines a multithreading extension to Core iScheme.  It
redefines some primitives like address@hidden so as to be
thread-safe.  It provides several new primitives for multithreading.

\section{Implementation}

I am going to implement an iScheme interpreter in Scheme, but it is not
sufficient because iScheme also must be executed very fast.  Therefore,
I am going to write an optimizing compiler and a virtual machine as well
for fast execution.

There is already a project Guile, which has aimed at translating various
languages into Scheme.  I have written a virtual machine for Guile, so I
will specialize it for iScheme.  Since Guile already has a good OOP
system and multithreading support, the whole iScheme features can be
implemented on the top of it.

Since my goal is to share a common language base between several
languages, it can be accomplished by contributing to the free software
community.

\end{document}
[Prev in Thread]
Current Thread
[Next in Thread]
Scheme as a Common Intermediate Language, Keisuke Nishida <=
- Re: Scheme as a Common Intermediate Language, Carl R. Witty, 2000/10/23
  - Re: Scheme as a Common Intermediate Language, Keisuke Nishida, 2000/10/24
Prev by Date: Re: Guile reference manual questions
Next by Date: any clues?
Previous by thread: [patch] removal of dnl comments in configure.in
Next by thread: Re: Scheme as a Common Intermediate Language
Index(es):
- Date
- Thread