Re: [MIT-Scheme-devel] sqlite3


From: Matt Birkholz
Subject: Re: [MIT-Scheme-devel] sqlite3
Date: Sun, 19 May 2013 15:02:21 -0700

> From: Taylor R Campbell <address@hidden>
> Date: Sun, 19 May 2013 17:08:52 +0000
> 
> [...]
> 
> For microcode primitives, it is not a priori the case that
> interrupts are disabled on entry.

Le Machine is in what state?  Not "in a callout"?  It is not up and
running, creating callback tokens and making callouts to register them
with the toolkit?  You'll have to lay a usage scenario on me, bro.
And explain why we are talking about "microcode primitives" now, and
not "callbacks".

If you want to energize Le Machine (initted with a disk-save
continuation?) from a call"back" with no callout, that's... wild.
Good luck.  I hope I can follow all that AND the thing that I
can't even spell: lugjmp?

> I will admit that I haven't looked very closely at your FFI's
> implementation.  The magic happening in the macros is very hard to
> follow (C-INCLUDE, for example, expands to nothing -- it seems that
> you are abusing macros for side effects rather than expansion),

I needed the info at syntax-time, and had no use for it at run-time.

Most of the magic simplifies C type declarations (using the entire set
of included declarations) to determine whether to use c-peek-char or
c-peek-pointer when the programmer has only said "GSList->next".

> and using it without installing things in
> $PREFIX/lib/mit-scheme-$ARCH doesn't seem to be supported.

You can install shims in the first (existing) directory on your
MITSCHEME_LIBRARY_PATH, i.e. anywhere that exists.  I thought I said
something like that in the manual, but now I can't find it.  Maybe
I'll just twiddle the example to install in $HOME/.scheme-9.1/...

I decided not to search all of MITSCHEME_LIBRARY_PATH to make it hard
to load inconsistent -const.bin and -shim.so files (from different
directories along the path), but I could change that, especially if I
had some code to check a hash tag or sump'n.

> It's also not clear to me why Scheme needs to memorize so much
> information about the C platform's ABI

Constants, sizes and offsets are hard-coded -- "filled in".  I don't
want to callout to e.g. Scm_gslist_next just to see if I've reached
the end of a Glib list.  I want to peek instead, something that
could be inlined into a few instructions, NOT something that
requires multiple insults to the hallowed CPU pipeline (obligatory
interjection: "All hail the pipeline!") like register flipping and
stack switching.

> (and the grovelling mechanism will get in the way of any attempt at
> cross-compilation), when you're already generating C code for the
> shims.

I didn't notice any problems while cross-compiling.  My Gtk interface
works in i386, x86_64 and (32 *and* 64 bit) C.  ?  Geez, YOU
recommended the grovelling mechanism to me (in 2006)!

> From: Taylor R Campbell <address@hidden>
> Date: Thu, 10 Aug 2006 20:24:35 +0000
> 
> [...]
> Other Lisp FFI utilities, such as sb-grovel[1], cffi-grovel[2], and
> s48-grovel[3], (hmm, notice a trend here?), take this approach:
> generate C code to generate Lisp code with the appropriate values
> filled in.

I have some developer-level documentation that has grown stale over
the years, but most of the following is still accurate.  Perhaps it
can help.  You might just skip to "@node C FFI Callbacks"...

@node C FFI, Gtk, Microcode, Top
@chapter The C Libraries' Foreign Function Interface

@ifnottex
@insertcopying
@end ifnottex

@menu
* C FFI Modifications:: Changes to the stock Scheme machine.
* C FFI Callouts:: Details of the code generated for callout trampolines.
* C FFI Callbacks:: Details of the code generated for callback trampolines.
* C FFI Build:: Building the microcode.  Installing the Scheme code.
@end menu

This chapter describes how to add a Foreign Function Interface to an
MIT/GNU Scheme v7.7.90+ build.  It also provides an overview of the
implementation of the FFI, including especially the callout and
callback trampolines that are generated.  It is assumed the reader is
familiar with the FFI at the user level.
@c @xref{Top,, Introduction, mit-scheme-ffi, FFI Users' Manual}.
@c In HTML, I see "See Introduction." with "Introduction" linked to
@c http://birkholz.chandler.az.us/~matt/Scheme/FFI/mit-scheme-ffi.html#Top
@c In Info, I see "*Note Introduction:" followed by "(mit-scheme-ffi)Top".
@xref{Top,, The FFI Users' Manual, mit-scheme-ffi, The FFI Users' Manual}.

Most of the FFI code is concerned with loading and analyzing the C
type information in the @file{.cdecl} files.  The resulting data
structure (a c-includes record) is used by the code generator and the
syntax expanders.  It contains indices of the declared C types,
constants and the @code{alien-function} address caches.  Aliens
(toolkit data addresses) are the only other kind of runtime object.
The c-includes record is not needed once @code{c-generate} has been run
and all syntax has been expanded.

[organization of source code files/packages]

The rest of this section looks at the FFI's data types and its
groveler.  Subsequent sections discuss the modifications to the Scheme
machine, the callout and callback trampolines, and how to build the
entire system.

@section Runtime Objects

@strong{Aliens} are Scheme wrappers for C data structures.  Each
contains a memory address split into two fixnum halves.  An alien may
have a C type description attached, for debugging purposes or perhaps
some future runtime type checking facility.
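
As a rough illustration of the two-halves representation, the
following C sketch splits a pointer-sized address into a high and a
low half, each small enough for a fixnum, and joins them again.  The
names @code{alien_halves}, @code{split_address} and
@code{join_address}, and the even split at half the width of
@code{uintptr_t}, are assumptions made for this example, not the FFI's
actual layout.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical layout: an address held as two half-width integers,
   each small enough to fit in a fixnum. */
typedef struct {
  uintptr_t high;
  uintptr_t low;
} alien_halves;

#define HALF_BITS (sizeof (uintptr_t) * CHAR_BIT / 2)
#define HALF_MASK ((((uintptr_t) 1) << HALF_BITS) - 1)

/* Split ADDRESS into its upper and lower halves. */
static alien_halves
split_address (void * address)
{
  uintptr_t bits = (uintptr_t) address;
  alien_halves halves;
  halves.high = bits >> HALF_BITS;
  halves.low = bits & HALF_MASK;
  return (halves);
}

/* Rebuild the original address from the two halves. */
static void *
join_address (alien_halves halves)
{
  return ((void *) ((halves.high << HALF_BITS) | halves.low));
}
```

Splitting keeps each half well inside fixnum range, so presumably no
bignum needs to be consed just to hold an address.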

@strong{Alien functions} are used by the @code{C-call} syntax to cache
trampoline entry addresses.  They are implemented by a named vector
type so that they can be fasdump/loaded.  Some attempt is made to
share these objects among multiple @code{C-call} syntax expansions.

The cached entry addresses are only valid during the current process,
so each alien function includes a @code{band-id} member.  When the
(possibly fasloaded) band ID does not match the current band's ID, the
cache is invalid.  The runtime system's @code{dld-*} procedures are
used to fill the cache (on demand).
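
The cache check described above can be sketched in C as follows.  This
is only an analogy: the real alien function is a Scheme named vector
and the real lookup goes through the @code{dld-*} procedures.  The
names @code{alien_function_entry}, @code{lookup_entry} and
@code{current_band_id} are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-ins: the real alien function is a Scheme named
   vector, and the real lookup goes through the dld-* procedures. */
typedef void (*entry_point) (void);

typedef struct {
  const char * name;      /* trampoline name, e.g. "Scm_gtk_window_new" */
  unsigned long band_id;  /* ID of the band that filled the cache */
  entry_point entry;      /* cached entry address, or NULL */
} alien_function;

static unsigned long current_band_id = 1;
static int lookup_count = 0;

static void dummy_trampoline (void) { }

/* Stand-in for a dynamic-loader symbol lookup. */
static entry_point
lookup_entry (const char * name)
{
  (void) name;
  lookup_count += 1;
  return (&dummy_trampoline);
}

/* Return the entry address, refilling the cache on demand when it is
   empty or was filled under a different band. */
static entry_point
alien_function_entry (alien_function * af)
{
  if (af->entry == NULL || af->band_id != current_band_id)
    {
      af->entry = lookup_entry (af->name);
      af->band_id = current_band_id;
    }
  return (af->entry);
}
```

A second call under the same band ID reuses the cached address; a call
after the band ID changes triggers a fresh lookup.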

@section Syntax Time

@strong{Cdecls} are expressions found in @file{.cdecl} files.  They
are read by the @code{include-cdecls} procedure and assembled into a
@code{c-includes} data structure.  @strong{Ctypes} are the validated
cdecls found in the @code{c-includes} structure.  They are examined
using a set of abstract procedures.  An example of each expression is
given below with the procedure that recognizes it.

@multitable {@code{(struct Name (Member ctype)...)}} {ctype/struct-named?}
@item @code{char}
@tab ctype/basic?
@item @code{(const char)}
@tab ctype/const?
@item @code{(* char)}
@tab ctype/pointer?
@item @code{(struct Name)}
@tab ctype/struct-name?
@item @code{(struct (Member ctype)...)}
@tab ctype/struct-anon?
@item @code{(struct Name (Member ctype)...)}
@tab ctype/struct-named?
@item @code{(union  Name)}
@tab ctype/union-name?
@item @code{(union  (Member ctype)...)}
@tab ctype/union-anon?
@item @code{(union  Name (Member ctype)...)}
@tab ctype/union-named?
@item @code{(enum   Name)}
@tab ctype/enum-name?
@item @code{(enum (Member)...)}
@tab ctype/enum-anon?
@item @code{(enum Name (Member)...)}
@tab ctype/enum-named?
@end multitable

Note that the target types of pointer types are not currently
validated.

@section Groveler

The @code{c-generate} procedure reads a @file{@i{library}.cdecl} (and
included) file(s) and writes three new ones.

@table @file
@item @i{library}.c
gets the callout and callback trampolines.
@item @i{library}-types.bin
gets a fasdump of the @code{c-includes} structure @emph{without} the
@code{enum-values} and @code{struct-values} members.  These are loaded
from the file generated by the groveler.
@item @i{library}-constants.c
gets the groveler.
@end table

The groveler is the C program that outputs C constants in Scheme
syntax.  It generates a @file{.scm} file that can be (fas)loaded by the
@code{C-include} syntax.  The @file{.scm} file should contain a list of
two things.  The first is an association list of enum constant values
indexed by constant name.  The second contains the size of each C
@code{struct} type and the offset and type of each struct member.  This
information is repeated for any aliases.  For example, these two Cdecls

@example
(struct A (B int) (C int))
(typedef D (struct A))
@end example

produce the following list of struct values.

@example
((sizeof (struct |A|)) . 8)
((offset (struct |A|) |B|) . 0)
((offset (struct |A|) |C|) . 4)
((sizeof |D|) . 8)
((offset |D| |B|) . 0)
((offset |D| |C|) . 4)
@end example
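
To make the idea concrete, here is a minimal sketch of what the
interesting part of a groveler could look like for those two Cdecls.
It is not the code @code{c-generate} emits; the helper names
@code{append_entry} and @code{grovel_struct_values} are invented for
this example, and it writes to a buffer rather than a file.  The point
is only that @code{sizeof} and @code{offsetof} are evaluated by the C
compiler for the target platform and printed in Scheme syntax.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct A { int B; int C; };
typedef struct A D;

/* Append one "(KEY . VALUE)" entry to BUF, a NUL-terminated string
   with capacity LEN. */
static void
append_entry (char * buf, size_t len, const char * key, size_t value)
{
  size_t used = strlen (buf);
  snprintf (buf + used, len - used, "(%s . %lu)\n", key,
            (unsigned long) value);
}

/* Fill BUF with the struct values for struct A and its alias D. */
static void
grovel_struct_values (char * buf, size_t len)
{
  buf[0] = '\0';
  append_entry (buf, len, "(sizeof (struct |A|))", sizeof (struct A));
  append_entry (buf, len, "(offset (struct |A|) |B|)",
                offsetof (struct A, B));
  append_entry (buf, len, "(offset (struct |A|) |C|)",
                offsetof (struct A, C));
  /* The same information is repeated for the alias. */
  append_entry (buf, len, "(sizeof |D|)", sizeof (D));
  append_entry (buf, len, "(offset |D| |B|)", offsetof (D, B));
  append_entry (buf, len, "(offset |D| |C|)", offsetof (D, C));
}
```

On a platform with a 4-byte @code{int} and no padding, the sizes come
out as 8 and the offsets as 0 and 4.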

The @code{C-include} syntax loads a @code{c-includes} structure
from a @file{@i{library}-types.bin} file and adds to it the enum and
struct values loaded from the file generated by the groveler.



@node C FFI Modifications, C FFI Callouts, C FFI, C FFI
@section Modifications

This FFI adds several new primitives to the Scheme machine.  These can
be found in the @file{pruxffi.c} and @file{pruxffi.h} files.  It also
requires a few changes to the @code{Interpret()} function itself,
adding an argument, support for two new aborts, and a
@code{callback-handler} slot in the fixed objects vector.  The
complete set of patches can be found in the @file{microcode.patch} file,
which modifies the following files.

@itemize @bullet

@item @file{Makefile.in}
Primarily adds a rule for the new @file{pruxffi.o} object.  Several
other changes support the @file{prhello} example.

@item @file{boot.c}
The C data stack (@code{callout_obstack}) is initialized, e.g. next to
the initialization of @code{scratch_obstack}.  Also, the
@code{Interpret} function's old @code{pop_return_p} parameter is back.

@item @file{configure} and @file{configure.ac}
Adds the @file{pruxffi} module whenever @file{pruxdld} is available.
Changing @file{configure} as well as @file{configure.ac} means you
will not need to run @code{autoconf}.

@item @file{const.h}
Add @code{PRIM_RETURN_TO_C} and @code{PRIM_ABORT_TO_C}, two new
ways of exiting the interpreter and leaving it ready for re-entry via
@code{Interpret(1)}.

@item @file{extern.h}
Add declarations for @code{callout_obstack} and
@code{find_primitive_cname}.  Modify the declaration of
@code{Interpret}.

@item @file{fixobj.h}
Add a @code{callback-handler} slot to the fixed objects vector.

@item @file{interp.c}
Add a @code{pop_return_p} parameter to @code{Interpret}.
Implement the new @code{PRIM_RETURN_TO_C} and
@code{PRIM_ABORT_TO_C} aborts.

@item @file{primutl.c}
A @code{find_primitive_cname} function is needed.  There is a similar
function, @code{find_primitive}, that takes a Scheme string.  A few
modifications turn it into @code{find_primitive_cname}, in terms of
which @code{find_primitive} is easily re-implemented.

@item @file{utabmd.scm}
Add the @code{callback-handler} slot.

@end itemize

@heading @file{pruxffi.c}

This file extends the microcode with the following primitives and
functions.

@itemize @bullet

@item
The @code{c-peek-} and @code{c-poke-} primitives for each of the basic
C types.

@item
The @code{c-peek-cstring} and @code{c-peek-cstringp} primitives help
deal with the ubiquitous, null-terminated @code{(* char)} data.

@item
Utility primitives @code{c-malloc} and @code{c-free}.

@item
The callout primitives @code{c-call} and @code{c-call-continue}.

@item
The callback primitives @code{run-callback} and @code{return-to-c}.

@item
Functions referenced by the generated trampolines, like
@code{callout_continue} and @code{Setup_Callback}.

@end itemize
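
The peek primitives read a C value at an address plus an offset,
which lets Scheme follow a pointer without calling out.  The sketch
below shows the idea in plain C.  The @code{node} type is a
hypothetical stand-in for a toolkit list, and @code{peek_pointer} and
@code{list_length} are illustrative names, not the primitives
themselves.  It also assumes that all object pointers share one
representation, which holds on the usual platforms.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node type standing in for a toolkit list such as
   GSList. */
typedef struct node {
  void * data;
  struct node * next;
} node;

/* Read a pointer-sized value at BASE + OFFSET.  Using memcpy avoids
   alignment and aliasing problems. */
static void *
peek_pointer (const void * base, size_t offset)
{
  void * value;
  memcpy (&value, ((const char *) base) + offset, sizeof (value));
  return (value);
}

/* Walk a list using only peeks at a grovelled offset; no function in
   the list's own library is called. */
static int
list_length (const node * head)
{
  int n = 0;
  const void * here = head;
  while (here != NULL)
    {
      n += 1;
      here = peek_pointer (here, offsetof (node, next));
    }
  return (n);
}
```

Here @code{offsetof (node, next)} plays the role of an offset that the
groveler would have filled in at syntax time.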

@heading @file{pruxffi.h}

The @file{pruxffi.h} file includes macros that implement a C data
stack, @verb{"CStack"}, abstraction with methods like
@verb{"CStack_Push"}, @verb{"CStack_LPop"} and
@verb{"CStack_Pop_Frame"}.  The push method is used in the first half
of callout trampolines to save return values from the C library.  The
pop methods are used in the second half while converting the C values
to Scheme values.  The abstraction is implemented on an obstack,
@verb{"callout_obstack"}, used simply as an automatically growing
contiguous memory segment (with base and top pointers).  The
implementation never uses @verb{"obstack_finish"} --- just
@verb{"obstack_grow"}.
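
The following is a minimal sketch of such a stack, written with
functions instead of macros and with @code{realloc} standing in for
the obstack.  The names @code{cstack_push}, @code{cstack_lpop} and
@code{cstack_pop_frame} are made up to mirror the macros named above;
the real macros take a C type and track a @code{char *} top-of-stack
pointer rather than an offset.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for callout_obstack: a contiguous, growable byte segment. */
typedef struct {
  char * base;
  size_t top;       /* bytes in use */
  size_t capacity;  /* bytes allocated */
} cstack;

/* Copy SIZE bytes at VALUE onto the stack, growing it if needed. */
static void
cstack_push (cstack * s, const void * value, size_t size)
{
  if (s->top + size > s->capacity)
    {
      size_t wanted = (s->top + size) * 2;
      char * grown = realloc (s->base, wanted);
      assert (grown != NULL);
      s->base = grown;
      s->capacity = wanted;
    }
  memcpy (s->base + s->top, value, size);
  s->top += size;
}

/* "Local" pop: copy a value out and move the caller's local
   top-of-stack offset, leaving the stack itself untouched. */
static void
cstack_lpop (const cstack * s, size_t * local_top, void * value,
             size_t size)
{
  assert (*local_top >= size);
  *local_top -= size;
  memcpy (value, s->base + *local_top, size);
}

/* Commit: discard everything above LOCAL_TOP. */
static void
cstack_pop_frame (cstack * s, size_t local_top)
{
  assert (local_top <= s->top);
  s->top = local_top;
}
```

Because the local pops never change the stack's own top, a reader that
is interrupted halfway can simply start over from the same state.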


@node C FFI Callouts, C FFI Callbacks, C FFI Modifications, C FFI
@section Callouts

Callout trampolines are split into two parts.  The first part
is run by the @code{c-call} primitive.  It converts the Scheme
arguments and calls the C function, saving the returned value on a C
data stack.  Then it arranges for the second part to run by hacking
its continuation and aborting.  The hack substitutes the
@code{c-call-continue} primitive for @code{c-call} in the primitive
apply frame at the top of the Scheme stack.  The abort causes
the interpreter to retry the primitive application, this time applying
@code{c-call-continue}.

The second part, run by the @code{c-call-continue} primitive, pops the
C function's return value off the C data stack and conses the
corresponding Scheme return value.  The pop is delayed until all
consing is complete, making this part restartable after a GC abort.
If the consing does abort for GC, any heap addresses used in the first
part of the trampoline (during argument marshalling) will be
invalidated, but this second part (return value consing) does not use
(actually has no access to) these invalid pointers.  Once the Scheme
value is successfully constructed, the @code{c-call-continue}
primitive can return ``normally'', as though from the call to
@code{c-call}.

For each @code{extern} cdecl, e.g.
@smallexample
    (extern (* GtkWidget)
            gtk_window_new
            (type GtkWindowType))
@end smallexample
the @code{gen-callout-trampolines} procedure generates a two-part
callout trampoline.  The first part might look like this.

@verbatim
void
Scm_gtk_window_new (void)
{
  /* Declare C args and return value. */
  GtkWidget * ret0;
  GtkWindowType type;

  /* Init C args.  Aborts are OK; they will restart this function. */
  if (GET_LEXPR_ACTUALS < 3) {
    signal_error_from_primitive (ERR_WRONG_NUMBER_OF_ARGUMENTS); }
  type = arg_integer (3);

  /* Call the C function, but first swap c-call-continue for c-call and
     back out of the primitive.  No more aborts! */
  prepare_callout_continuation ();
  ret0 = gtk_window_new (type);
  prepare_for_callout_results ();

  /* Save C return value. */
  CStack_Push (GtkWidget *, ret0);

  callout_continue (&Scm_continue_gtk_window_new);
  /* NOTREACHED */
}
@end verbatim

The matching second part might look like this.

@verbatim
SCHEME_OBJECT
Scm_continue_gtk_window_new (void)
{
  /* Declare. */
  char * tos0;
  GtkWidget * ret0;
  SCHEME_OBJECT ret0a;

  /* Restore. */
  CStack_top_of_results (tos0);
  CStack_LPop (GtkWidget *, ret0, tos0);

  /* Return. */
  set_alien_address (ARG_REF (2), (void*)ret0);
  ret0a = UNSPECIFIC;
  pop_callout (tos0);
  return (ret0a);
}
@end verbatim

The above example does not actually cons in the second part, but it
easily could with something as simple as @code{long_to_integer}.

The @code{c-call-continue} primitive must manage the C data stack
carefully to stay restartable.  It decrements a local top-of-stack
pointer while popping the C results.  It does not actually pop the
frame off the stack until it has successfully consed all the results.

The callout trampolines are GC abortable and restartable.  They do not
hold onto pointers into the Scheme stack.  After an abort, they load
their arguments again from the freshly-GCed Scheme stack.
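
The restart discipline can be modelled in a few lines of C.  In this
sketch @code{setjmp} and @code{longjmp} stand in for the interpreter's
abort mechanism, an @code{int} array stands in for the C data stack,
and @code{cons_integer} pretends to cons.  All of these names are
hypothetical; none of this is microcode.

```c
#include <setjmp.h>

static jmp_buf abort_target;  /* stand-in for the interpreter's catch */
static int results[1];        /* stand-in for the C data stack */
static int results_top = 0;
static int heap_free = 0;     /* words available before a "GC" */
static int attempts = 0;

/* Stand-in for consing: aborts when the heap is exhausted. */
static int
cons_integer (int value)
{
  if (heap_free < 1)
    longjmp (abort_target, 1);
  heap_free -= 1;
  return (value);
}

/* Second half of a callout: read the saved result through a local
   top-of-stack index, cons, and only then pop the frame. */
static int
continue_callout (void)
{
  int tos = results_top;      /* local copy; the stack is untouched */
  int raw;
  int boxed;
  attempts += 1;
  tos -= 1;
  raw = results[tos];
  boxed = cons_integer (raw); /* may abort; nothing is committed yet */
  results_top = tos;          /* commit the pop */
  return (boxed);
}

/* Stand-in for the interpreter retrying the primitive after a GC. */
static int
apply_with_retry (void)
{
  if (setjmp (abort_target) != 0)
    heap_free = 100;          /* the "GC" freed some space */
  return (continue_callout ());
}
```

With an empty heap the first attempt aborts inside
@code{cons_integer}; the retry finds the saved result still on the
stack and completes.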


@node C FFI Callbacks, C FFI Build, C FFI Callouts, C FFI
@section Callbacks

Callback trampolines are also split into two parts, to
accommodate GC aborts.  The first part is registered with the toolkit
and runs outside the interpreter, so it does no consing and cannot GC
abort.  It calls @code{Interpret(1)} after hacking the Scheme stack like
an interrupt.  It pushes a couple of zero-arity primitive application
frames.  The first applies the @code{return-to-c} primitive and the
second applies @code{run-callback}.

The second part is run in the interpreter by the @code{run-callback}
primitive.  It conses the callback arguments and applies the Scheme
callback handler (from the fixed objects vector).  It is restartable,
to accommodate GC aborts during construction of the arguments.  When
finished, it leaves the callback's return value in the value register.
The interpreter then applies @code{return-to-c} and control returns to
the first part of the callback trampoline, which converts the Scheme
value register and returns an equivalent C value to the toolkit.

For each callback cdecl, e.g.
@smallexample
        (callback gint
                  delete_event
                  (window (* GtkWidget))
                  (event (* GdkEventAny))
                  (ID gpointer))
@end smallexample
the @code{gen-callback-trampolines} procedure generates a callback
trampoline and a restartable kernel.  The trampoline for a simpler
callback, @code{clicked}, which takes only a widget and an ID and
returns nothing, looks something like this.

@verbatim
void
Scm_clicked (GtkWidget * widget, gpointer ID)
{
  Start_Callback ();
  CStack_Push (gpointer, ID);
  CStack_Push (GtkWidget *, widget);
  Run_Callback ((uint)ID, (CallbackKernel)&Scm_kernel_clicked);
  return;
}
@end verbatim

The corresponding kernel looks like this.

@verbatim
static void
Scm_kernel_clicked (void)
{
  /* Declare. */
  GtkWidget * widget;
  gpointer ID;
  SCHEME_OBJECT alien0;
  SCHEME_OBJECT arglist0;
  char * tos0;

  /* Init. */
  tos0 = CStack_TOS ();
  CStack_LPop_Kernel_Check (&Scm_kernel_clicked, tos0);
  CStack_LPop (GtkWidget *, widget, tos0);
  CStack_LPop (gpointer, ID, tos0);
  arglist0 = EMPTY_LIST;

  /* Construct. */
  alien0 = cons_alien ((void*)widget);
  arglist0 = cons (alien0, arglist0);
  Setup_Callback ((uint)ID, arglist0);

  CStack_Pop_Frame (tos0);
  PRIMITIVE_ABORT (PRIM_APPLY);
}
@end verbatim

The Scheme callback handler looks up the registered closure and runs
it without preemption, returning from the @code{run-callback} primitive
with a Scheme value.  The interpreter continues with the application
of the @code{return-to-c} primitive, which immediately returns from
@code{Interpret()}.  The trampoline can then convert the Scheme
value register to a C value and return it to the toolkit.
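
The lookup by ID can be pictured with a small C analogue.  The real
handler is Scheme code that applies a Scheme procedure to the consed
argument list; here a function pointer plays the closure, and
@code{register_callback}, @code{handle_callback} and the fixed-size
table are assumptions made for the example.  The one detail taken from
the trampolines above is that the ID travels through the toolkit as
the opaque user-data pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CALLBACKS 16

/* Hypothetical closure type: the real handler applies a Scheme
   procedure to a list of consed arguments. */
typedef int (*closure) (void * widget);

static closure registry[MAX_CALLBACKS];
static unsigned int next_id = 1;  /* ID 0 is never issued */

/* Register C and return its ID, or 0 if the table is full. */
static unsigned int
register_callback (closure c)
{
  unsigned int id;
  if (next_id >= MAX_CALLBACKS)
    return (0);
  id = next_id;
  next_id += 1;
  registry[id] = c;
  return (id);
}

/* Stand-in for the callback handler: recover the ID from the opaque
   user-data pointer, look up the closure, and apply it.  Returns -1
   for an unknown ID. */
static int
handle_callback (void * widget, void * user_data)
{
  unsigned int id = (unsigned int) (uintptr_t) user_data;
  if (id == 0 || id >= MAX_CALLBACKS || registry[id] == NULL)
    return (-1);
  return ((registry[id]) (widget));
}

/* Example closure: counts how often it has been invoked. */
static int clicked_count = 0;

static int
count_clicks (void * widget)
{
  (void) widget;
  clicked_count += 1;
  return (clicked_count);
}
```

Passing a small integer instead of a heap address means the toolkit
never holds a pointer that a GC could invalidate.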

@heading Callouts during callbacks during callouts...

Callbacks usually arrive during a callout.  The first part of the
callout trampoline is careful to ``canonicalize the interpreter
context'' before calling out, so that the Scheme stack and registers
are in a GCable state.  The callout trampoline can call the toolkit, the
toolkit can call a callback trampoline, and the callback trampoline can
push its interrupt frame and @emph{recursively} enter
@code{Interpret()}.  Once inside the interpreter, with complete frames
on the stack, GC aborts can be handled as callback arguments are consed.

During a callback the toolkit is blocked waiting for
@code{Interpret()} to execute the @code{return-to-c} primitive.  If MIT
Scheme were to switch threads, it could abandon that continuation
(perhaps permanently!).  To prevent this, the generic callback handler
arranges for the current thread to run without preemption until the
Scheme callback procedure returns.  If an error is signaled in a
callback, the standard error handler is invoked, and the error REPL can
be used to debug the situation with the toolkit blocked.

Here is the Scheme debugger's forward-trace (continuation trace)
from a breakpoint in a callback of the example ``Hello, World!''
program.  It shows the @code{return-to-c} primitive apply frame
which waits to return values to the toolkit, and after that, a
@code{c-call-continue} primitive apply frame, which continues with the
reduction of a callout to @code{gtk_main}.

@verbatim

; hello::clicked #[alien 2 #f 0x08112eb8]

 clicked
;To continue, call RESTART with an option number:
; (RESTART 2) => Return from BKPT.
; (RESTART 1) => Return to read-eval-print level 1.

2 bkpt> (debug)

There are 10 subproblems on the stack.

Subproblem level: 0 (this is the lowest subproblem level)
Expression (from stack):
    (begin
     ###
     (call-alien (quote #[alien-function 3 Scm_gtk_label_set_text])
                 label
                 (list->string (reverse! (string->list text)))))
 subproblem being executed (marked by ###):
    (bkpt (quote clicked))
Environment created by a LET special form

 applied to: ("!dlroW ,olleH")
There is no execution history for this subproblem.
You are now in the debugger.  Type q to quit, ? for commands.

3 debug> h
SL#  Procedure-name          Expression

0                            (begin (bkpt (quote clicked)) (call-alien (quo ...
1                            (begin (low-format "; hello::clicked ~S\n" wid ...
2                            (let ((value (thunk))) (set-thread/execution-s ...
3                            (return-to-c)
4                            (c-call-continue (quote #[alien-function 5 Scm ...
5                            (begin (call-alien (quote #[alien-function 6 S ...
6    %repl-eval              (let ((value (hook/repl-eval s-expression envi ...
7    %repl-eval/write        (hook/repl-write (%repl-eval s-expression envi ...
8                            (begin (if (queue-empty? queue) (let ((environ ...
9    loop                    (loop (bind-abort-restart cmdl (lambda () (der ...

3 debug> K
Choose an option by number:
  2: Return from BKPT.
  1: Return to read-eval-print level 1.

Option number (1 through 2 inclusive): 2
@end verbatim

Here is gdb's backtrace from the same point.  It shows the recursive
call @code{Interpret(1)} and the callout via @code{Prim_c_call}.

@verbatim
#0  0xb7f0e410 in __kernel_vsyscall ()
#1  0xb7cf5bcb in poll () from /lib/tls/i686/cmov/libc.so.6
#2  0x0809fb55 in OS_test_select_registry (registry=0x80ed178, blockp=1) at uxio.c:486
#3  0x08098c90 in Prim_test_selreg () at prosio.c:309
#4  0x0809502e in primitive_apply_internal (primitive=1610613295) at utils.c:861
#5  0x080b8b32 in comutil_primitive_apply (DSU_result=0xbff24594, primitive_raw=1610613295, ignore2=1359640, ignore3=5402588, ignore4=0) at cmpint.c:772
#6  0x080a7fb0 in scheme_to_interface_proceed ()
#7  0xbff24594 in ?? ()
#8  0x080b80fb in apply_compiled_procedure () at cmpint.c:436
#9  0x0807cc85 in Interpret (pop_return_p=1) at interp.c:1102
#10 0x080aa8a3 in run_callback (callback_id=2, kernel=0xb7f0901d <Scm_kernel_clicked>) at pruxffi.c:762
#11 0xb7f093ab in Scm_clicked (widget=0x8112eb8, ID=0x2) at prhello.c:643
#12 0xb75dbaff in g_cclosure_marshal_VOID__VOID () from /usr/lib/libgobject-2.0.so.0
#13 0xb75ce759 in g_closure_invoke () from /usr/lib/libgobject-2.0.so.0
#13 0xb75ce759 in g_closure_invoke () from /usr/lib/libgobject-2.0.so.0

[...25 Gtk frames...]

#39 0xb754e577 in g_main_loop_run () from /usr/lib/libglib-2.0.so.0
#40 0xb789c264 in gtk_main () from /usr/lib/libgtk-x11-2.0.so.0
#41 0xb7f08a2d in Scm_gtk_main () at prhello.c:526
#42 0x080a9c2e in Prim_c_call () at pruxffi.c:491
#43 0x0809502e in primitive_apply_internal (primitive=1610612799) at utils.c:861
#44 0x0807c5c1 in Interpret (pop_return_p=0) at interp.c:1009
#45 0x0806b645 in Do_Enter_Interpreter () at boot.c:301
#46 0x0806b668 in Enter_Interpreter () at boot.c:309
#47 0x0806b631 in start_scheme () at boot.c:295
#48 0x0806ae84 in main (argc=1, argv=0xbff26174) at boot.c:132
@end verbatim

@heading Callback stack requirement.

The first part of a callback trampoline pushes two zero-arity primitive
apply frames on the stack (8 words).  It cannot GC abort to
get more stack because it is running ``outside'' of the interpreter.
Thus it @emph{will} fail if there is no room on the stack.  A warning is
emitted on stderr in that case.
@c TODO!!!
In the future, the required stack space might be guaranteed by the
callout trampolines.

@heading Callback alien fixups.

Aliens are normal records with a record type.  However, callback
trampolines consing alien (pointer) callback arguments will not bother
to track down the record type's ``dispatch-tag''.  They will simply
return a 3-element vector.  The Scheme callback dispatcher can more
easily munge the vector into a record, and there should be no
mistaking a vector for an alien.  The trampolines do not create Scheme
vectors otherwise.


