[Gcl-devel] Re: boxing in multiple values

From: Camm Maguire
Subject: [Gcl-devel] Re: boxing in multiple values
Date: 23 Jul 2005 02:58:56 -0400
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2


Matt Kaufmann <address@hidden> writes:

> Hi, Camm --
> Thank you very much for the explanations!  (And certainly no apology is
> necessary; I'm very grateful for all you do for GCL.)  Here are some questions,
> none pressing -- please put them on the bottom of your "to do" list.
> I don't really understand "interfile" vs. "intrafile".  Are we talking here
> about passing values between functions defined in different source files?

Yes -- more unboxing can be done in calls between functions defined in
the same file.  And we can even do a little better between functions
in different files with a few simple modifications.
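
To make the distinction concrete, here is a rough C sketch of the idea,
not GCL's actual generated code.  All names (obj_tag, cell,
box_longfloat, scale_unboxed, scale_boxed) are hypothetical.  A static
function stands in for an intrafile callee whose signature the compiler
can see, so raw doubles are passed; the external entry point stands in
for a call that only knows "objects in, object out" and so must unbox
on entry and box on return.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical boxed representation: a heap cell holding a tagged value. */
typedef enum { T_FIXNUM, T_LONGFLOAT } obj_tag;
typedef struct { obj_tag tag; union { long fix; double lf; } u; } cell;
typedef cell *object;

/* Boxing a float: allocate a cell and store the tag and the value. */
static object box_longfloat(double d) {
    object o = malloc(sizeof *o);
    assert(o != NULL);
    o->tag = T_LONGFLOAT;
    o->u.lf = d;
    return o;
}

/* Same-file callee: the signature is visible, so raw doubles are passed. */
static double scale_unboxed(double x, double k) {
    return x * k;
}

/* Generic entry point: arguments arrive boxed and the result is boxed again. */
object scale_boxed(object x, object k) {
    assert(x->tag == T_LONGFLOAT && k->tag == T_LONGFLOAT);
    return box_longfloat(scale_unboxed(x->u.lf, k->u.lf));
}
```

The boxed path pays for an allocation and two memory reads that the
unboxed path avoids, which is the cost the intrafile case can skip.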

> Which version of GCL has the much larger immediate fixnum table?

CVS tag 2.7.0t3

export CVS_RSH=ssh
export CVSROOT=:ext:address@hidden:/cvsroot/gcl
cvs -z9 -q co -d gcl-2.7.0t3 -r Version_2_7_0t3 gcl
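
For readers who have not seen the table: as described in the quoted
message below, "unboxing" an immediate fixnum is basically a
subtraction, and no memory access is needed to get at the value.  A
minimal C sketch of that idea follows.  The names and the table size
are made up for illustration, and this is not GCL's actual source.

```c
#include <assert.h>
#include <stddef.h>

#define SMALL_FIXNUM_LIMIT 1024  /* hypothetical; chosen only for the example */

typedef struct { long value; } fixnum_cell;

/* One preallocated cell per integer in [-LIMIT, LIMIT). */
static fixnum_cell small_fixnum_table[2 * SMALL_FIXNUM_LIMIT];

static void init_small_fixnums(void) {
    for (long i = 0; i < 2 * SMALL_FIXNUM_LIMIT; i++)
        small_fixnum_table[i].value = i - SMALL_FIXNUM_LIMIT;
}

static int is_small_fixnum(long n) {
    return n >= -SMALL_FIXNUM_LIMIT && n < SMALL_FIXNUM_LIMIT;
}

/* Boxing: an index computation into the table, with no allocation. */
static fixnum_cell *box_small(long n) {
    assert(is_small_fixnum(n));
    return &small_fixnum_table[n + SMALL_FIXNUM_LIMIT];
}

/* Unboxing: recover the value from the pointer alone, by subtraction,
   without reading the cell's memory. */
static long unbox_small(const fixnum_cell *p) {
    return (long)(p - small_fixnum_table) - SMALL_FIXNUM_LIMIT;
}
```

Outside the table's range a fixnum would still need an ordinary heap
box, so the larger the table, the more arithmetic stays on the cheap
path.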

> How does one define one's own inlining in GCL?

By pushing entries to the symbol-plist indicating which argument and
return types should be inlined, under what conditions, and how.
Examples can be found in cmpnew/gcl_cmpopt.lsp.  Please let me know if
the examples are not sufficiently clear.

> You mention "fork based multi processing which should give good performance on
> problems of moderate granularity".  Can you briefly give a sense of what
> "moderate granularity" means?

In general, all multiple processing has some fixed cost/overhead
associated with starting and stopping the separate job.  When the
intrinsic computation time required by the job is large compared to
this fixed cost, spawning the parallel job is a win.  The fixed cost
varies according to multiple-processing methodology.  SIMD multiple
arithmetic operations on the cpu probably have the lowest overhead,
then maybe threads, then fork, then cluster computing/mpi, then
distributed computing across the internet, etc.  fork has a little
more overhead than threads, as we'll need a read to get the answer
back, among other reasons -- thankfully we already have a 'fasd'
reader, and Linux comes with copy-on-write pages; together with some
stack memory allocation we can implement, these should make this
quite serviceable IMHO.  Threads will get there eventually, but they
are much harder to use in lisp due to its very structure: gbc, special
variables, etc.  The implementations I've seen always require specific
user locking of shared globals, etc.
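
As a rough illustration of where the fixed cost comes from, here is a
minimal POSIX C sketch of the fork-and-read-back pattern.  It is not
the planned GCL implementation: job_fn, sum_below and run_in_child are
hypothetical names, and a raw 8-byte write over a pipe stands in for
whatever the 'fasd' reader would actually transfer.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical job type: any pure function of one integer. */
typedef int64_t (*job_fn)(int64_t);

/* Example job: sum of 0..n-1, standing in for a long computation. */
static int64_t sum_below(int64_t n) {
    int64_t s = 0;
    for (int64_t i = 0; i < n; i++)
        s += i;
    return s;
}

/* Run job(arg) in a forked child and read the answer back over a pipe.
   Returns 0 on success and stores the answer in *result, else -1. */
static int run_in_child(job_fn job, int64_t arg, int64_t *result) {
    assert(job != NULL && result != NULL);
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: sees the parent's memory copy-on-write, computes,
           writes the answer, and exits without returning to the caller. */
        close(fds[0]);
        int64_t r = job(arg);
        ssize_t w = write(fds[1], &r, sizeof r);
        _exit(w == (ssize_t)sizeof r ? 0 : 1);
    }
    /* Parent: the read and the waitpid are part of the fixed overhead. */
    close(fds[1]);
    ssize_t got = read(fds[0], result, sizeof *result);
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (got != (ssize_t)sizeof *result)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A call such as run_in_child(sum_below, n, &answer) only pays off when
the job itself runs much longer than the fork, pipe, read and waitpid
around it, which is what "moderate granularity" is meant to convey.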

Take care,

> Thanks again --
> -- Matt
>    From: Camm Maguire <address@hidden>
>    Date: 16 Jul 2005 09:37:14 -0400
>    Greetings!  Please Matt accept my apologies -- I know you have a number
>    of notes in to which I've not yet replied -- rest assured they are in
>    the queue, and once t4 gets out I hope to address them more fully.
>    Bob is right about multiple values in general -- as we rely on C, we
>    can only push c objects onto the C stack.  This isn't such an
>    obstacle, as one can define structures and return the structure body.
>    Perhaps the first word or char could describe the format of the
>    following words, i.e. boxed objects or otherwise.  But some fiddling
>    will be required at call and return time, so it will never be quite as
>    fast as a pure fixnum assignment.
>    GCL defines its own lisp stack for passing arguments in a slower
>    fashion -- these must be pure objects, as there is no way (that I can
>    think of) to reliably designate the format to any routines that will
>    use this stack, e.g. the debugger.  So once you need 'base',
>    'vs_base', or 'vs_top', you must box.
>    There are actually two levels of unboxing in GCL -- interfile and
>    intrafile.  Currently, only lisp objects and unboxed word-sized
>    fixnums can be passed interfile.  These plus characters, short and
>    long floats can be passed intrafile.  The former set can have at most
>    4 elements without a major overhaul of the C code -- I have a running
>    question as to what people might think useful to add to fixnums in
>    this regard.  Floats seem the likely choice, but Bob rightly notes that
>    most people caring about floats don't use lisp :-(.
>    Bob is also right about the relative benefit of fixnum unboxing --
>    much of the overhead of a 'boxed' fixnum has been removed with the new
>    (much larger) immediate fixnum table.  It is still not as fast as
>    unboxed fixnum arithmetic and can never be, though I've tried to
>    narrow the gap as much as possible.  'Unboxing' immediate fixnums is
>    basically a subtraction, so at a minimum there is an extra
>    subtraction, addition, and possibly a branch to boxed immediate fixnum
>    arithmetic.  In addition, there may be function calls.  But the lion's
>    share of the overhead came from memory access, a fact which had
>    escaped me until recently, and this is now gone for the immediate
>    fixnum range.  This should also indicate that if we do ever pass
>    multiple values in a structure body, it's likely to be a win even with
>    the fiddling at call and return time, as there is no extra memory
>    access to get to the data.  All thanks to Bob Boyer for enlightening
>    me on these key facts and suggesting the solution we're now using.
>    I'd like to finally mention that we have added some improvements
>    which allow multiple value functions to be inlined.  gethash and floor
>    are examples you might find useful.  You can define your own inlining
>    in GCL, though I imagine the use you have in mind is too big for
>    inlining. 
>    Finally regarding threads -- we have immediate plans for fork based
>    multi processing which should give good performance on problems of
>    moderate granularity given Linux's copy on write pages, and some stack
>    memory allocation tricks we can pull.  True threads will have to wait
>    a bit more.  I haven't done any exhaustive survey, but it appears that
>    any lisp thread implementation must invoke a whole lot of explicit
>    user locking calls etc. for globals -- perhaps this is not the burden
>    it appears to me.  set-mv is an explicit global variable pass from
>    what I can see, and to my limited understanding can therefore never be
>    made reentrant.  Perhaps there is some way to open up some extra MVloc
>    space in the stack space of each thread on thread launch or some
>    such.  It would appear that threads would be much more sane if one can
>    arrange to avoid globals and specials period and emulate same if
>    possible on the C stack per thread.
>    Take care,
>    Matt Kaufmann <address@hidden> writes:
>    > Hi, Camm --
>    > 
>    > We are looking into adding support for parallelism in ACL2, and hence we are
>    > considering replacing our current multiple-value mechanism with Common Lisp
>    > multiple-value-bind and values.  In our existing approach, multiple values are
>    > returned in global variables except for the first value, which is the one
>    > actually returned by the function.  Thus, we can avoid boxing fixnums that are
>    > returned in the first value position.  Unfortunately, GCL seems to box
>    > everything when returning multiple values.  Below is a little example that
>    > illustrates that point (I've tried it in GCL 2.6.6 and some version of 2.7.0,
>    > both CLtL1).
>    > 
>    > >(proclaim (quote (optimize (speed 3) (space 0) (safety 0))))
>    > 
>    > NIL
>    > 
>    > >(proclaim '(ftype (function (fixnum) fixnum t) foo))
>    > 
>    > NIL
>    > 
>    > >(disassemble (defun foo (x)
>    >          (declare (type fixnum x))
>    >          (values (the fixnum (+ x x)) t)))
>    > 
>    > In the resulting code we see:
>    > 
>    >  base[1]= CMPmake_fixnum((long)(V1)+(V1));
>    > 
>    > This boxing goes away if foo returns a single value instead.
>    > 
>    > Do you imagine a future version of GCL that could avoid such boxing, at least
>    > in the first returned value?
>    > 
>    > A completely different solution that would work fine for parallelism could be
>    > to make system::set-mv thread-safe.
>    > 
>    > Thanks --
>    > -- Matt
>    > 
>    > 
>    > 
>    -- 
>    Camm Maguire                                               address@hidden
>    ==========================================================================
>    "The earth is but one country, and mankind its citizens."  --  Baha'u'llah

Camm Maguire                                            address@hidden
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah