guile-devel

Re: RFC: (ice-9 sandbox)


From: Andy Wingo
Subject: Re: RFC: (ice-9 sandbox)
Date: Tue, 18 Apr 2017 21:48:00 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.1 (gnu/linux)

On Fri 31 Mar 2017 11:27, Andy Wingo <address@hidden> writes:

> Attached is a module that can evaluate an expression within a sandbox.

Pushed to master.  See NEWS here, where I include a couple more entries
of note:

    * Notable changes

    ** New sandboxed evaluation facility

    Guile now has a way to execute untrusted code in a safe way.  See
    "Sandboxed Evaluation" in the manual for full details, including some
    important notes on limitations on the sandbox's ability to prevent
    resource exhaustion.

    ** All literal constants are read-only

    According to the Scheme language definition, it is an error to attempt
    to mutate a "constant literal".  A constant literal is data that is a
    literal quoted part of a program.  For example, all of these are errors:

      (set-car! '(1 . 2) 42)
      (append! '(1 2 3) '(4 5 6))
      (vector-set! '#(a b c) 1 'B)

    Guile takes advantage of this provision of Scheme to deduplicate shared
    structure in constant literals within a compilation unit, and to
    allocate constant data directly in the compiled object file.  If the
    data needs no relocation at run-time, as is the case for pairs or
    vectors that only contain immediate values, then the data can actually
    be shared between different Guile processes, using the operating
    system's virtual memory facilities.

    However, in Guile 2.2.0, constants that needed relocation were actually
    mutable -- though (vector-set! '#(a b c) 1 'B) was an error, Guile
    wouldn't actually cause an exception to be raised, silently allowing the
    mutation.  This could affect future users of this constant, or indeed of
    any constant in the compilation unit that shared structure with the
    original vector.

    Additionally, attempting to mutate constant literals mapped in the
    read-only section of files would actually cause a segmentation fault, as
    the operating system prohibits writes to read-only memory.  "Don't do
    that" isn't a very nice solution :)

    Both of these problems have been fixed.  Any attempt to mutate a
    constant literal will now raise an exception, whether the constant needs
    relocation or not.

    ** Syntax objects are now a distinct type

    It used to be that syntax objects were represented as a tagged vector.
    These values could be forged by users to break scoping abstractions,
    preventing the implementation of sandboxing facilities in Guile.  We are
    as embarrassed about the previous situation as we are pleased about the
    fact that we've fixed it.

    Unfortunately, during the 2.2 stable series (or at least during part of
    it), we need to support files compiled with Guile 2.2.0.  These files
    may contain macros that contain legacy syntax object constants.  See the
    discussion of "allow-legacy-syntax-objects?" in "Syntax Transformer
    Helpers" in the manual for full details.

And the documentation formatted as text is below.  I guess a 2.2.1 is
coming soon.  Thanks all for the review!

Andy



1.12 Sandboxed Evaluation
-------------------------

Sometimes you would like to evaluate code that comes from an untrusted
party.  The safest way to do this is to buy a new computer, evaluate the
code on that computer, then throw the machine away.  However, if you are
unwilling to take this simple approach, Guile does include a limited
"sandbox" facility that can allow untrusted code to be evaluated with
some confidence.

   To use the sandboxed evaluator, load its module:

     (use-modules (ice-9 sandbox))

   Guile's sandboxing facility starts with the ability to restrict the
time and space used by a piece of code.

 -- Scheme Procedure: call-with-time-limit limit thunk limit-reached
     Call THUNK, but cancel it if LIMIT seconds of wall-clock time have
     elapsed.  If the computation is cancelled, call LIMIT-REACHED in
     tail position.  THUNK must not disable interrupts or prevent an
     abort via a 'dynamic-wind' unwind handler.
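
   As a minimal sketch of 'call-with-time-limit', assuming a Guile
that ships '(ice-9 sandbox)', the following cancels a loop that would
otherwise never return.  The names 'spin-forever' and 'result' are
hypothetical and exist only for illustration.

```scheme
(use-modules (ice-9 sandbox))

;; Hypothetical runaway computation: loops forever unless interrupted.
(define (spin-forever)
  (let loop () (loop)))

;; Allow 0.1 seconds of wall-clock time; if the limit is reached, the
;; third argument is called and its value is returned instead.
(define result
  (call-with-time-limit 0.1
                        spin-forever
                        (lambda () 'timed-out)))
;; result is the symbol timed-out
```

   Returning a distinguished value from LIMIT-REACHED is only one
option; the handler could equally throw an exception.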

 -- Scheme Procedure: call-with-allocation-limit limit thunk
          limit-reached
     Call THUNK, but cancel it if LIMIT bytes have been allocated.  If
     the computation is cancelled, call LIMIT-REACHED in tail position.
     THUNK must not disable interrupts or prevent an abort via a
     'dynamic-wind' unwind handler.

     This limit applies to both stack and heap allocation.  The
     computation will not be aborted before LIMIT bytes have been
     allocated, but for the heap allocation limit, the check may be
     postponed until the next garbage collection.

     Note that as a current shortcoming, the heap size limit applies to
     all threads; concurrent allocation by other unrelated threads
     counts towards the allocation limit.
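
   A similar sketch for 'call-with-allocation-limit', under the same
assumptions.  The procedure 'allocate-forever' and the 10 MiB figure
are hypothetical.  Because the heap check may be postponed until the
next garbage collection, the thunk may allocate somewhat more than
the limit before it is cancelled.

```scheme
(use-modules (ice-9 sandbox))

;; Hypothetical runaway allocator: keeps every vector it creates
;; reachable, so the heap grows without bound.
(define (allocate-forever)
  (let loop ((acc '()))
    (loop (cons (make-vector 1000 0) acc))))

;; Cancel the thunk once roughly 10 MiB have been allocated.
(define result
  (call-with-allocation-limit (* 10 1024 1024)
                              allocate-forever
                              (lambda () 'allocation-limit-reached)))
;; result is the symbol allocation-limit-reached
```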

 -- Scheme Procedure: call-with-time-and-allocation-limits time-limit
          allocation-limit thunk
     Invoke THUNK in a dynamic extent in which its execution is limited
     to TIME-LIMIT seconds of wall-clock time, and its allocation to
     ALLOCATION-LIMIT bytes.  THUNK must not disable interrupts or
     prevent an abort via a 'dynamic-wind' unwind handler.

     If successful, return all values produced by invoking THUNK.  Any
     uncaught exception thrown by the thunk will propagate out.  If the
     time or allocation limit is exceeded, an exception will be thrown
     to the 'limit-exceeded' key.
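
   Since the combined procedure signals failure by throwing to
'limit-exceeded' rather than calling a handler, a caller will usually
wrap it in 'catch'.  The helper 'run-limited' below is a hypothetical
sketch of that pattern, not part of '(ice-9 sandbox)'.

```scheme
(use-modules (ice-9 sandbox))

;; Hypothetical helper: run THUNK under both limits and turn a
;; 'limit-exceeded' throw into an ordinary return value.
(define (run-limited thunk)
  (catch 'limit-exceeded
    (lambda ()
      (call-with-time-and-allocation-limits 0.1 #e10e6 thunk))
    (lambda (key . args)
      'limit-exceeded)))

;; A well-behaved thunk returns its value unchanged.
(define ok (run-limited (lambda () (+ 1 2))))

;; A thunk that never terminates is cancelled by the time limit.
(define runaway (run-limited (lambda () (let loop () (loop)))))
;; ok is 3; runaway is the symbol limit-exceeded
```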

   The time limit and stack limit are both very precise, but the heap
limit only gets checked asynchronously, after a garbage collection.  In
particular, if the heap is already very large, the number of allocated
bytes between garbage collections will be large, and therefore the
precision of the check is reduced.

   Additionally, due to the mechanism used by the allocation limit (the
'after-gc-hook'), large single allocations like '(make-vector #e1e7)'
are only detected after the allocation completes, even if the allocation
itself causes garbage collection.  It's possible therefore for user code
to not only exceed the allocation limit set, but also to exhaust all
available memory, causing out-of-memory conditions at any allocation
site.  Failure to allocate memory in Guile itself should be safe and
cause an exception to be thrown, but most systems are not designed to
handle 'malloc' failures.  An allocation failure may therefore exercise
unexpected code paths in your system, so it is a weakness of the sandbox
(and therefore an interesting point of attack).

   The main sandbox interface is 'eval-in-sandbox'.

 -- Scheme Procedure: eval-in-sandbox exp [#:time-limit 0.1]
          [#:allocation-limit #e10e6] [#:bindings all-pure-bindings]
          [#:module (make-sandbox-module bindings)] [#:sever-module? #t]
     Evaluate the Scheme expression EXP within an isolated "sandbox".
     Limit its execution to TIME-LIMIT seconds of wall-clock time, and
     limit its allocation to ALLOCATION-LIMIT bytes.

     The evaluation will occur in MODULE, which defaults to the result
     of calling 'make-sandbox-module' on BINDINGS, which itself defaults
     to 'all-pure-bindings'.  This is the core of the sandbox: creating
     a scope for the expression that is "safe".

     A safe sandbox module has two characteristics.  Firstly, it will
     not allow the expression being evaluated to avoid being cancelled
     due to time or allocation limits.  This ensures that the expression
     terminates in a timely fashion.

     Secondly, a safe sandbox module will prevent the evaluation from
     receiving information from previous evaluations, or from affecting
     future evaluations.  All combinations of binding sets exported by
     '(ice-9 sandbox)' form safe sandbox modules.

     The BINDINGS should be given as a list of import sets.  One import
     set is a list whose car names an interface, like '(ice-9 q)', and
     whose cdr is a list of imports.  An import is either a bare symbol
     or a pair of '(OUT . IN)', where OUT and IN are both symbols and
     denote the name under which a binding is exported from the module,
     and the name under which to make the binding available,
     respectively.  Note that BINDINGS is only used as an input to the
     default initializer for the MODULE argument; if you pass
     '#:module', BINDINGS is unused.  If SEVER-MODULE? is true (the
     default), the module will be unlinked from the global module tree
     after the evaluation returns, to allow MODULE to be garbage-collected.

     If successful, return all values produced by EXP.  Any uncaught
     exception thrown by the expression will propagate out.  If the time
     or allocation limit is exceeded, an exception will be thrown to the
     'limit-exceeded' key.
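
   A short sketch of typical use, assuming the default limits and
bindings described above.  The helper 'try-eval' is hypothetical; it
reports the key of any exception so that the three outcomes can be
compared side by side.

```scheme
(use-modules (ice-9 sandbox))

;; With default limits and bindings, pure arithmetic simply works.
(define sum (eval-in-sandbox '(+ 1 2)))

;; Hypothetical helper: evaluate EXP in a sandbox and return either
;; its value or the key of the exception that was thrown.
(define (try-eval exp)
  (catch #t
    (lambda () (eval-in-sandbox exp #:time-limit 0.05))
    (lambda (key . args) key)))

;; An endless loop is cancelled by the time limit.
(define looped (try-eval '(let lp () (lp))))

;; File access is not among the default bindings, so the reference
;; to open-input-file fails inside the sandbox instead of opening a
;; port.
(define blocked (try-eval '(open-input-file "/etc/passwd")))
;; sum is 3; looped is the symbol limit-exceeded; blocked is an
;; exception key rather than a port
```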

   Constructing a safe sandbox module is tricky in general.  Guile
defines an easy way to construct safe modules from predefined sets of
bindings.  Before getting to that interface, here are some general notes
on safety.

  1. The time and allocation limits rely on the ability to interrupt and
     cancel a computation.  For this reason, no binding included in a
     sandbox module should be able to indefinitely postpone interrupt
     handling, nor should a binding be able to prevent an abort.  In
     practice this second consideration means that 'dynamic-wind' should
     not be included in any binding set.
  2. The time and allocation limits apply only to the 'eval-in-sandbox'
     call.  If the call returns a procedure which is later called, no
     limit is "automatically" in place.  Users of 'eval-in-sandbox' have
     to be very careful to reimpose limits when calling procedures that
     escape from sandboxes.
  3. Similarly, the dynamic environment of the 'eval-in-sandbox' call is
     not necessarily in place when any procedure that escapes from the
     sandbox is later called.

     This detail prevents us from exposing 'primitive-eval' to the
     sandbox, for two reasons.  The first is that it's possible for
     legacy code to forge references to any binding, if the
     'allow-legacy-syntax-objects?' parameter is true.  The default for
     this parameter is true; *note Syntax Transformer Helpers:: for the
     details.  The parameter is bound to '#f' for the duration of the
     'eval-in-sandbox' call itself, but that will not be in place during
     calls to escaped procedures.

     The second reason we don't expose 'primitive-eval' is that
     'primitive-eval' implicitly works in the current module, which for
     an escaped procedure will probably be different than the module
     that is current for the 'eval-in-sandbox' call itself.

     The common denominator here is that if an interface exposed to the
     sandbox relies on dynamic environments, it is easy to mistakenly
     grant the sandboxed procedure additional capabilities in the form
     of bindings that it should not have access to.  For this reason,
     the default sets of predefined bindings do not depend on any
     dynamically scoped value.
  4. Mutation may allow a sandboxed evaluation to break some invariant
     in users of data supplied to it.  A lot of code culturally doesn't
     expect mutation, but if you hand mutable data to a sandboxed
     evaluation and you also grant mutating capabilities to that
     evaluation, then the sandboxed code may indeed mutate that data.
     The default set of bindings to the sandbox does not include any
     mutating primitives.

     Relatedly, 'set!' may allow a sandbox to mutate a primitive,
     invalidating many system-wide invariants.  Guile is currently quite
     permissive when it comes to imported bindings and mutability.
     Although 'set!' to a module-local or lexically bound variable would
     be fine, we don't currently have an easy way to disallow 'set!' to
     an imported binding, so currently no binding set includes 'set!'.
  5. Mutation may allow a sandboxed evaluation to keep state, or make a
     communication mechanism with other code.  On the one hand this
     sounds cool, but on the other hand maybe this is part of your
     threat model.  Again, the default set of bindings doesn't include
     mutating primitives, preventing sandboxed evaluations from keeping
     state.
  6. The sandbox should probably not be able to open a network
     connection, or write to a file, or open a file from disk.  The
     default binding set includes no interaction with the operating
     system.
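
   Point 2 above can be sketched as follows.  The wrapper
'call-escaped' is a hypothetical name, and the limits chosen are
arbitrary; the idea is only that every later call of an escaped
procedure goes through 'call-with-time-and-allocation-limits' again.

```scheme
(use-modules (ice-9 sandbox))

;; The sandboxed expression returns a procedure, which escapes the
;; limits that applied to the eval-in-sandbox call itself.
(define escaped
  (eval-in-sandbox
   '(lambda (n)
      (let lp ((i 0))
        (if (< i n) (lp (+ i 1)) i)))))

;; Hypothetical wrapper that reimposes limits on every call.  A
;; limit-exceeded exception propagates to the caller.
(define (call-escaped . args)
  (call-with-time-and-allocation-limits
   0.1 #e10e6
   (lambda () (apply escaped args))))

(define counted (call-escaped 1000))
;; counted is 1000
```

   Note that this reimposes only the resource limits.  Per point 3,
other parts of the dynamic environment of the original
'eval-in-sandbox' call are still absent during the later call.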

   If you, dear reader, find the above discussion interesting, you will
enjoy Jonathan Rees' dissertation, "A Security Kernel Based on the
Lambda Calculus".

 -- Scheme Variable: all-pure-bindings
     All "pure" bindings that together form a safe subset of those
     bindings available by default to Guile user code.

 -- Scheme Variable: all-pure-and-impure-bindings
     Like 'all-pure-bindings', but additionally including mutating
     primitives like 'vector-set!'.  This set is still safe in the sense
     mentioned above, with the caveats about mutation.

   The components of these composite sets are as follows:

 -- Scheme Variable: alist-bindings
 -- Scheme Variable: array-bindings
 -- Scheme Variable: bit-bindings
 -- Scheme Variable: bitvector-bindings
 -- Scheme Variable: char-bindings
 -- Scheme Variable: char-set-bindings
 -- Scheme Variable: clock-bindings
 -- Scheme Variable: core-bindings
 -- Scheme Variable: error-bindings
 -- Scheme Variable: fluid-bindings
 -- Scheme Variable: hash-bindings
 -- Scheme Variable: iteration-bindings
 -- Scheme Variable: keyword-bindings
 -- Scheme Variable: list-bindings
 -- Scheme Variable: macro-bindings
 -- Scheme Variable: nil-bindings
 -- Scheme Variable: number-bindings
 -- Scheme Variable: pair-bindings
 -- Scheme Variable: predicate-bindings
 -- Scheme Variable: procedure-bindings
 -- Scheme Variable: promise-bindings
 -- Scheme Variable: prompt-bindings
 -- Scheme Variable: regexp-bindings
 -- Scheme Variable: sort-bindings
 -- Scheme Variable: srfi-4-bindings
 -- Scheme Variable: string-bindings
 -- Scheme Variable: symbol-bindings
 -- Scheme Variable: unspecified-bindings
 -- Scheme Variable: variable-bindings
 -- Scheme Variable: vector-bindings
 -- Scheme Variable: version-bindings
     The components of 'all-pure-bindings'.

 -- Scheme Variable: mutating-alist-bindings
 -- Scheme Variable: mutating-array-bindings
 -- Scheme Variable: mutating-bitvector-bindings
 -- Scheme Variable: mutating-fluid-bindings
 -- Scheme Variable: mutating-hash-bindings
 -- Scheme Variable: mutating-list-bindings
 -- Scheme Variable: mutating-pair-bindings
 -- Scheme Variable: mutating-sort-bindings
 -- Scheme Variable: mutating-srfi-4-bindings
 -- Scheme Variable: mutating-string-bindings
 -- Scheme Variable: mutating-variable-bindings
 -- Scheme Variable: mutating-vector-bindings
     The additional components of 'all-pure-and-impure-bindings'.

   Finally, what do you do with a binding set?  What is a binding set
anyway?  'make-sandbox-module' is here for you.

 -- Scheme Procedure: make-sandbox-module bindings
     Return a fresh module that only contains BINDINGS.

     The BINDINGS should be given as a list of import sets.  One import
     set is a list whose car names an interface, like '(ice-9 q)', and
     whose cdr is a list of imports.  An import is either a bare symbol
     or a pair of '(OUT . IN)', where OUT and IN are both symbols and
     denote the name under which a binding is exported from the module,
     and the name under which to make the binding available,
     respectively.
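
   To illustrate the import-set format, here is a hand-written
binding set with a renamed import.  The names 'tiny-bindings' and
'tiny-module' are hypothetical.  Unlike the sets exported by
'(ice-9 sandbox)', a hand-written set carries no safety guarantee
beyond what its author has checked.

```scheme
(use-modules (ice-9 sandbox))

;; One import set drawing on the (guile) interface.  '+' and '-' keep
;; their names; 'cons' is made available under the name 'pair'.
(define tiny-bindings
  '(((guile) + - (cons . pair))))

(define tiny-module (make-sandbox-module tiny-bindings))

;; Passing #:module means #:bindings is not consulted.
(define result
  (eval-in-sandbox '(pair (+ 1 2) (- 5 1))
                   #:module tiny-module))
;; result is the pair (3 . 4)
```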

   So you see that binding sets are just lists, and
'all-pure-and-impure-bindings' is really just the result of appending
all of the component binding sets.
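
   Because binding sets are plain lists, a narrower or wider sandbox
can be assembled with 'append' or selected by name.  The sketch below
uses the hypothetical names 'arithmetic-bindings', 'try-eval' and
'mutation' to show both a reduced pure set and the effect of
choosing 'all-pure-and-impure-bindings'.

```scheme
(use-modules (ice-9 sandbox))

;; A reduced pure set: core syntax plus numeric procedures only.
(define arithmetic-bindings
  (append core-bindings number-bindings))

(define product
  (eval-in-sandbox '(let ((x 6)) (* x 7))
                   #:bindings arithmetic-bindings))

;; Hypothetical helper: return the value of EXP evaluated with
;; BINDINGS, or the key of the exception that was thrown.
(define (try-eval exp bindings)
  (catch #t
    (lambda () (eval-in-sandbox exp #:bindings bindings))
    (lambda (key . args) key)))

;; An expression that mutates a freshly allocated vector.
(define mutation
  '(let ((v (vector 1 2 3)))
     (vector-set! v 0 42)
     (vector-ref v 0)))

;; vector-set! is absent from the pure set, so this fails.
(define pure-result (try-eval mutation all-pure-bindings))

;; With the mutating primitives included, the same expression runs.
(define impure-result (try-eval mutation all-pure-and-impure-bindings))
;; product is 42; pure-result is an exception key; impure-result is 42
```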


