SCSH process forms and the signal delivery thread


From: Derek Upham
Subject: SCSH process forms and the signal delivery thread
Date: Sun, 26 Mar 2017 09:47:00 -0700

I'm working on an implementation of SCSH-style "process forms" for Guile, and 
I'm noticing occasional hangs.  I think I understand the root cause, and I'd 
like people to double-check my analysis.

My code forks its process using the "primitive-fork" function.  The function's 
return value indicates whether the current process is the parent or the child 
process.  The parent and child have user-level data that start out identical 
but can vary independently thereafter: stacks and heaps.  The parent and child 
have kernel-level data that are shared: file descriptors, and (crucially) 
mutexes.  All we can do to stop sharing the kernel-level data is to drop our 
handles to the data.
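
For readers less familiar with fork semantics, here is a minimal C sketch of 
the user-level half of that statement.  It uses plain POSIX fork() rather than 
Guile's "primitive-fork", and the function name fork_copies_memory is 
hypothetical.  The child changes its copy of a variable and reports the new 
value through its exit status; the parent's copy stays as it was.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks, lets the child modify its private copy of *value, and returns
   the child's exit status (or -1 on failure).  The parent's *value is
   never written by this function. */
int fork_copies_memory(int *value)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: fork() returned 0.  This write lands in the child's
           copy of the memory and is invisible to the parent. */
        *value += 1;
        _exit(*value);
    }
    /* Parent: fork() returned the child's pid. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with a variable holding 41 should return 42 from the child while 
the caller's variable still reads 41.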

The BDW-GC implementation is configured to be thread safe, in case Guile runs 
multiple threads.  Therefore per <http://www.hboehm.info/gc/scale.html>:

  "It causes the collector to acquire a lock around essentially all allocation 
and garbage collection activity."

That means after the child process spawns, there is one kernel mutex 
controlling access to two heaps in two separate processes.  If the child 
process needs to do work in the GC layer, it blocks: the signal delivery thread 
in the parent is holding the mutex, and will hold the mutex until it gets some 
data on its reporting pipe.  The hang only shows up when the race between the 
fork and the signal delivery thread resolves in the wrong order, which is why 
it is occasional.
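
The symptom can be reproduced outside Guile.  What follows is a minimal 
sketch, assuming Linux with glibc and a plain process-private pthread mutex 
standing in for the collector's lock; demo_lock, holder and 
child_sees_lock_held are hypothetical names, not Guile or BDW-GC code.  A 
second thread takes the lock and then blocks reading a pipe, the main thread 
forks, and the child probes the lock with trylock so that the demo reports 
the state instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int ready_pipe[2];    /* holder -> main: "I have the lock" */
static int release_pipe[2];  /* main -> holder: "you may unlock"  */

/* Stand-in for a helper thread that holds a lock while it waits for
   data on a pipe. */
static void *holder(void *arg)
{
    char c = 'x';
    ssize_t n;

    (void) arg;
    pthread_mutex_lock(&demo_lock);
    n = write(ready_pipe[1], &c, 1);
    assert(n == 1);
    n = read(release_pipe[0], &c, 1);   /* blocks with the lock held */
    assert(n == 1);
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns 1 if the forked child found the lock busy, 0 if it found the
   lock free, and -1 if the demo itself failed. */
int child_sees_lock_held(void)
{
    pthread_t tid;
    pid_t pid;
    char c = 'x';
    int status = 0;

    if (pipe(ready_pipe) != 0 || pipe(release_pipe) != 0)
        return -1;
    if (pthread_create(&tid, NULL, holder, NULL) != 0)
        return -1;
    if (read(ready_pipe[0], &c, 1) != 1)   /* wait until the lock is held */
        return -1;

    pid = fork();
    if (pid == 0) {
        /* Only the forking thread runs in the child, so nothing here
           will ever release the lock.  A blocking lock call would hang. */
        _exit(pthread_mutex_trylock(&demo_lock) == EBUSY ? 1 : 0);
    }
    if (pid > 0)
        waitpid(pid, &status, 0);

    if (write(release_pipe[1], &c, 1) == 1)   /* let the holder finish */
        pthread_join(tid, NULL);
    return (pid > 0 && WIFEXITED(status)) ? WEXITSTATUS(status) : -1;
}
```

The pipe handshake removes the race on purpose: the fork always happens while 
the lock is held, which is the ordering that hangs in the real program.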

Based on this comment from scm_fork() I should be seeing a warning when I fork 
with a running thread:

  scm_i_finalizer_pre_fork ();
  if (scm_ilength (scm_all_threads ()) != 1)
    /* Other threads may be holding on to resources that Guile needs --
       it is not safe to permit one thread to fork while others are
       running.

       In addition, POSIX clearly specifies that if a multi-threaded
       program forks, the child must only call functions that are
       async-signal-safe.  We can't guarantee that in general.  The best
       we can do is to allow forking only very early, before any call to
       sigaction spawns the signal-handling thread.  */
    scm_display
      (scm_from_latin1_string
       ("warning: call to primitive-fork while multiple threads are running;\n"
        "         further behavior unspecified.  See \"Processes\" in the\n"
        "         manual, for more information.\n"),
       scm_current_warning_port ());

(This is all Guile 2.2 code.)  The call to scm_i_finalizer_pre_fork() killed 
off the finalization thread, so we're safe there:

  void
  scm_i_finalizer_pre_fork (void)
  {
  #if SCM_USE_PTHREAD_THREADS
    if (automatic_finalization_p)
      {
        stop_finalization_thread ();
        GC_set_finalizer_notifier (spawn_finalizer_thread);
      }
  #endif
  }

But nothing stops the signal delivery thread.  In fact, scm_all_threads() 
explicitly skips the signal delivery thread; we don't get a warning:

  {
    /* We can not allocate while holding the thread_admin_mutex because
       of the way GC is done.
    */
    int n = thread_count;
    scm_i_thread *t;
    SCM list = scm_c_make_list (n, SCM_UNSPECIFIED), *l;

    scm_i_pthread_mutex_lock (&thread_admin_mutex);
    l = &list;
    for (t = all_threads; t && n > 0; t = t->next_thread)
      {
        if (t != scm_i_signal_delivery_thread)
          {
            SCM_SETCAR (*l, t->handle);
            l = SCM_CDRLOC (*l);
          }
        n--;
      }
    *l = SCM_EOL;
    scm_i_pthread_mutex_unlock (&thread_admin_mutex);
    return list;
  }

The signal delivery thread is running in order to support SCSH's "early" 
auto-reap policy, triggered by SIGCHLD.  The alternative is the "late" policy, 
which triggers after garbage collections.  That's not good for parents that do 
lots of spawning but very little garbage generation compared to their heap 
size.  They end up with lots of zombies.
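
To make the "early" policy concrete, here is a minimal sketch in plain POSIX C 
rather than Guile.  It is not how SCSH or Guile implement reaping; the names 
on_sigchld and spawn_and_reap_early are hypothetical.  The SIGCHLD handler 
reaps every child that has already exited, so zombies disappear as soon as 
the signal is delivered instead of waiting for a collection.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped;

/* SIGCHLD handler: reap every child that has already exited.  One
   signal may stand for several exits, hence the loop. */
static void on_sigchld(int signo)
{
    int saved_errno = errno;

    (void) signo;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    errno = saved_errno;
}

/* Spawns up to n children that exit at once and returns how many were
   reaped by the handler, or -1 if the handler could not be installed. */
int spawn_and_reap_early(int n)
{
    struct sigaction sa, old;
    sigset_t chld, prev, wait_mask;
    int i, started = 0;

    sa.sa_handler = on_sigchld;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, &old) != 0)
        return -1;

    /* Block SIGCHLD so the check-then-wait loop below cannot miss it. */
    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);
    sigprocmask(SIG_BLOCK, &chld, &prev);
    wait_mask = prev;
    sigdelset(&wait_mask, SIGCHLD);

    reaped = 0;
    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);
        if (pid > 0)
            started++;
    }
    /* sigsuspend() atomically unblocks SIGCHLD and sleeps until a
       handler has run. */
    while (reaped < started)
        sigsuspend(&wait_mask);

    sigaction(SIGCHLD, &old, NULL);
    sigprocmask(SIG_SETMASK, &prev, NULL);
    assert(reaped == started);
    return (int) reaped;
}
```

A "late" policy would do the same waitpid() loop from a post-GC hook instead 
of a signal handler, which is why a parent that rarely collects accumulates 
zombies.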

One solution to support the "early" policy might be to tweak scm_fork() so it:

1. Blocks signals.
2. Records the current custom handlers.
3. Resets all handlers.
4. Kills the signal delivery thread.
5. Forks.
6. Starts the signal delivery thread in parent and child.
7. Re-loads the custom handlers in parent and child.
8. Unblocks signals.
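
The steps above can be sketched in plain POSIX C.  This is only an 
illustration under stated assumptions, not a patch to scm_fork(): the helper 
thread, its pipe protocol, the list of managed signals, and the names 
start_helper, stop_helper and fork_without_helper are all hypothetical 
stand-ins for Guile's signal delivery thread and handler table.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the signal delivery thread: it sleeps in
   read() on a pipe until it is told to quit. */
static pthread_t helper;
static int helper_pipe[2];

static void *helper_main(void *arg)
{
    char c;

    (void) arg;
    while (read(helper_pipe[0], &c, 1) == 1 && c != 'q')
        ;   /* a real thread would dispatch the signal named by c */
    return NULL;
}

static int start_helper(void)
{
    if (pipe(helper_pipe) != 0)
        return -1;
    return pthread_create(&helper, NULL, helper_main, NULL);
}

static void stop_helper(void)
{
    char q = 'q';

    if (write(helper_pipe[1], &q, 1) == 1)
        pthread_join(helper, NULL);
    close(helper_pipe[0]);
    close(helper_pipe[1]);
}

/* Signals whose handlers this sketch manages (an arbitrary choice). */
static const int managed[] = { SIGINT, SIGTERM, SIGCHLD };
enum { N_MANAGED = sizeof managed / sizeof managed[0] };

/* fork() with no helper thread alive at the moment of the fork.
   Assumes start_helper() has already succeeded in the caller. */
pid_t fork_without_helper(void)
{
    sigset_t all, prev;
    struct sigaction saved[N_MANAGED], dfl;
    pid_t pid;
    int i, rc;

    sigfillset(&all);
    pthread_sigmask(SIG_BLOCK, &all, &prev);      /* 1. block signals     */

    dfl.sa_handler = SIG_DFL;
    dfl.sa_flags = 0;
    sigemptyset(&dfl.sa_mask);
    for (i = 0; i < N_MANAGED; i++)               /* 2. record handlers,  */
        sigaction(managed[i], &dfl, &saved[i]);   /* 3. reset them        */

    stop_helper();                                /* 4. kill the thread   */
    pid = fork();                                 /* 5. fork              */

    rc = start_helper();                          /* 6. restart the thread
                                                        in parent and child */
    assert(rc == 0);
    for (i = 0; i < N_MANAGED; i++)               /* 7. reload handlers   */
        sigaction(managed[i], &saved[i], NULL);
    pthread_sigmask(SIG_SETMASK, &prev, NULL);    /* 8. unblock signals   */
    return pid;
}
```

The child gets a fresh pipe in step 6 rather than reusing the parent's, so 
the two helper threads do not read each other's data.  The restarted helper 
inherits the fully blocked signal mask from step 1, which suits a thread that 
only reads from a pipe.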

Does anyone have other possibilities?

I don't think there's a safe, general solution for running "identical" 
finalizers in the parent and the child, so shutting down the finalizer in the 
child is the best we can do.  Is it worth restarting just the parent's 
finalizer thread after forking?

Other, independent, cleanup opportunities:

- The docs for "primitive-fork" need to mention that calling "primitive-fork" 
shuts down finalizers for the parent and the child.

- Calling "restore-signals" should stop any running signal delivery thread, to 
bring Guile back to a consistent state.

Thanks,

Derek

-- 
Derek Upham