guile-devel

Re: Race condition in threading code?


From: Julian Graham
Subject: Re: Race condition in threading code?
Date: Sat, 30 Aug 2008 19:05:17 -0400

Okay, I think I know what the problem is: Part of the SRFI-18 thread
start / creation process involves contention for a mutex, and there's
a bug in the fat_mutex_lock code that sometimes causes the locking
thread to miss an unlocking thread's notification that a mutex is
available.  So it's actually a mutex bug -- specifically, in the loop
code in fat_mutex_lock that ends with the following snippet:

      ...
          scm_i_pthread_mutex_unlock (&m->lock);
          SCM_TICK;
          scm_i_scm_pthread_mutex_lock (&m->lock);
        }
      block_self (m->waiting, mutex, &m->lock, timeout);

...which means the locking thread can miss a wakeup: if it enters the
loop while the mutex is still locked, and the owner unlocks the mutex
after the locking thread has released the administrative lock to run
the tick, then the locking thread goes on to block without re-checking
the state of the mutex and sleeps forever.  I've made a small change
(blocking before doing the tick instead of after) that seems to
resolve the issue (so far no lock-ups using Han-Wen's x.test for a
couple of hours).  There's a patch attached.
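
To illustrate the ordering (this is not the attached patch, and not
Guile's code), here is a rough sketch of a toy mutex built on plain
pthreads.  All the toy_* names are made up for the example: `admin'
stands in for m->lock, `locked' for the owner field, and toy_tick for
SCM_TICK.  The waiter blocks first and ticks afterwards, so the state
of the mutex is always re-checked under the administrative lock
before it goes back to sleep:

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical toy mutex, NOT Guile's fat_mutex. */
typedef struct {
  pthread_mutex_t admin;      /* administrative lock (like m->lock) */
  pthread_cond_t  available;  /* signalled by toy_unlock */
  bool            locked;     /* true while some thread owns the mutex */
} toy_mutex;

static void toy_init (toy_mutex *m)
{
  pthread_mutex_init (&m->admin, NULL);
  pthread_cond_init (&m->available, NULL);
  m->locked = false;
}

/* Stand-in for SCM_TICK: work done without holding the admin lock. */
static void toy_tick (void)
{
  sched_yield ();
}

static void toy_lock (toy_mutex *m)
{
  pthread_mutex_lock (&m->admin);
  while (m->locked)
    {
      /* Block first.  pthread_cond_wait releases `admin' atomically
         while going to sleep, so an unlock cannot slip in between the
         check above and the sleep.  */
      pthread_cond_wait (&m->available, &m->admin);

      /* Tick only after waking up, with the admin lock released.  */
      pthread_mutex_unlock (&m->admin);
      toy_tick ();
      pthread_mutex_lock (&m->admin);
      /* The loop condition now re-checks m->locked under the lock.  */
    }
  m->locked = true;
  pthread_mutex_unlock (&m->admin);
}

static void toy_unlock (toy_mutex *m)
{
  pthread_mutex_lock (&m->admin);
  assert (m->locked);           /* sanity check: must be held */
  m->locked = false;
  pthread_cond_signal (&m->available);
  pthread_mutex_unlock (&m->admin);
}

typedef struct { toy_mutex *m; long *counter; int iters; } toy_job;

static void *toy_worker (void *arg)
{
  toy_job *job = arg;
  for (int i = 0; i < job->iters; i++)
    {
      toy_lock (job->m);
      ++*job->counter;          /* protected by the toy mutex */
      toy_unlock (job->m);
    }
  return NULL;
}

/* Run up to 16 threads contending for one toy mutex and return the
   final value of the shared counter.  */
long toy_demo (int nthreads, int iters)
{
  toy_mutex m;
  long counter = 0;
  toy_job job = { &m, &counter, iters };
  pthread_t threads[16];

  if (nthreads > 16)
    nthreads = 16;
  toy_init (&m);
  for (int i = 0; i < nthreads; i++)
    pthread_create (&threads[i], NULL, toy_worker, &job);
  for (int i = 0; i < nthreads; i++)
    pthread_join (threads[i], NULL);
  pthread_cond_destroy (&m.available);
  pthread_mutex_destroy (&m.admin);
  return counter;
}
```

If the tick came between the check and the wait instead, an unlock
that happened during the tick would signal before anybody was waiting,
and the waiter would then sleep on a mutex that is already free.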

(Sorry, should have noticed this earlier; the problem existed before
the changes I introduced to support SRFI-18...)


Regards,
Julian


On Wed, Aug 27, 2008 at 9:14 AM, Julian Graham <address@hidden> wrote:
>> I've seen `srfi-18.test' hang from time to time, but not often enough to
>> give me an incentive to nail it down.  :-(  I don't think it relates to
>> Han-Wen's GC changes.
>
>
> Crap, I'm seeing some lockups now, too.  Sorry, guys.  I'm debugging,
> but don't let that stop you from investigating as well.  ;)

Attachment: 0001-Resolve-a-deadlock-caused-by-not-checking-mutex-stat.patch
Description: Text Data

