Re: Cygwin port of Guile 2.2


From: Derek Upham
Subject: Re: Cygwin port of Guile 2.2
Date: Wed, 03 May 2017 07:21:32 -0700
User-agent: mu4e 0.9.17; emacs 25.1.1

Andy Wingo <address@hidden> writes:
> I think there's an argument that a thread doesn't "terminate" until its
> thread-local key destructors have finished running, and therefore
> pthread_join doesn't return until after the key destructors have run.
> This is my understanding of what happens from reading NPTL.  Do I
> understand correctly that you are on Cygwin?  Could it be a cygwin
> pthreads incompatibility?

GNU/Linux, Debian unstable, 4.9.x kernel.  I can’t get too fancy with the 
stderr tracing in a multi-threaded environment, but I think I can demonstrate 
the delay.

Here’s a successful run in GDB:

  Catchpoint 1 (load)
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
  Breakpoint 2 at 0x7ffff7b7937c: file posix.c, line 1239.
  [New Thread 0x7ffff6197700 (LWP 2225)]
  [New Thread 0x7ffff5996700 (LWP 2226)]
  [New Thread 0x7ffff5195700 (LWP 2227)]
  [New Thread 0x7ffff44ff700 (LWP 2228)]
  [New Thread 0x7ffff39fe700 (LWP 2229)]
  checking device "/dev/disk/by-id/ata-WDC_WD1001FALS-00E8B0_WD-WMATV4103981-part1"
  scm_join_thread start
  on_thread_exit start
  scm_join_thread end
  scm_fork #threads (start) = 2
  on_thread_exit end
  on_thread_exit start
  on_thread_exit end
  scm_fork #threads (end) = 1
  [Thread 0x7ffff39fe700 (LWP 2229) exited]
  [Thread 0x7ffff44ff700 (LWP 2228) exited]
  got timestamp string: "1493715602" "/dev/disk/by-id/ata-WDC_WD1001FALS-00E8B0_WD-WMATV4103981-part1"
  using device: "/dev/disk/by-id/ata-WDC_WD1001FALS-00E8B0_WD-WMATV4103981-part1"
  [Thread 0x7ffff5195700 (LWP 2227) exited]
  [Thread 0x7ffff6197700 (LWP 2225) exited]
  [Thread 0x7ffff7fcb740 (LWP 2218) exited]
  [Inferior 1 (process 2218) exited normally]

The scm_join_thread function starts, then on_thread_exit starts, then 
scm_join_thread ends (the underlying join operation has completed).  Notice 
that we are already inside scm_fork before that cleanup finishes.  All of the 
traces up to that point, including the first on_thread_exit pair, belong to 
the signal delivery thread.  Between the scm_fork start and end traces, we run 
the pre-fork finalizer cleanup, which causes the second on_thread_exit pair.

Unsuccessful run:

  Catchpoint 1 (load)
  [Thread debugging using libthread_db enabled]
  Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
  Breakpoint 2 at 0x7ffff7b7937c: file posix.c, line 1239.
  [New Thread 0x7ffff6197700 (LWP 2239)]
  [New Thread 0x7ffff5996700 (LWP 2240)]
  [New Thread 0x7ffff5195700 (LWP 2241)]
  [New Thread 0x7ffff44ff700 (LWP 2242)]
  [New Thread 0x7ffff39fe700 (LWP 2243)]
  checking device "/dev/disk/by-id/ata-WDC_WD1001FALS-00E8B0_WD-WMATV4103981-part1"
  scm_join_thread start
  scm_join_thread end
  scm_fork #threads (start) = 3
  on_thread_exit start
  on_thread_exit end
  scm_fork #threads (end) = 2
  [Thread 0x7ffff44ff700 (LWP 2242) exited]
  on_thread_exit start
  on_thread_exit end
  [Thread 0x7ffff39fe700 (LWP 2243) exited]

  Thread 1 "guile" hit Breakpoint 2, scm_fork () at posix.c:1239
  1239          scm_display

We join on the signal delivery thread and then set scm_i_signal_delivery_thread 
to NULL.  But we reach the start of scm_fork without any call to 
on_thread_exit.  The finalizer thread exits and we clean it up.  At that point 
scm_all_threads reports three threads, because (a) nothing has cleaned up the 
active thread list, and (b) none of the threads in the list matches 
scm_i_signal_delivery_thread (which is NULL).  We reach the scm_fork warning 
block, where I have a breakpoint.  The signal delivery thread cleanup finally 
happens between the breakpoint detection and the prompt.

Another successful run:

  scm_join_thread start
  on_thread_exit start
  scm_join_thread end
  on_thread_exit end
  scm_fork #threads (start) = 2
  on_thread_exit start
  on_thread_exit end
  scm_fork #threads (end) = 1
  [Thread 0x7ffff39fe700 (LWP 2212) exited]
  [Thread 0x7ffff44ff700 (LWP 2211) exited]

We join before on_thread_exit completes, but on_thread_exit completes before we 
start forking.

Another successful run:

  scm_join_thread start
  on_thread_exit start
  on_thread_exit end
  scm_join_thread end
  scm_fork #threads (start) = 2
  [Thread 0x7ffff39fe700 (LWP 2198) exited]
  on_thread_exit start
  on_thread_exit end
  scm_fork #threads (end) = 1
  [Thread 0x7ffff44ff700 (LWP 2197) exited]

on_thread_exit completes before we join.  (This is the happy case.)

Two things I just noticed:

1. Even in the successful cases there are delays between on_thread_exit 
completing and GDB reporting that the thread has exited.  Look at the second 
successful run (third in the overall list).  That’s a huge time gap.  It’s 
circumstantial evidence, but it’s suggestive.

2. In the first successful case (first in the overall list), the finalizer 
cleanup doesn’t start until the signal delivery thread cleanup completes.  
There’s nothing at the code level enforcing that.  I wonder whether there is a 
single, hidden, background thread that coordinates the cleanup.  That’s pure 
speculation.
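
For anyone who wants to check the pthread_join versus key-destructor ordering 
outside Guile, here is a minimal standalone sketch.  It is not Guile code: 
demo_key, demo_destructor, demo_thread and destructor_done_at_join are names 
made up for this example, and it only exercises a bare pthread_join, which may 
not be what scm_join_thread does internally.  The function reports whether the 
destructor had already finished at the moment pthread_join returned.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names for this sketch; none of this is taken from Guile. */
static pthread_key_t demo_key;
static atomic_int destructor_done;

/* Key destructor: runs in the exiting thread, only for non-NULL values. */
static void demo_destructor(void *value)
{
    assert(value != NULL);
    atomic_store(&destructor_done, 1);
}

static void *demo_thread(void *arg)
{
    (void) arg;
    /* Store a non-NULL value so the destructor is invoked at thread exit. */
    pthread_setspecific(demo_key, &destructor_done);
    return NULL;
}

/* Returns 1 if the destructor had completed when pthread_join returned,
   0 if it had not, and -1 if thread or key setup failed. */
int destructor_done_at_join(void)
{
    pthread_t tid;
    int seen;

    atomic_store(&destructor_done, 0);
    if (pthread_key_create(&demo_key, demo_destructor) != 0)
        return -1;
    if (pthread_create(&tid, NULL, demo_thread, NULL) != 0) {
        pthread_key_delete(demo_key);
        return -1;
    }
    if (pthread_join(tid, NULL) != 0) {
        pthread_key_delete(demo_key);
        return -1;
    }
    /* Sample the flag immediately after the join returns. */
    seen = atomic_load(&destructor_done);
    pthread_key_delete(demo_key);
    return seen;
}
```

Call destructor_done_at_join() from a small main() and build with -pthread.  
If the reading of NPTL quoted above is right, this should return 1 on 
GNU/Linux; a 0 would point at the pthreads layer rather than at Guile.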

Derek

-- 
Derek Upham
address@hidden


