threading issues in 1.8?

guile-devel

From:	Ken Raeburn
Subject:	threading issues in 1.8?
Date:	Wed, 1 Mar 2006 00:17:57 -0500

Hi. I've been starting to look at the 1.8 code, and I'm concerned that there may still be thread safety issues.

1) See my old mail about scm_leave_guile -- in fact, it's now in a comment in threads.c. :-) We're still calling setjmp, returning, and then calling the user's function. I think setjmp and the user's function need to be called from the same function, or at least the function calling setjmp should not return before the user's function is called. (And possibly any args and local variables should be "volatile".)
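A minimal sketch of the arrangement suggested in point 1, where setjmp and the user's function are called from the same frame. All names here (thread_state, call_outside_interpreter, add_one) are hypothetical illustrations, not Guile's actual code or API, and the sketch assumes the jmp_buf is used only as a register snapshot for a conservative collector to scan.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical callback type; not Guile's actual API. */
typedef void *(*user_fn)(void *data);

/* Hypothetical per-thread record that a collector could scan. */
struct thread_state {
    jmp_buf regs;      /* register snapshot taken by setjmp */
    void *stack_top;   /* an address inside the frame that called setjmp */
};

/* Save registers and call the user's function from the SAME frame,
   so the frame that called setjmp is still live while fn runs. */
static void *call_outside_interpreter(struct thread_state *t,
                                      user_fn fn, void *data)
{
    void *volatile result = NULL;   /* volatile: kept in memory, not a register */

    assert(t != NULL && fn != NULL);
    t->stack_top = (void *)&result; /* address within this frame */
    (void)setjmp(t->regs);          /* spill register contents into t->regs */
    result = fn(data);              /* no return between setjmp and fn */
    return result;
}

/* Example user function (hypothetical): increments an int in place. */
static void *add_one(void *data)
{
    int *p = data;
    *p += 1;
    return p;
}
```

Calling call_outside_interpreter(&t, add_one, &x) runs add_one while the frame recorded in t is still active, which is the property the paragraph above asks for.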

2) Access to newly created objects: I've been trying to figure out... if one thread allocates a new object (cons cell, say, or vector, or smob), and does a set-car! or something to make it accessible to another thread running on another processor, and the system implements a weak memory order model (so memory writes done without synchronization primitives can be seen as happening out of order), what's to cause the contents of the new pair to get written before the second thread can examine them?

Sure, we've more or less adopted the rule that without synchronization, one processor may see the "before" or "after" results of another thread's modifications, or two threads modifying an object may cause it to end up with some unpredictable but valid value. But does that work in the case of a free cell being turned into a cons cell or smob, or random heap storage being turned into a vector?
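One way to picture what would be needed for point 2 is a release/acquire publication. The sketch below uses C11 atomics, which postdate this message and are not what Guile 1.8 does; struct pair, shared_slot, publish_pair and consume_pair are hypothetical names chosen for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a freshly allocated cons cell. */
struct pair { long car; long cdr; };

/* Hypothetical location through which another thread finds the cell. */
static _Atomic(struct pair *) shared_slot = NULL;

/* Writer: initialize the cell, then publish the pointer with release
   ordering so the field writes cannot become visible after the pointer. */
static void publish_pair(struct pair *cell, long car, long cdr)
{
    assert(cell != NULL);
    cell->car = car;
    cell->cdr = cdr;
    atomic_store_explicit(&shared_slot, cell, memory_order_release);
}

/* Reader: load the pointer with acquire ordering; if it is non-NULL,
   the fields written before the release store are visible. */
static struct pair *consume_pair(void)
{
    return atomic_load_explicit(&shared_slot, memory_order_acquire);
}
```

Without the release/acquire pair (or equivalent barriers on both sides), a reader on a weakly ordered machine could see the pointer before it sees the initialized fields, which is the situation the question above describes.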

I've been doing a little reading. Not enough to have any answers yet, but enough to see that Java ran into similar issues. Bill Pugh's paper, "Implementing OO Languages under a Weak Memory Order" (http://www.cs.umd.edu/~pugh/java/memoryModel/weak.pdf) discusses much the same issue as was bothering me in Guile -- object initialization in multiprocessor systems with weak memory ordering, particularly on the Alpha. (From other stuff I've read, Alpha seems to provide the weakest coherency model of modern processors, so if we want a robust solution, it pretty much needs to target the Alpha.) His paper is largely about Java, but it seems to me that Guile needs a more precise memory model as well, if it's to be multiprocessor-multithreaded and robust. Unfortunately, I don't follow the Java world closely, but it looks like the revision of the Java memory model and thread spec is worth reading up on. I'm still looking....

Ken

P.S. For those not familiar with the Alpha, a tiny bit of background. Digital made some architectural design decisions intended to allow for aggressive performance work. One of them was the weak memory coherency model. The processor has "memory barrier" instructions which can force delayed writes to be performed, or cause changes to main memory by other processors to be seen; naturally, these are somewhat expensive, so you don't want to do them all the time. But without the memory barrier, this sort of situation is *explicitly* permitted in the architecture reference manual:


  initial state: X=1, Y=1
  processor I instruction stream: write X=2, write Y=2
  processor J instruction stream: read Y => 2, read X => 1

Even if one processor (not both) uses a memory barrier instruction in between the two operations, the reordered results are allowed. Without a barrier, processor I can perform the second write first, and without a barrier, processor J can perform the second read first.

Another interesting change in early Alpha processors was the absence of byte memory I/O operations. (Later processors, since about the EV56, have added them in as an extension.) If you wanted to modify a byte, you'd read the containing 32- or 64-bit word, modify it, and write it back; likewise for 16-bit shorts.
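In C terms, the byte update described above amounts to something like the following sketch. The helper names word_with_byte and store_byte are hypothetical, and byte 0 is taken to be the least significant byte, as on a little-endian machine.

```c
#include <assert.h>
#include <stdint.h>

/* Return `word` with byte number `index` (0 = least significant)
   replaced by `value`. */
static uint32_t word_with_byte(uint32_t word, unsigned index, uint8_t value)
{
    assert(index < 4);
    unsigned shift = index * 8;
    uint32_t mask = (uint32_t)0xFF << shift;
    return (word & ~mask) | ((uint32_t)value << shift);
}

/* Emulate a byte store as read-modify-write of the containing word.
   This is NOT atomic: the whole word is read and written back. */
static void store_byte(uint32_t *word, unsigned index, uint8_t value)
{
    uint32_t old = *word;                       /* read the containing word */
    *word = word_with_byte(old, index, value);  /* write the whole word back */
}
```

The consequence for threads is that two processors storing to different bytes of the same word can overwrite each other's update, since each writes back all four bytes.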

There are instructions to provide coordinated changes to specific locations. The "load locked" instruction loads a word from storage and "locks" the address; the "store conditional" instruction stores a value if the lock is still set, and sets a flag indicating whether it worked. The lock flag can be reset by various conditions, including other reads or writes from the same processor (I think?), interrupts (including system clock), branches, and writes by other processors to the same location (or within a 2**N block containing it, where the block size is architecture-dependent, but must be at least 16 bytes and no more than one page). If the other processor is using these instructions too, both processors can try to modify a location, but only one (at most) will succeed.
