[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: NSObject.m threading bug(s).
From: |
David Chisnall |
Subject: |
Re: NSObject.m threading bug(s). |
Date: |
Sat, 11 Sep 2010 00:39:46 +0100 |
On 11 Sep 2010, at 00:21, Chris Ball wrote:
> Whilst I understand what you are saying, my practical experience with our code
> suggests there are subtleties perhaps within GNUstep that make this assumption
> unsafe.
If there are, then the bugs are in other code. Running the clang static
analyser found a few potential user-after-release bugs in GNUstep, so it's
possible.
> Thread 1:
> result = GSAtomicDecrement((gsatomic_t)&(((obj)anObject)[-1].retained));
> where result = -1
The object's last reference has just gone away. There should now be no
pointers to the object. If there are, then it's an application bug.
> now switch to Thread 2:
> result =
> GSAtomicDecrement((gsatomic_t)&(((obj)anObject)[-1].retained));
> if (result < 0)
> {
> if (result != -1)
> {
> [NSException raise: NSInternalInconsistencyException
> format: @"NSDecrementExtraRefCount() decremented too far"];
> }
>
> Death due to exception.
Correct. You have done something like this:
thread 1:
a = [objc new];
...
[a release];
thread 2:
[a release];
If the retain count of an object ever drops below 0, this means that it has
been sent one too many release messages. As I said, the purpose of the
exception is to try to make debugging this easier.
In a single-threaded application, or the same application used a single CPU or
where the threads didn't overlap in this way, then the sequence would be:
thread 1 sends -release.
thread 1 sends handles -dealloc
thread 1 object is destroyed.
thread 2 -release message is sent to invalid address. Program crashes.
This kind of bug is masked by using the GNUstep DESTROY() macro on a global (or
an ivar on an object that is shared between two threads). In this case, you
have something roughly equivalent to this:
[a release];
a = nil;
If two threads do this with a very small gap between them, nothing will go
wrong. The second thread will send -release to nil, where it will be ignored.
If, however, the preemption happens between these two lines, the second thread
will send -release to an object with a retain count of 0.
It is possible to write a lockless, thread-safe, version of DESTROY(), but you
can not use the GNUstep one (or something equivalent) in this way.
> Other variations of course exist, ie thread one can get through the first two
> if's and switch to the other thread just after resetting the count to zero,
> now
> thread one will return YES as will thread 2.
Again, only possible if there is a bug outside this code. This code is not
intended to catch all possible bugs in user code, just the small subset that it
is possible to catch.
> Further exercises are left to the reader.
>
> If your contention is that only one thread is managing the retain count and
> releasing an object, then you don't need an atomic decrement at all but that
> clearly isn't true given that any thread can do retain/release an otherwise
> autoreleased object.
That is not my contention at all.
> I should point out that this isn't academic, we have a home rolled database
> that has used GNUstep since around 2002 and processes between 500 million and
> a
> billion messages per day and before I disabled this GSATOMIC stuff we would
> process tops around 30 million messages before the system fell over in one
> weird way or another (this on a 32 core machine). In our application the
> place
> I most consistently found the code to fail is where we are transferring
> messages
> via a queue from one thread to another so we have an autorelease message
> created in thread 1, which is then retained for transfer to the queue (it is
> retained again inside the queue because the queue itself is threadsafe) then
> the other thread pulls the message from the queue (queue releases the message)
> we do stuff with it and then call release (at which point we don't care about
> the message anymore) and presumably thread one reaps the autoreleased memory.
Sounds like you have an issue where the object is already over-autoreleased.
Try running the clang static analyser on your code.
> Now I'm not proud, this system all told, is around 300k lines of code, of
> course
> it has bugs, having said this the program doesn't leak the queued messages (we
> don't have terrabytes of ram so we would notice) and I would expect wild
> double
> releases in our code to have at least caused a few otherwise unexplained
> application crashes over the years given that the place that fails is called
> once per message, actually I wouldn't expect the system to last more than a
> minute or two tops given the memory churn that the system goes through.
>
> We played with a couple simple gnustep programs using the Intel thread checker
> tool (http://software.intel.com/en-us/intel-thread-checker/) and disabling
> GSATOMIC stopped quite a few strange errors in the GNUStep internals from
> occurring for what it's worth.
Disabling GSATOMIC will put you on a very slow path, where every retain or
release involves acquiring and releasing a lock. It will make a lot of bugs
disappear simply because it will enforce a lot of synchronisation on your code.
Every retain / release message will become a sync point.
>>> Thank goodness for the NSInteralInconsistencyException because this would
>>> have
>>> been a bear to find otherwise. Although very random/crashy behaviour seemed
>>> about as likely as getting the exception.
>>
>> The exception is there for debugging. We don't guarantee catching
>> double-free bugs using this mechanism, because if it happened in a
>> non-concurrent system we'd already have free'd the memory that the retain
>> count is stored in, giving a segfault.
>
> There is no double free bugs that I'm aware of and since the actual memory
> release happens outside of the locks in the non-GSATOMIC version of the code
> double frees could certainly still cause problems if they are present.
There are no double-free bugs in any code that people are aware of. They are
trivial to fix, once you know where they are...
David
-- Sent from my brain