If on nbd_close() we detach the thread (in
nbd_co_establish_connection_cancel() thr->state becomes
CONNECT_THREAD_RUNNING_DETACHED), after that point we should not use
s->connect_thread (which is set to NULL), as running thread may free it
at any time.
Still nbd_co_establish_connection() does exactly this: it saves
s->connect_thread to local variable (just for better code style) and
use it even after yield point, when thread may be already detached.
Fix that. Also check thr to be non-NULL on
nbd_co_establish_connection() start for safety.
After this patch "case CONNECT_THREAD_RUNNING_DETACHED" becomes
impossible in the second switch in nbd_co_establish_connection().
Still, don't add extra abort() just before the release. If it somehow
possible to reach this "case:" it won't hurt. Anyway, good refactoring
of all this reconnect mess will come soon.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
Hi all! I faced a crash, just running 277 iotest in a loop. I can't
reproduce it on master, it reproduces only on my branch with nbd
reconnect refactorings.
Still, it seems very possible that it may crash under some conditions.
So I propose this patch for 6.0. It's written so that it's obvious that
it will not hurt:
pre-patch, on first hunk we'll just crash if thr is NULL,
on second hunk it's safe to return -1, and using thr when
s->connect_thread is already zeroed is obviously wrong.
block/nbd.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/block/nbd.c b/block/nbd.c
index c26dc5a54f..1d4668d42d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -443,6 +443,11 @@ nbd_co_establish_connection(BlockDriverState *bs, Error
**errp)
BDRVNBDState *s = bs->opaque;
NBDConnectThread *thr = s->connect_thread;
+ if (!thr) {
+ /* detached */
+ return -1;
+ }
+
qemu_mutex_lock(&thr->mutex);
switch (thr->state) {