qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC] nbd: decouple reconnect from drain


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [RFC] nbd: decouple reconnect from drain
Date: Fri, 12 Mar 2021 15:35:25 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.1

10.03.2021 12:32, Roman Kagan wrote:
NBD connect coroutine takes an extra in_flight reference as if it's a
request handler.  This prevents drain from completion until the
connection coroutine is releases the reference.

When NBD is configured to reconnect, however, this appears to be fatal
to the reconnection idea: the drain procedure wants the reconnection to
be suspended, but this is only possible if the in-flight requests are
canceled.

As I remember from our conversation, the problem is not that we don't reconnect 
during drained section, but exactly that we cancel requests on drained begins 
starting from 8c517de24a8a1dcbeb.

This is not a problem in scenarios when reconnect is rare case and failed request is 
acceptable. But if we have bad connection and requests should often wait for reconnect 
(so, it may be considered as a kind of "latency") then really, cancelling the 
waiting requests on any drain() kills the reconnect feature.


Fix this by making the connection coroutine stop messing with the
in-flight counter.  Instead, certain care is taken to properly move the
reconnection stuff from one aio_context to another in
.bdrv_{attach,detach}_aio_context callbacks.

Fixes: 5ad81b4946 ("nbd: Restrict connection_co reentrance")
Signed-off-by: Roman Kagan <rvkagan@yandex-team.ru>
---
This patch passes the regular make check but fails some extra iotests,
in particular 277.  It obviously lacks more robust interaction with the
connection thread (which in general is fairly complex and hard to reason
about), and perhaps has some other drawbacks, so I'll work on this
further, but I'd appreciate some feedback on whether the idea is sound.


In general I like the idea. The logic around drain in nbd is overcomplicated. 
And I never liked the fact that nbd_read_eof() does dec-inc inflight section. 
Some notes:

1. I hope, the patch can be divided into several ones, as there are several 
things done:

- removing use of in_flight counter introduced by 5ad81b4946
- do reconnect during drained section
- stop cancelling requests on .drained_begin

2. 5ad81b4946 was needed to make nbd_client_attach_aio_context() reenter 
connection_co only in one (or two) possible places, not on any yield.. And I 
don't see how it is achieved now.. This should be described in commit msg..

3. About cancelling requests on drained_begin. The behavior was introduced by 
8c517de24a8a1dcbeb, to fix a deadlock. So, if now the deadlock is fixed another 
way, let's change the logic (don't cancel requests) in a separate patch, and 
note 8c517de24a8a1dcbeb commit and the commit that fixes deadlock the other way 
in the commit message.


--
Best regards,
Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]