[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 1/2] vhost-user: fix lost reconnect
From: |
Li Feng |
Subject: |
Re: [PATCH 1/2] vhost-user: fix lost reconnect |
Date: |
Thu, 17 Aug 2023 14:40:24 +0800 |
> 2023年8月14日 下午8:11,Raphael Norwitz <raphael.norwitz@nutanix.com> 写道:
>
> Why can’t we rather fix this by adding a “event_cb” param to
> vhost_user_async_close and then call qemu_chr_fe_set_handlers in
> vhost_user_async_close_bh()?
>
> Even if calling vhost_dev_cleanup() twice is safe today I worry future
> changes may easily stumble over the reconnect case and introduce crashes or
> double frees.
>
I think add a new event_cb is not good enough. ‘qemu_chr_fe_set_handlers’ has
been called in vhost_user_async_close, and will be called in event->cb, so why
need add a
new event_cb?
For avoiding to call the vhost_dev_cleanup() twice, add a ‘inited’ in struct
vhost-dev to mark if it’s inited like this:
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index e2f6ffb446..edc80c0231 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1502,6 +1502,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
goto fail_busyloop;
}
+ hdev->inited = true;
return 0;
fail_busyloop:
@@ -1520,6 +1521,10 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
{
int i;
+ if (!hdev->inited) {
+ return;
+ }
+ hdev->inited = false;
trace_vhost_dev_cleanup(hdev);
for (i = 0; i < hdev->nvqs; ++i) {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index ca3131b1af..74b1aec960 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -123,6 +123,7 @@ struct vhost_dev {
/* @started: is the vhost device started? */
bool started;
bool log_enabled;
+ bool inited;
uint64_t log_size;
Error *migration_blocker;
const VhostOps *vhost_ops;
Thanks.
>
>> On Aug 4, 2023, at 1:29 AM, Li Feng <fengli@smartx.com> wrote:
>>
>> When the vhost-user is reconnecting to the backend, and if the vhost-user
>> fails
>> at the get_features in vhost_dev_init(), then the reconnect will fail
>> and it will not be retriggered forever.
>>
>> The reason is:
>> When the vhost-user fail at get_features, the vhost_dev_cleanup will be
>> called
>> immediately.
>>
>> vhost_dev_cleanup calls 'memset(hdev, 0, sizeof(struct vhost_dev))'.
>>
>> The reconnect path is:
>> vhost_user_blk_event
>> vhost_user_async_close(.. vhost_user_blk_disconnect ..)
>> qemu_chr_fe_set_handlers <----- clear the notifier callback
>> schedule vhost_user_async_close_bh
>>
>> The vhost->vdev is null, so the vhost_user_blk_disconnect will not be
>> called, then the event fd callback will not be reinstalled.
>>
>> With this patch, the vhost_user_blk_disconnect will call the
>> vhost_dev_cleanup() again, it's safe.
>>
>> All vhost-user devices have this issue, including vhost-user-blk/scsi.
>>
>> Fixes: 71e076a07d ("hw/virtio: generalise CHR_EVENT_CLOSED handling")
>>
>> Signed-off-by: Li Feng <fengli@smartx.com>
>> ---
>> hw/virtio/vhost-user.c | 10 +---------
>> 1 file changed, 1 insertion(+), 9 deletions(-)
>>
>> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
>> index 8dcf049d42..697b403fe2 100644
>> --- a/hw/virtio/vhost-user.c
>> +++ b/hw/virtio/vhost-user.c
>> @@ -2648,16 +2648,8 @@ typedef struct {
>> static void vhost_user_async_close_bh(void *opaque)
>> {
>> VhostAsyncCallback *data = opaque;
>> - struct vhost_dev *vhost = data->vhost;
>>
>> - /*
>> - * If the vhost_dev has been cleared in the meantime there is
>> - * nothing left to do as some other path has completed the
>> - * cleanup.
>> - */
>> - if (vhost->vdev) {
>> - data->cb(data->dev);
>> - }
>> + data->cb(data->dev);
>>
>> g_free(data);
>> }
>> --
>> 2.41.0
>>
>
[PATCH 1/2] vhost-user: fix lost reconnect, Li Feng, 2023/08/04
[PATCH v2 0/2] Fix vhost reconnect issues, Li Feng, 2023/08/24
[PATCH v2 2/2] vhost: Add Error parameter to vhost_scsi_common_start(), Li Feng, 2023/08/24