qemu-devel

Re: [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event

From:	Roman Kagan
Subject:	Re: [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event
Date:	Mon, 30 May 2022 18:04:32 +0300

On Mon, May 30, 2022 at 01:28:17PM +0200, Markus Armbruster wrote:
> Roman Kagan <rvkagan@yandex-team.ru> writes:
> 
> > On Wed, May 25, 2022 at 12:54:47PM +0200, Markus Armbruster wrote:
> >> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
> >> 
> >> > This event represents device runtime errors to give time and
> >> > reason why device is broken.
> >> 
> >> Can you give one or more examples of the "device runtime errors" you have
> >> in mind?
> >
> > Initially we wanted to address a situation when a vhost device
> > discovered an inconsistency during virtqueue processing and silently
> > stopped the virtqueue.  This resulted in a device stall (partial for
> > multiqueue devices), and we were the last to notice it.
> >
> > The solution appeared to be to employ errfd and, upon receiving a
> > notification through it, to emit a QMP event which is actionable in the
> > management layer or further up the stack.
> >
> > Then we observed that virtio (non-vhost) devices suffer from the same
> > issue: they only log the error but don't signal it to the management
> > layer.  The case was very similar so we thought it would make sense to
> > share the infrastructure and the QMP event between virtio and vhost.
> >
> > Then Konstantin went a bit further and generalized the concept into
> > generic "device runtime error".  I'm personally not completely convinced
> > this generalization is appropriate here; we'd appreciate the opinions
> > from the community on the matter.
> 
> "Device emulation sending an event on entering certain error states, so
> that a management application can do something about it" feels
> reasonable enough to me as a general concept.
> 
> The key point is of course "can do something": the event needs to be
> actionable.  Can you describe possible actions for the cases you
> implement?

The first one that we had in mind was informational, like triggering an
alert in the monitoring system and/or painting the VM as malfunctioning
in the owner's UI.

There can be more advanced scenarios like autorecovery by resetting the
faulty VM, or fencing it if it's a cluster member.
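
To make the above more concrete, here is a rough sketch of how a
management layer might map the event to those actions.  It is only an
illustration under assumptions: the function name, the action names and
the policy knobs (cluster_members, auto_recover) are hypothetical, and
the event's "data" payload is deliberately not inspected because its
final schema is up to the series.  The only things relied upon are the
event name from this patch and the general shape of a QMP event message
(a JSON object with an "event" key).

```python
import json

EVENT_NAME = "DEVICE_RUNTIME_ERROR"  # event name proposed in this series


def plan_actions(qmp_line, vm_name, cluster_members=(), auto_recover=False):
    """Map one line of QMP output to a list of (action, vm_name) tuples.

    Hypothetical helper: the action names and policy arguments are
    assumptions made for illustration, not part of QEMU or the patch.
    """
    msg = json.loads(qmp_line)
    # QMP events carry an "event" key; command replies carry "return"
    # or "error" instead, so anything else is ignored here.
    if msg.get("event") != EVENT_NAME:
        return []
    actions = [("alert", vm_name)]          # informational: always alert
    if vm_name in cluster_members:
        actions.append(("fence", vm_name))  # isolate a faulty cluster member
    elif auto_recover:
        actions.append(("reset", vm_name))  # e.g. by issuing QMP system_reset
    return actions


# Example event with an empty payload (the real payload may differ).
line = ('{"event": "DEVICE_RUNTIME_ERROR", "data": {}, '
        '"timestamp": {"seconds": 1653923072, "microseconds": 0}}')
print(plan_actions(line, "vm1", auto_recover=True))
# prints [('alert', 'vm1'), ('reset', 'vm1')]
```

The alert is unconditional because the informational case is the
baseline; fencing takes precedence over reset in this sketch, on the
assumption that a cluster member should be isolated before anything
else is attempted.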

Thanks,
Roman.