Re: [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event
From: Markus Armbruster
Subject: Re: [PATCH 1/4] qdev: add DEVICE_RUNTIME_ERROR event
Date: Mon, 30 May 2022 13:28:17 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Roman Kagan <rvkagan@yandex-team.ru> writes:
> On Wed, May 25, 2022 at 12:54:47PM +0200, Markus Armbruster wrote:
>> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>>
>> > This event represents device runtime errors to give time and
>> > reason why device is broken.
>>
>> Can you give one or more examples of the "device runtime errors" you have
>> in mind?
>
> Initially we wanted to address a situation when a vhost device
> discovered an inconsistency during virtqueue processing and silently
> stopped the virtqueue. This resulted in device stall (partial for
> multiqueue devices) and we were the last to notice that.
>
> The solution appeared to be to employ errfd and, upon receiving a
> notification through it, to emit a QMP event which is actionable in the
> management layer or further up the stack.
>
> Then we observed that virtio (non-vhost) devices suffer from the same
> issue: they only log the error but don't signal it to the management
> layer. The case was very similar so we thought it would make sense to
> share the infrastructure and the QMP event between virtio and vhost.
>
> Then Konstantin went a bit further and generalized the concept into
> generic "device runtime error". I'm personally not completely convinced
> this generalization is appropriate here; we'd appreciate the opinions
> from the community on the matter.

"Device emulation sending an event on entering certain error states, so
that a management application can do something about it" feels
reasonable enough to me as a general concept.

The key point is of course "can do something": the event needs to be
actionable.  Can you describe possible actions for the cases you
implement?

Once we all have a better idea of the event's purpose, usage, and
limitations, we should revisit its documentation.
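
To make "actionable" a bit more concrete, here is a minimal Python sketch
of the management side: it picks the event out of a QMP stream and hands
it to a callback.  Only the event name comes from the patch subject and
the "event" / "data" / "timestamp" envelope from QMP itself.  The helper
names and whatever the callback does are assumptions for illustration,
and the members of "data" are deliberately left opaque, since they are
exactly what is still under discussion.

```python
import json

# Event name as given in the patch subject.
EVENT_NAME = "DEVICE_RUNTIME_ERROR"


def parse_qmp_event(line):
    """Return (name, data, timestamp) for a QMP event line, else None.

    QMP events are JSON objects with an "event" member; command
    responses and the greeting carry other top-level members instead.
    """
    msg = json.loads(line)
    if "event" not in msg:
        return None
    return msg["event"], msg.get("data", {}), msg.get("timestamp", {})


def handle_line(line, on_device_error):
    """Call on_device_error(data, timestamp) for matching events.

    Hypothetical helper: returns True if the line was the event we
    care about, False otherwise.  What the callback does (alert,
    restart the backend, ...) is up to the management application.
    """
    parsed = parse_qmp_event(line)
    if parsed is None or parsed[0] != EVENT_NAME:
        return False
    _, data, timestamp = parsed
    on_device_error(data, timestamp)
    return True
```

A caller would feed each line read from the QMP socket to handle_line()
and supply its own callback.  The sketch only covers the plumbing; which
action the callback can sensibly take is the open question above.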