[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: [RFC PATCH] block/io.c: Flush parent for quorum in generic code
From: |
Zhang, Chen |
Subject: |
RE: [RFC PATCH] block/io.c: Flush parent for quorum in generic code |
Date: |
Mon, 17 May 2021 17:59:38 +0000 |
> -----Original Message-----
> From: Stefan Hajnoczi <stefanha@redhat.com>
> Sent: Thursday, May 13, 2021 10:26 PM
> To: Zhang, Chen <chen.zhang@intel.com>
> Cc: Kevin Wolf <kwolf@redhat.com>; Max Reitz <mreitz@redhat.com>; Fam
> Zheng <fam@euphon.net>; qemu-dev <qemu-devel@nongnu.org>; qemu-
> block <qemu-block@nongnu.org>; Zhang Chen <zhangckid@gmail.com>;
> Minghao Yuan <meeho@qq.com>
> Subject: Re: [RFC PATCH] block/io.c: Flush parent for quorum in generic code
>
> On Wed, May 12, 2021 at 03:49:57PM +0800, Zhang Chen wrote:
> > Fix the issue from this patch:
> > [PATCH] block: Flush all children in generic code From
> > 883833e29cb800b4d92b5d4736252f4004885191
> >
> > Quorum driver do not have the primary child.
> > It will caused guest block flush issue when use quorum and NBD.
> > The vm guest flushes failed,and then guest filesystem is shutdown.
> >
> > Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> > Reported-by: Minghao Yuan <meeho@qq.com>
> > ---
> > block/io.c | 31 ++++++++++++++++++++++---------
> > 1 file changed, 22 insertions(+), 9 deletions(-)
> ...
> > +flush_data:
> > + if (no_primary_child) {
> > + /* Flush parent */
> > + ret = bs->file ? bdrv_co_flush(bs->file->bs) : 0;
> > + } else {
> > + /* Flush childrens */
> > + ret = 0;
> > + QLIST_FOREACH(child, &bs->children, next) {
> > + if (child->perm & (BLK_PERM_WRITE |
> BLK_PERM_WRITE_UNCHANGED)) {
> > + int this_child_ret = bdrv_co_flush(child->bs);
> > + if (!ret) {
> > + ret = this_child_ret;
> > + }
> > }
> > }
>
> I'm missing something:
>
> The quorum driver has a valid bs->children list even though it does not have a
> primary child. Why does QLIST_FOREACH() loop fail for you?
>
Yes, in most cases QLIST_FOREACH() works for me.
But not work when one of the child disconnected, the original patch changed
the default behavior of quorum driver when occurs issue.
For example:
VM quorum driver have two children, local disk1 and NBD disk2.
When network down and NBD disk2 disconnected, current code will report
"event": "QUORUM_REPORT_BAD" "type": "flush", "error": "Input/output error"
And
"event": "BLOCK_IO_ERROR" "#block008", "reason": "Input/output error"
The guest even cannot read/write the normal local disk1. VM IO will crashed
caused by NDB disk2 input/output error.
I think we do need report the event about we lose a child(NBD disk2) at this
time, but no need crash all IO system.
Because we can fix it by x-blockdev-change and drive_add/drive_del for new
children.
Before the original patch(883833e2), VM still can read/write the local disk1.
This patch just the RFC version, please give me more comments to fix this issue.
Thanks
Chen
> Does this patch effectively skip bdrv_co_flush() calls on all quorum children?
> That seems wrong since children need to be flushed so that data is persisted.
>
Yes,
> Stefan