Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size


From: Stefano Stabellini
Subject: Re: [PATCH 2/2] 9pfs: fix init_in_iov_from_pdu truncating size
Date: Thu, 14 May 2020 09:24:58 -0700 (PDT)
User-agent: Alpine 2.21 (DEB 202 2017-01-01)

On Thu, 14 May 2020, Christian Schoenebeck wrote:
> On Donnerstag, 14. Mai 2020 01:31:30 CEST Stefano Stabellini wrote:
> > > If there is really no way with Xen to ensure there's always a buffer size
> > > according to msize,
> > 
> > Thank you for the long writeup. Let's see if we can come up with a
> > good plan.
> > 
> > The Xen transport
> > (https://xenbits.xenproject.org/docs/4.13-testing/misc/9pfs.html) has a
> > shared memory area with 64 pages, so 256KB. It is split in half: 128KB
> > for requests and 128KB for responses.
> 
> That's very little. That also means you won't ever be able to achieve decent 
> 9pfs performance with Xen, because that requires a much larger msize of 
> approximately msize >= 4MB (depending on the underlying I/O hardware), with a 
> tendency to grow due to rising I/O hardware bandwidth.

That's great feedback. Fortunately the size of the transport is
configurable, so it is a one-liner away from becoming much bigger (well,
two one-liners: see net/9p/trans_xen.c:XEN_9PFS_RING_ORDER in Linux, and
hw/9pfs/xen-9p-backend.c:MAX_RING_ORDER in QEMU.)

Would you recommend 4MB each way, so a total of 8MB?
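Just to make the sizing arithmetic explicit, here is a purely illustrative
sketch (the assumption that an "order" means 1 << order pages of 4KB per
direction is mine; the real XEN_9PFS_RING_ORDER / MAX_RING_ORDER semantics
may differ):

    #include <stdio.h>

    #define XEN_PAGE_SIZE 4096

    /* bytes available in one direction for a given ring order (assumed model) */
    static unsigned long ring_bytes(unsigned order)
    {
        return (1UL << order) * XEN_PAGE_SIZE;
    }

    int main(void)
    {
        /* order 5 -> 32 pages -> 128KB per direction, as in the setup above */
        printf("order 5:  %lu KB per direction\n", ring_bytes(5) / 1024);
        /* order 10 -> 1024 pages -> 4MB per direction, i.e. 8MB total */
        printf("order 10: %lu KB per direction\n", ring_bytes(10) / 1024);
        return 0;
    }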


> > The original error message is the following:
> > 
> >   xen be: 9pfs-0: xen be: 9pfs-0: Xen 9pfs request type 116 needs 126987
> >   bytes, buffer has 126965 Xen 9pfs request type 116 needs 126987 bytes,
> >   buffer has 126965
> > 
> > and msize was set to 131072 in the Linux client.
> > 
> > So transport_size == msize. However, there can be multiple requests and
> > responses inflight at any given time. It means that even with
> > transport_size=128KB and msize=4KB we could still have so many outstanding
> > requests that the available memory is less than 4KB.
> > 
> > I admit that with transport_size=128KB and msize=4KB it would be so
> > unlikely to happen that it would be close to impossible to reproduce the
> > error. But theoretically it is possible.
> 
> Even though msize=4kiB is the default value with Linux clients, I would never 
> use such a small value, as it would cause extremely poor 9pfs performance. 
> My plan BTW is to log a performance warning soon if a client is using such a 
> small msize.

Sounds good to me too


> > If we can't come up with any better ideas, we could simply limit the
> > msize in the Linux client to something like transport_size/2 or
> > transport_size/4, but it doesn't feel right.
> 
> That sounds like it would just reduce the likelihood of the low buffer 
> situation happening with Xen; it would not prevent it, though.

Right


> As far as I understand you now, the problem with the Xen transport seems to be 
> that clients can submit requests more quickly than the 9pfs server can 
> process them. That would change the overall situation completely, because my 
> hack solution of delaying delivery (yield), suggested previously, was based on 
> the assumption that this low transport buffer scenario only happens once at 
> boot, and not possibly at any later point again and again.

Yes I think it could happen at any time, not just at boot.

From my understanding, I don't think it is necessarily due to QEMU being
slow. In fact, if QEMU were slow and the client fast, the reply ring
would be empty because the client would be up to speed; therefore msize
== transport_available_size, and there is no problem.

I think it happens when the client submits a bunch of requests but then
it is slow in reading back the replies. Hence, replies accumulate on the
ring, and transport_available_size < msize.

Given that currently the total ring size for replies is equal to msize,
we just need 2 inflight requests to potentially cause the situation.
For instance, a read of 10 bytes and a read of 128KB (max msize): if
issued together, they lead to QEMU writing 10 bytes on the reply ring,
and then not having enough room to write 128KB anymore. This example is
a bit of an oversimplification because it doesn't account for the size of
the headers, but I hope you get my point.
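To put rough numbers on the example (a toy model only; the 7-byte reply
header and the exact ring layout are assumptions for illustration):

    #include <stdio.h>

    #define RING_SIZE  (128 * 1024)   /* reply ring == msize in the current setup */
    #define RHDR_SIZE  7              /* assumed per-reply header overhead */

    int main(void)
    {
        unsigned long avail = RING_SIZE;

        /* the reply to the 10-byte read is written to the ring first */
        avail -= RHDR_SIZE + 10;

        /* the 128KB (max msize) reply no longer fits in the remaining space */
        unsigned long wanted = RHDR_SIZE + 128 * 1024;
        printf("available %lu, wanted %lu -> short by %lu bytes\n",
               avail, wanted, wanted - avail);
        return 0;
    }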

So maybe it is OK to delay delivery, because it seems to me that is
the issue.

Also I am thinking that if we are going to increase the ring size to
4MB, maybe we should take the opportunity to set msize=transport_size/2
just to reduce the likelihood of this kind of issue, in addition to a
proper fix.

So maybe transport=8MB and msize=4MB or transport=4MB and msize=2MB.


> > > As you can see it all boils down to one thing: better look for a solution
> > > for Xen to deliver the required buffer size, or in worst case: yield the
> > > coroutine, i.e. delay delivery of response until that can be assured by
> > > Xen.
> > That might be the best option. But then it would have to be
> > xen_9pfs_init_in_iov_from_pdu calling qemu_coroutine_yield() in a loop?
> > Like:
> > 
> > 
> > diff --git a/hw/9pfs/xen-9p-backend.c b/hw/9pfs/xen-9p-backend.c
> > index 18fe5b7c92..ef1db47d0c 100644
> > --- a/hw/9pfs/xen-9p-backend.c
> > +++ b/hw/9pfs/xen-9p-backend.c
> > @@ -197,18 +197,13 @@ static void xen_9pfs_init_in_iov_from_pdu(V9fsPDU *pdu,
> >      g_free(ring->sg);
> > 
> >      ring->sg = g_new0(struct iovec, 2);
> > -    xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
> > 
> > +again:
> > +    xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
> >      buf_size = iov_size(ring->sg, num);
> > -    if (buf_size  < P9_IOHDRSZ) {
> > -        xen_pv_printf(&xen_9pfs->xendev, 0, "Xen 9pfs request type %d"
> > -                "needs %zu bytes, buffer has %zu, less than minimum\n",
> > -                pdu->id, *size, buf_size);
> > -        xen_be_set_state(&xen_9pfs->xendev, XenbusStateClosing);
> > -        xen_9pfs_disconnect(&xen_9pfs->xendev);
> > -    }
> >      if (buf_size  < *size) {
> > -        *size = buf_size;
> > +        qemu_coroutine_yield();
> > +        goto again;
> >      }
> > 
> >      *piov = ring->sg;
> > 
> > 
> > How is the coroutine going to get scheduled again? Is it automatic or do
> > we need to call it again explicitly when we get the next notification
> > from the client, which should arrive as soon as the client finishes
> > reading the responses?
> 
> I "think" the coroutine would be resumed in the next main loop iteration. But 
> I am still reading up on qemu's coroutine implementation details myself. So 
> that might work, yes, even though in a hackish way.
> 
> And as explained above, if this situation not only happens on boot, then you 
> might consider a different approach. There is also 
> yield_until_fd_readable(int 
> fd); BTW. However the obvious step would be increasing Xen transport capacity 
> tremendously at first place.

Yes, I think we want to do three things:
- increase the transport size for performance
- set msize=transport_size/2 to decrease the likelihood of this kind of
  issue
- introduce a slow but safe fix for when the issue happens, and
  qemu_coroutine_yield() or yield_until_fd_readable() could be OK if it
  is very rare (a rough sketch follows below)
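
For the third point, here is a pseudocode-level sketch of what I have in
mind (not a tested patch; the co_waiting field is hypothetical and error
handling is omitted): park the coroutine when the reply ring is too full,
and re-enter it from the event channel handler once the client has
consumed some replies.

    /* in xen_9pfs_init_in_iov_from_pdu(), instead of truncating *size: */
    while (buf_size < *size) {
        ring->co_waiting = qemu_coroutine_self();   /* hypothetical field */
        qemu_coroutine_yield();                     /* park this request */
        xen_9pfs_in_sg(ring, ring->sg, &num, pdu->idx, *size);
        buf_size = iov_size(ring->sg, num);
    }

    /* in the event channel handler, once the client signals it has read
     * replies off the ring: */
    if (ring->co_waiting) {
        Coroutine *co = ring->co_waiting;
        ring->co_waiting = NULL;
        qemu_coroutine_enter(co);
    }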


