
RE: [PATCH v8] introduce vfio-user protocol specification


From: Thanos Makatos
Subject: RE: [PATCH v8] introduce vfio-user protocol specification
Date: Fri, 7 May 2021 16:10:08 +0000

> -----Original Message-----
> From: John Levon <john.levon@nutanix.com>
> Sent: 05 May 2021 17:20
> To: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Thanos Makatos <thanos.makatos@nutanix.com>; qemu-devel@nongnu.org;
> John Levon <levon@movementarian.org>;
> John G Johnson <john.g.johnson@oracle.com>; benjamin.walker@intel.com;
> Elena Ufimtseva <elena.ufimtseva@oracle.com>; jag.raman@oracle.com;
> james.r.harris@intel.com; Swapnil Ingle <swapnil.ingle@nutanix.com>;
> konrad.wilk@oracle.com; alex.williamson@redhat.com; yuvalkashtan@gmail.com;
> tina.zhang@intel.com; marcandre.lureau@redhat.com; ismael@linux.com;
> Kanth.Ghatraju@oracle.com; Felipe Franciosi <felipe@nutanix.com>;
> xiuchun.lu@intel.com; tomassetti.andrea@gmail.com;
> Raphael Norwitz <raphael.norwitz@nutanix.com>; changpeng.liu@intel.com;
> dgilbert@redhat.com; Yan Zhao <yan.y.zhao@intel.com>;
> Michael S. Tsirkin <mst@redhat.com>; Gerd Hoffmann <kraxel@redhat.com>;
> Christophe de Dinechin <cdupontd@redhat.com>; Jason Wang <jasowang@redhat.com>;
> Cornelia Huck <cohuck@redhat.com>; Kirti Wankhede <kwankhede@nvidia.com>;
> Paolo Bonzini <pbonzini@redhat.com>; mpiszczek@ddn.com
> Subject: Re: [PATCH v8] introduce vfio-user protocol specification
> 
> On Tue, May 04, 2021 at 02:51:45PM +0100, Stefan Hajnoczi wrote:
> 
> > > +While passing of file descriptors is desirable for performance
> > > +reasons, it is not necessary neither for the client nor for the
> > > +server to support it in order
> >
> > Double negative. "not" can be removed.
> 
> Done. I also took a `:set spell` pass on the whole spec, which should catch
> your other typos.
> 
> > > +Device Read and Write
> > > +^^^^^^^^^^^^^^^^^^^^^
> > > +When the guest executes load or store operations to device memory,
> > > +the client
> >
> > <linux/vfio.h> calls it "device regions", not "device memory".
> > s/device memory/unmapped device regions/?
> 
> Spec was sloppy here, yes. Fixed up all the instances I noticed.
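
For readers following the thread, this is roughly what such a region access
looks like on the wire: a guest store to an unmapped device region becomes a
VFIO_USER_REGION_WRITE from the client. The C sketch below follows my reading
of the current draft; treat the field names and layout as illustrative, not
normative:

```
#include <stdint.h>

/*
 * Payload of VFIO_USER_REGION_READ/WRITE, following the common message
 * header (sketch; the spec tables are authoritative).
 */
struct vfio_user_region_access {
    uint64_t offset;  /* offset into the region */
    uint32_t region;  /* region index */
    uint32_t count;   /* number of bytes to access */
    uint8_t  data[];  /* write: data to store; read reply: data returned */
};
```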
> 
> > Does anything need to be said about mmaps and file descriptors on
> > disconnect? I guess they need to be cleaned up and are not retained
> > for future reconnection?
> 
> Updated:
> 
> ```
> Therefore in order for the protocol to be forward compatible, the server
> should respond to a client disconnection as follows:
> 
>  - all client memory regions are unmapped and cleaned up (including
>    closing any passed file descriptors)
>  - all IRQ file descriptors passed from the old client are closed
>  - the device state should otherwise be retained
> 
> The expectation is that when a client reconnects, it will re-establish IRQ and
> client memory mappings.
> 
> If anything happens to the client (such as the QEMU process actually
> exiting), the control stack will know about it and can clean up resources
> accordingly.
> ```
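
To make those cleanup rules concrete, here is a minimal C sketch of how a
server might implement them. The types and function are hypothetical, not
libvfio-user API:

```
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

struct dma_region {
    void  *mapping;   /* non-NULL if the server mmap()ed the region */
    size_t size;
    int    fd;        /* fd received from the client, or -1 */
};

static void on_client_disconnect(struct dma_region *regions, size_t nr_regions,
                                 int *irq_fds, size_t nr_irqs)
{
    /* Unmap and close everything the old client handed us. */
    for (size_t i = 0; i < nr_regions; i++) {
        if (regions[i].mapping != NULL) {
            munmap(regions[i].mapping, regions[i].size);
        }
        if (regions[i].fd >= 0) {
            close(regions[i].fd);
        }
        regions[i].mapping = NULL;
        regions[i].fd = -1;
    }
    for (size_t i = 0; i < nr_irqs; i++) {
        if (irq_fds[i] >= 0) {
            close(irq_fds[i]);
            irq_fds[i] = -1;
        }
    }
    /* Device state (registers, queues, ...) is deliberately kept so a
     * reconnecting client can re-establish mappings and resume. */
}
```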
> 
> > Are there rules for avoiding deadlock between client->server and
> > server->client messages? For example, the client sends
> > VFIO_USER_REGION_WRITE and the server sends VFIO_USER_VM_INTERRUPT
> > before replying to the write message.
> >
> > Multi-threaded clients and servers could end up deadlocking if
> > messages are processed while polling threads handle other device
> > activity (e.g. I/O requests that cause DMA messages).
> >
> > Pipelining has the nice effect that the oldest message must complete
> > before the next pipelined message starts. It imposes a maximum issue
> > depth of 1. Still, it seems like it would be relatively easy to hit
> > re-entrancy or deadlock issues since both the client and the server
> > can initiate messages and may need to wait for a response.
> 
> Filed https://github.com/nutanix/libvfio-user/issues/466
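
One way out of the deadlock, sketched below in C: whichever side is blocked
waiting for a reply keeps draining the socket and services any peer-initiated
commands that arrive first. All of the helpers and the message type here are
hypothetical, purely to illustrate the pattern:

```
#include <stdint.h>

/* Hypothetical message type and helpers, for illustration only. */
struct vfio_user_msg { uint16_t msg_id; uint16_t cmd; int is_reply; /* ... */ };
int  send_msg(int sock, const struct vfio_user_msg *m);
int  recv_msg(int sock, struct vfio_user_msg *m);
void handle_peer_command(int sock, const struct vfio_user_msg *m);

/* Send a command and wait for its reply, but service any commands the
 * peer initiates in the meantime (e.g. VFIO_USER_VM_INTERRUPT arriving
 * while a VFIO_USER_REGION_WRITE reply is outstanding). */
int send_and_wait(int sock, const struct vfio_user_msg *cmd,
                  struct vfio_user_msg *reply)
{
    if (send_msg(sock, cmd) < 0) {
        return -1;
    }
    for (;;) {
        struct vfio_user_msg in;

        if (recv_msg(sock, &in) < 0) {
            return -1;
        }
        if (in.is_reply && in.msg_id == cmd->msg_id) {
            *reply = in;                 /* our reply finally arrived */
            return 0;
        }
        handle_peer_command(sock, &in);  /* don't block: service it now */
    }
}
```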
> 
> > > +* *Offset* is the file offset of the region with respect to the
> > > +  associated file descriptor.
> > > +* *Protections* are the region's protection attributes as encoded in
> > > +  ``<sys/mman.h>``.
> >
> > Please be more specific. Does it only include PROT_READ and PROT_WRITE?
> > What about PROT_EXEC?
> 
> Updated to just have PROT_READ/WRITE.
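
With the field narrowed to those two attributes, a server can reject anything
else up front; a trivial sketch, assuming the field keeps the ``<sys/mman.h>``
encoding:

```
#include <stdint.h>
#include <sys/mman.h>

/* Accept only the two attributes the spec now allows. */
static inline int valid_protections(uint32_t prot)
{
    return (prot & ~(uint32_t)(PROT_READ | PROT_WRITE)) == 0;
}
```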
> 
> > > +If a DMA region being added can be directly mapped by the server,
> > > +an array of file descriptors must be sent as part of the message
> > > +meta-data. Each mappable region entry must have a corresponding
> > > +file descriptor. On AF_UNIX sockets, the file descriptors must be
> > > +passed as SCM_RIGHTS type ancillary data. Otherwise, if a DMA
> > > +region cannot be directly mapped by the server, it can be accessed
> > > +by the server using VFIO_USER_DMA_READ and VFIO_USER_DMA_WRITE
> > > +messages, explained in `Read and Write Operations`_. A command to
> > > +map over an existing region must be failed by the server with
> > > +``EEXIST`` set in the error field in the reply.
> >
> > Does this mean a vIOMMU update, like a protection bits change,
> > requires an unmap command followed by a map command? That is not an
> > atomic operation but hopefully guests don't try to update a vIOMMU
> > mapping while accessing it.
> 
> Filed https://github.com/nutanix/libvfio-user/issues/467
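
For reference, passing the region fds as SCM_RIGHTS ancillary data is the
standard AF_UNIX pattern; a minimal sketch for a single fd (the message
framing around it is up to the implementation):

```
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one message plus a single region fd as SCM_RIGHTS ancillary data. */
static int send_with_fd(int sock, const void *buf, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;            /* force correct alignment */
    } u;
    struct msghdr msg = {
        .msg_iov        = &iov,
        .msg_iovlen     = 1,
        .msg_control    = u.buf,
        .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type  = SCM_RIGHTS;
    cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}
```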
> 
> > > +This command message is sent by the client to the server to inform
> > > +it that a DMA region, previously made available via a
> > > +VFIO_USER_DMA_MAP command message, is no longer available for DMA.
> > > +It typically occurs when memory is subtracted from the client or if
> > > +the client uses a vIOMMU. If the client does not expect the server
> > > +to perform DMA then it does not need to send to the server
> > > +VFIO_USER_DMA_UNMAP commands. If the server does not need to
> > > +perform DMA then it can ignore such commands but it must still
> > > +reply to them. The table is an
> >
> > I'm a little confused by the last two sentences about not sending or
> > ignoring VFIO_USER_DMA_UNMAP. Does it mean that VFIO_USER_DMA_MAP does
> > not need to be sent either when the device is known never to need DMA?
> 
> If a device is never going to access client memory (either directly mapped or
> DMA_READ/WRITE), there's no need to inform the server of VM memory.  I
> removed the sentences as they were just confusing, sort of obvious, and not
> really relevant to real systems.
> 
> > > +* *Address* is the base DMA address of the region.
> > > +* *Size* is the size of the region.
> > > +* *Offset* is the file offset of the region with respect to the
> > > +  associated file descriptor.
> > > +* *Protections* are the region's protection attributes as encoded in
> > > +  ``<sys/mman.h>``.
> >
> > Why are offset and protections required for the unmap command?
> 
> They are now explicitly listed as ignored.
> 
> > > +* *Flags* contains the following region attributes:
> > > +
> > > +  * *VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP* indicates that a dirty
> > > +    page bitmap must be populated before unmapping the DMA region.
> > > +    The client must provide a ``struct vfio_bitmap`` in the VFIO
> > > +    bitmaps field for each region, with the ``vfio_bitmap.pgsize``
> > > +    and ``vfio_bitmap.size`` fields initialized.
> > > +
> > > +* *VFIO Bitmaps* contains one ``struct vfio_bitmap`` per region
> > > +  (explained below) if ``VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP`` is
> > > +  set in Flags.
> >
> > I'm confused, it's 1 "VFIO Bitmaps" per "Table entry". Why does it
> > contain one struct vfio_bitmap per region? What is a "region" in this
> > context?
> 
> Thanos?

The "region" here is a DMA region. A VFIO_USER_DMA_UNMAP command can unmap
multiple DMA regions and each DMA region is described by a table entry. What I
wanted to say here is that if the VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP bit is
set, then for each DMA region being removed there must be a "struct vfio_bitmap"
entry in the "VFIO Bitmaps" bitmaps section. I'll rephrase the text to make it
clearer.
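
In other words, the pairing is positional: table entry *i* corresponds to
bitmap *i*. Sketched below, with ``struct vfio_bitmap`` as defined in
``<linux/vfio.h>``; the layout comment is the intended reading, not yet spec
text:

```
#include <stdint.h>

/* As in <linux/vfio.h>: the client initializes pgsize and size; the
 * server populates the bitmap data before replying. */
struct vfio_bitmap {
    uint64_t  pgsize;  /* page size (bytes) the bitmap granularity uses */
    uint64_t  size;    /* size (bytes) of the bitmap itself */
    uint64_t *data;    /* one bit per page of the DMA region */
};

/*
 * VFIO_USER_DMA_UNMAP with VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP set:
 *
 *   [ table entry 0 ] ... [ table entry N-1 ]  <- DMA regions to unmap
 *   [ vfio_bitmap 0 ] ... [ vfio_bitmap N-1 ]  <- bitmap i pairs with entry i
 */
```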

> 
> > > +Upon receiving a VFIO_USER_DMA_UNMAP command, if the file
> > > +descriptor is mapped then the server must release all references to
> > > +that DMA region before replying, which includes potentially in
> > > +flight DMA transactions. Removing a portion of a DMA region is possible.
> >
> > "Removing a portion of a DMA region is possible"
> > -> doing so splits a larger DMA region into one or two smaller remaining
> > regions?
> 
> We certainly don't have that implemented. Thanos?

Correct, removing a portion of a DMA region could result in two regions.
Indeed, this isn't currently implemented. We wanted to leave the door open to
such behavior if we wanted it in the future. Now I'm thinking that maybe we
should put this behind an optional capability.
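
For clarity, this is the splitting arithmetic such a capability would imply
(purely illustrative C; nothing like this exists in libvfio-user today, and it
assumes the unmapped sub-range lies within the region):

```
#include <stdint.h>

struct range { uint64_t start, end; };   /* half-open: [start, end) */

/* Removing sub-range u from region r leaves zero, one, or two remainders. */
static int split_region(struct range r, struct range u, struct range out[2])
{
    int n = 0;

    if (u.start > r.start) {
        out[n++] = (struct range){ r.start, u.start };  /* left remainder */
    }
    if (u.end < r.end) {
        out[n++] = (struct range){ u.end, r.end };      /* right remainder */
    }
    return n;  /* number of remaining sub-regions */
}
```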

> 
> 
> On Wed, May 05, 2021 at 04:51:12PM +0100, Stefan Hajnoczi wrote:
> 
> > > > How do potentially large messages work around max_msg_size? It is
> > > > hard for the client/server to anticipate the maximum message size
> > > > that will be required ahead of time, so they can't really know if
> > > > they will hit a situation where max_msg_size is too low.
> > >
> > > Are there specific messages you're worried about? Would it help to
> > > add a specification stipulation as to minimum size that clients and
> > > servers must support?
> > >
> > > Ultimately the max msg size exists solely to ease implementation:
> > > with a reasonable fixed size, we can always consume the entire data
> > > in one go, rather than doing partial reads. Obviously that needs a
> > > limit to avoid unbounded allocations.
> >
> > It came to mind when reading about the dirty bitmap messages. Memory
> > dirty bitmaps can become large. An 8 GB memory region with 4 KiB pages
> > needs a 256 KB dirty bitmap.
> 
> Right, yeah. I filed https://github.com/nutanix/libvfio-user/issues/469 to
> track it.
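
The arithmetic that motivates the issue, as a quick sketch (one bit per page,
as VFIO dirty bitmaps use):

```
#include <stdint.h>

/* An 8 GiB region with 4 KiB pages needs (8 GiB / 4 KiB) / 8 = 256 KiB
 * of bitmap in a single reply, which a fixed max_msg_size can easily
 * disallow. */
static uint64_t dirty_bitmap_bytes(uint64_t region_size, uint64_t pgsize)
{
    uint64_t pages = region_size / pgsize;

    return (pages + 7) / 8;  /* one bit per page, rounded up to bytes */
}
```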
> 
> regards
> john


