qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v11 2/7] QIOChannelSocket: Implement io_writev zero copy flag


From: Leonardo Bras Soares Passos
Subject: Re: [PATCH v11 2/7] QIOChannelSocket: Implement io_writev zero copy flag & io_flush for CONFIG_LINUX
Date: Thu, 5 May 2022 12:42:47 -0300

On Thu, May 5, 2022 at 5:05 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Wed, May 04, 2022 at 04:18:31PM -0300, Leonardo Bras wrote:
> > For CONFIG_LINUX, implement the new zero copy flag and the optional callback
> > io_flush on QIOChannelSocket, but enables it only when MSG_ZEROCOPY
> > feature is available in the host kernel, which is checked on
> > qio_channel_socket_connect_sync()
> >
> > qio_channel_socket_flush() was implemented by counting how many times
> > sendmsg(...,MSG_ZEROCOPY) was successfully called, and then reading the
> > socket's error queue, in order to find how many of them finished sending.
> > Flush will loop until those counters are the same, or until some error 
> > occurs.
> >
> > Notes on using writev() with QIO_CHANNEL_WRITE_FLAG_ZERO_COPY:
> > 1: Buffer
> > - As MSG_ZEROCOPY tells the kernel to use the same user buffer to avoid 
> > copying,
> > some caution is necessary to avoid overwriting any buffer before it's sent.
> > If something like this happen, a newer version of the buffer may be sent 
> > instead.
> > - If this is a problem, it's recommended to call qio_channel_flush() before 
> > freeing
> > or re-using the buffer.
> >
> > 2: Locked memory
> > - When using MSG_ZERCOCOPY, the buffer memory will be locked after queued, 
> > and
> > unlocked after it's sent.
> > - Depending on the size of each buffer, and how often it's sent, it may 
> > require
> > a larger amount of locked memory than usually available to non-root user.
> > - If the required amount of locked memory is not available, writev_zero_copy
> > will return an error, which can abort an operation like migration,
> > - Because of this, when an user code wants to add zero copy as a feature, it
> > requires a mechanism to disable it, so it can still be accessible to less
> > privileged users.
> >
> > Signed-off-by: Leonardo Bras <leobras@redhat.com>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > ---
> >  include/io/channel-socket.h |   2 +
> >  io/channel-socket.c         | 120 ++++++++++++++++++++++++++++++++++--
> >  2 files changed, 118 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/io/channel-socket.h b/include/io/channel-socket.h
> > index e747e63514..513c428fe4 100644
> > --- a/include/io/channel-socket.h
> > +++ b/include/io/channel-socket.h
> > @@ -47,6 +47,8 @@ struct QIOChannelSocket {
> >      socklen_t localAddrLen;
> >      struct sockaddr_storage remoteAddr;
> >      socklen_t remoteAddrLen;
> > +    ssize_t zero_copy_queued;
> > +    ssize_t zero_copy_sent;
> >  };
> >
> >
> > diff --git a/io/channel-socket.c b/io/channel-socket.c
> > index 696a04dc9c..ae756ce166 100644
> > --- a/io/channel-socket.c
> > +++ b/io/channel-socket.c
> > @@ -25,9 +25,25 @@
> >  #include "io/channel-watch.h"
> >  #include "trace.h"
> >  #include "qapi/clone-visitor.h"
> > +#ifdef CONFIG_LINUX
> > +#include <linux/errqueue.h>
> > +#include <sys/socket.h>
> > +#endif
> >
> >  #define SOCKET_MAX_FDS 16
> >
> > +/*
> > + * Zero-copy defines bellow are included to avoid breaking builds on 
> > systems
> > + * that don't support MSG_ZEROCOPY, while keeping the functions more 
> > readable
> > + * (without a lot of ifdefs).
> > + */
> > +#ifndef MSG_ZEROCOPY
> > +#define MSG_ZEROCOPY 0x4000000
> > +#endif
> > +#ifndef SO_ZEROCOPY
> > +#define SO_ZEROCOPY 60
> > +#endif
>
> Please put these behind CONFIG_LINUX to make it clear to readers that
> this is entirely Linux specific
>
>
> > +
> >  SocketAddress *
> >  qio_channel_socket_get_local_address(QIOChannelSocket *ioc,
> >                                       Error **errp)
> > @@ -54,6 +70,8 @@ qio_channel_socket_new(void)
> >
> >      sioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
> >      sioc->fd = -1;
> > +    sioc->zero_copy_queued = 0;
> > +    sioc->zero_copy_sent = 0;
> >
> >      ioc = QIO_CHANNEL(sioc);
> >      qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN);
> > @@ -153,6 +171,16 @@ int qio_channel_socket_connect_sync(QIOChannelSocket 
> > *ioc,
> >          return -1;
> >      }
> >
> > +#ifdef CONFIG_LINUX
> > +    int ret, v = 1;
> > +    ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
> > +    if (ret == 0) {
> > +        /* Zero copy available on host */
> > +        qio_channel_set_feature(QIO_CHANNEL(ioc),
> > +                                QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
> > +    }
> > +#endif
> > +
> >      return 0;
> >  }
> >
> > @@ -533,6 +561,7 @@ static ssize_t qio_channel_socket_writev(QIOChannel 
> > *ioc,
> >      char control[CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS)];
> >      size_t fdsize = sizeof(int) * nfds;
> >      struct cmsghdr *cmsg;
> > +    int sflags = 0;
> >
> >      memset(control, 0, CMSG_SPACE(sizeof(int) * SOCKET_MAX_FDS));
> >
> > @@ -557,15 +586,27 @@ static ssize_t qio_channel_socket_writev(QIOChannel 
> > *ioc,
> >          memcpy(CMSG_DATA(cmsg), fds, fdsize);
> >      }
> >
> > +    if (flags & QIO_CHANNEL_WRITE_FLAG_ZERO_COPY) {
> > +        sflags = MSG_ZEROCOPY;
> > +    }
>
> Also should be behind CONFIG_LINUX


Hello Daniel,

But what if this gets compiled in a Linux system without MSG_ZEROCOPY support?
As qapi will have zero-copy-send as an option we could have this scenario:

- User request migration using zero-copy-send
- multifd_save_setup() will set write_flags = QIO_CHANNEL_WRITE_FLAG_ZERO_COPY
- In qio_channel_socket_connect_sync(): setsockopt() part will be
compiled-out, so no error here
- above part in qio_channel_socket_writev() will be commented-out,
which means write_flags will be ignored
- sflags will not contain MSG_ZEROCOPY, so sendmsg() will use copy-mode
- migration will succeed

In the above case, the user has all the reason to think migration is
using MSG_ZEROCOPY, but in fact it's quietly falling back to
copy-mode.

That's why I suggested creating a 'global' config usiing SO_ZEROCOPY &
MSG_ZEROCOPY & CONFIG_LINUX so we can use in qapi and have no chance
of even offering zero-copy-send if we don't have it.

Another local option is to do implement your suggestions, and also
change qio_channel_socket_connect_sync() so it returns an error if
MSG_ZEROCOPY && SO_ZEROCOPY is not present, such as:

+#ifdef CONFIG_LINUX
+#if defined(MSG_ZEROCOPY)  && defined(SO_ZEROCOPY)
+    int ret, v = 1;
+    ret = setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &v, sizeof(v));
+    if (ret == 0) {
+        /* Zero copy available on host */
+        qio_channel_set_feature(QIO_CHANNEL(ioc),
+                                QIO_CHANNEL_FEATURE_WRITE_ZERO_COPY);
+    }
+#else
+    error_setg_errno(errp, errno,"MSG_ZEROCOPY not available");
+    return -1;
+#endif
+#endif

What do you think?

Best regards,
Leo

>
> > +
> >   retry:
> > -    ret = sendmsg(sioc->fd, &msg, 0);
> > +    ret = sendmsg(sioc->fd, &msg, sflags);
> >      if (ret <= 0) {
> > -        if (errno == EAGAIN) {
> > +        switch (errno) {
> > +        case EAGAIN:
> >              return QIO_CHANNEL_ERR_BLOCK;
> > -        }
> > -        if (errno == EINTR) {
> > +        case EINTR:
> >              goto retry;
> > +        case ENOBUFS:
> > +            if (sflags & MSG_ZEROCOPY) {
> > +                error_setg_errno(errp, errno,
> > +                                 "Process can't lock enough memory for 
> > using MSG_ZEROCOPY");
> > +                return -1;
> > +            }
> > +            break;
>
> And this ENOBUGS case behind CONFIG_LINUX
>


[...]




reply via email to

[Prev in Thread] Current Thread [Next in Thread]