[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-block] [PATCH] block/rbd: add preallocation support
From: |
Stefano Garzarella |
Subject: |
Re: [Qemu-block] [PATCH] block/rbd: add preallocation support |
Date: |
Mon, 29 Apr 2019 14:47:12 +0200 |
User-agent: |
NeoMutt/20180716 |
On Sat, Apr 27, 2019 at 08:43:26AM -0400, Jason Dillaman wrote:
> On Sat, Apr 27, 2019 at 7:37 AM Stefano Garzarella <address@hidden> wrote:
> >
> > This patch adds the support of preallocation (off/full) for the RBD
> > block driver.
> > If available, we use rbd_writesame() to quickly fill the image when
> > full preallocation is required.
> >
> > Signed-off-by: Stefano Garzarella <address@hidden>
> > ---
> > block/rbd.c | 149 ++++++++++++++++++++++++++++++++++++++-----
> > qapi/block-core.json | 4 +-
> > 2 files changed, 136 insertions(+), 17 deletions(-)
> >
> > diff --git a/block/rbd.c b/block/rbd.c
> > index 0c549c9935..29dd1bb040 100644
> > --- a/block/rbd.c
> > +++ b/block/rbd.c
> > @@ -13,6 +13,7 @@
> >
> > #include "qemu/osdep.h"
> >
> > +#include "qemu/units.h"
> > #include <rbd/librbd.h>
> > #include "qapi/error.h"
> > #include "qemu/error-report.h"
> > @@ -331,6 +332,110 @@ static void qemu_rbd_memset(RADOSCB *rcb, int64_t
> > offs)
> > }
> > }
> >
> > +static int qemu_rbd_do_truncate(rbd_image_t image, int64_t offset,
> > + PreallocMode prealloc, Error **errp)
> > +{
> > + uint64_t current_length;
> > + char *buf = NULL;
> > + int ret;
> > +
> > + ret = rbd_get_size(image, ¤t_length);
> > + if (ret < 0) {
> > + error_setg_errno(errp, -ret, "Failed to get file length");
> > + goto out;
> > + }
> > +
> > + if (current_length > offset && prealloc != PREALLOC_MODE_OFF) {
> > + error_setg(errp, "Cannot use preallocation for shrinking files");
> > + ret = -ENOTSUP;
> > + goto out;
> > + }
> > +
> > + switch (prealloc) {
> > + case PREALLOC_MODE_FULL: {
> > + uint64_t current_offset = current_length;
> > + int buf_size = 64 * KiB;
>
> This should probably be 512B or 4KiB (which is the smallest striped
> unit size). Not only will this avoid sending unnecessary zeroes to the
> backing cluster, writesame silently turns into a standard write if
> your buffer isn't properly aligned with the min(object size, stripe
> unit size).
>
Okay, I'll change it on v2.
Should we query about the "stripe_unit" size or we simply use the
smallest allowed?
> > + ssize_t bytes;
> > +
> > + buf = g_malloc(buf_size);
> > + /*
> > + * Some versions of rbd_writesame() discards writes of buffers with
> > + * all zeroes. In order to avoid this behaviour, we set the first
> > byte
> > + * to one.
> > + */
> > + buf[0] = 1;
>
> You could also use "rados_conf_set(cluster,
> "rbd_discard_on_zeroed_write_same", "false").
>
I tried it, but it is not supported on all versions. (eg. I have Ceph
v12.2.11 on my Fedora 29 and it is not supported, but rbd_writesame() is
available)
Maybe we can use both: "rbd_discard_on_zeroed_write_same = false" and
"buf[0] = 1"
> > + ret = rbd_resize(image, offset);
> > + if (ret < 0) {
> > + error_setg_errno(errp, -ret, "Failed to resize file");
> > + goto out;
> > + }
> > +
> > +#ifdef LIBRBD_SUPPORTS_WRITESAME
> > + while (offset - current_offset > buf_size) {
> > + /*
> > + * rbd_writesame() supports only request where the size of the
> > + * operation is multiple of buffer size and it must be less or
> > + * equal to INT_MAX.
> > + */
> > + bytes = MIN(offset - current_offset, INT_MAX);
> > + bytes -= bytes % buf_size;
>
> Using the default object size of 4MiB, this write size would result in
> up to 512 concurrent ops to the backing cluster. Perhaps the size
> should be bounded such that only a dozen or so concurrent requests are
> issued per write, always rounded next largest object / stripe period
> size. librbd and the rbd CLI usually try to bound themselves to the
> value in the "rbd_concurrent_management_ops" configuration setting
> (currently defaults to 10).
>
Do you suggest to use "rbd_concurrent_management_ops" to limit
concurrent requests or use a new QEMU parameters for the RBD driver?
Thanks for your comments,
Stefano