qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 3/4] qcow2: add zoned emulation capability


From: Sam Li
Subject: Re: [PATCH v2 3/4] qcow2: add zoned emulation capability
Date: Mon, 28 Aug 2023 19:55:52 +0800

Stefan Hajnoczi <stefanha@redhat.com> 于2023年8月23日周三 03:48写道:
>
> On Mon, Aug 14, 2023 at 04:58:01PM +0800, Sam Li wrote:
> > By adding zone operations and zoned metadata, the zoned emulation
> > capability enables full emulation support of zoned device using
> > a qcow2 file. The zoned device metadata includes zone type,
> > zoned device state and write pointer of each zone, which is stored
> > to an array of unsigned integers.
> >
> > Each zone of a zoned device makes state transitions following
> > the zone state machine. The zone state machine mainly describes
> > five states, IMPLICIT OPEN, EXPLICIT OPEN, FULL, EMPTY and CLOSED.
> > READ ONLY and OFFLINE states will generally be affected by device
> > internal events. The operations on zones cause corresponding state
> > changing.
> >
> > Zoned devices have a limit on zone resources, which puts constraints on
> > write operations into zones.
> >
> > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > ---
> >  block/qcow2.c          | 676 ++++++++++++++++++++++++++++++++++++++++-
> >  block/qcow2.h          |   2 +
> >  docs/interop/qcow2.txt |   2 +
> >  3 files changed, 678 insertions(+), 2 deletions(-)
> >
> > diff --git a/block/qcow2.c b/block/qcow2.c
> > index c1077c4a4a..5ccf79cbe7 100644
> > --- a/block/qcow2.c
> > +++ b/block/qcow2.c
> > @@ -194,6 +194,164 @@ qcow2_extract_crypto_opts(QemuOpts *opts, const char 
> > *fmt, Error **errp)
> >      return cryptoopts_qdict;
> >  }
> >
> > +#define QCOW2_ZT_IS_CONV(wp)    (wp & 1ULL << 59)
> > +
> > +static inline int qcow2_get_wp(uint64_t wp)
> > +{
> > +    /* clear state and type information */
> > +    return ((wp << 5) >> 5);
> > +}
> > +
> > +static inline int qcow2_get_zs(uint64_t wp)
> > +{
> > +    return (wp >> 60);
> > +}
> > +
> > +static inline void qcow2_set_wp(uint64_t *wp, BlockZoneState zs)
> > +{
> > +    uint64_t addr = qcow2_get_wp(*wp);
> > +    addr |= ((uint64_t)zs << 60);
> > +    *wp = addr;
> > +}
> > +
> > +/*
> > + * File wp tracking: reset zone, finish zone and append zone can
> > + * change the value of write pointer. All zone operations will change
> > + * the state of that/those zone.
> > + * */
> > +static inline void qcow2_wp_tracking_helper(int index, uint64_t wp) {
> > +    /* format: operations, the wp. */
> > +    printf("wps[%d]: 0x%x\n", index, qcow2_get_wp(wp)>>BDRV_SECTOR_BITS);
> > +}
> > +
> > +/*
> > + * Perform a state assignment and a flush operation that writes the new wp
> > + * value to the dedicated location of the disk file.
> > + */
> > +static int qcow2_write_wp_at(BlockDriverState *bs, uint64_t *wp,
> > +                             uint32_t index, BlockZoneState zs) {
> > +    BDRVQcow2State *s = bs->opaque;
> > +    int ret;
> > +
> > +    qcow2_set_wp(wp, zs);
> > +    ret = bdrv_pwrite(bs->file, s->zoned_header.zonedmeta_offset
> > +        + sizeof(uint64_t) * index, sizeof(uint64_t), wp, 0);
> > +
> > +    if (ret < 0) {
> > +        goto exit;
>
> Should *wp be restored to its original value to undo the effect of
> qcow2_set_wp()?
>
> > +    }
> > +    qcow2_wp_tracking_helper(index, *wp);
> > +    return ret;
> > +
> > +exit:
> > +    error_report("Failed to write metadata with file");
> > +    return ret;
> > +}
> > +
> > +static int qcow2_check_active(BlockDriverState *bs)
>
> Please rename this to qcow2_check_active_zones() to avoid confusion with
> other uses "active" in qcow2.
>
> > +{
> > +    BDRVQcow2State *s = bs->opaque;
> > +
> > +    if (!s->zoned_header.max_active_zones) {
> > +        return 0;
> > +    }
> > +
> > +    if (s->nr_zones_exp_open + s->nr_zones_imp_open + s->nr_zones_closed
> > +        < s->zoned_header.max_active_zones) {
> > +        return 0;
> > +    }
> > +
> > +    return -1;
> > +}
>
> (This function could return a bool instead of 0/-1 since it doesn't
> really need an int.)
>
> > +
> > +static int qcow2_check_open(BlockDriverState *bs)
>
> qcow2_check_open_zones() or, even better, qcow2_can_open_zone().
>
> > +{
> > +    BDRVQcow2State *s = bs->opaque;
> > +    int ret;
> > +
> > +    if (!s->zoned_header.max_open_zones) {
> > +        return 0;
> > +    }
> > +
> > +    if (s->nr_zones_exp_open + s->nr_zones_imp_open
> > +        < s->zoned_header.max_open_zones) {
> > +        return 0;
> > +    }
> > +
> > +    if(s->nr_zones_imp_open) {
> > +        ret = qcow2_check_active(bs);
> > +        if (ret == 0) {
> > +            /* TODO: it takes O(n) time complexity (n = nr_zones).
> > +             * Optimizations required. */
>
> One solution is to keep an implicitly open list. Then this operation is
> O(1).

Yes, I'll add this maybe in the v4.

>
> > +            /* close one implicitly open zones to make it available */
> > +            for (int i = s->zoned_header.zone_nr_conv;
> > +            i < bs->bl.nr_zones; ++i) {
> > +                uint64_t *wp = &s->wps->wp[i];
> > +                if (qcow2_get_zs(*wp) == BLK_ZS_IOPEN) {
> > +                    ret = qcow2_write_wp_at(bs, wp, i, BLK_ZS_CLOSED);
>
> I'm wondering if it's correct to store the zone state persistently in
> the qcow2 file. If the guest or QEMU crashes, then zones will be left in
> states like EOPEN. Since the guest software will have forgotten about
> explicitly opened zones, the guest would need to recover zone states.
> I'm not sure if existing software is designed to do that.
>
> Damien: Should the zone state be persistent?

Pin?

>
> > +                    if (ret < 0) {
> > +                        return ret;
> > +                    }
> > +                    s->wps->wp[i] = *wp;
> > +                    s->nr_zones_imp_open--;
> > +                    s->nr_zones_closed++;
> > +                    break;
> > +                }
> > +            }
> > +            return 0;
>
> This returns 0 if there are no IOPEN zones available. It needs to return
> an error when there are not enough resources available.
>
> > +        }
> > +        return ret;
> > +    }
> > +
> > +    return -1;
>
> This function mixes 0/-1 with -errno return values. -1 is -EPERM, but I
> think that's not what you want.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]