[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Patch] Robustly search for ZFS labels & uberblocks
From: |
Vladimir 'φ-coder/phcoder' Serbinenko |
Subject: |
Re: [Patch] Robustly search for ZFS labels & uberblocks |
Date: |
Thu, 03 Nov 2011 15:45:18 +0100 |
User-agent: |
Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.23) Gecko/20111010 Iceowl/1.0b2 Icedove/3.1.15 |
On 19.09.2011 20:45, Zachary Bedell wrote:
> Greetings,
>
> In working with ZFSonLinux, I found myself with a number of somewhat
> inconstant pools which failed either when running grub-probe or at boot time.
> In all cases, these pools were importable by the ZFS driver and completed a
> `zpool scrub` with no errors reported, but still Grub wasn't able to touch
> the pools. In most (but not all) cases, running `zpool scrub` made the pool
> accessible to Grub again, though obviously that was less than helpful for
> boot-time failures.
>
Patch is pretty difficult to read due to spacing changes. Please use -bB
or indent
> I'm including an (unfortunately rather large) patch to Grub's ZFS code which
> fixes several edge cases where Grub is unable to read a pool which is
> otherwise valid as far as the full ZFS driver is concerned. Changes include:
>
> ashift:
>
> * Support non-default values of ashift pool attribute. When ashift is
> increased beyond 10, the size of the uberblocks changes. Scan to find
> uberblocks must account for this and skip over the extra padding. Based on
> patch to Grub 0.97 by Hans Rosenfeld at http://www.illumos.org/issues/1303
>
>
> Label parsing changes:
>
> * Access the two end-device labels at label-size aligned location rather than
> device_end-(2*label_size). On-disk spec document incorrectly describes
> end-label locations. Adapted from Grub 0.97 behavior.
>
> * Scan all readable labels for uberblocks and accept the one with the highest
> txg/create date. Previously UB scan would stop if a valid UB was found in
> Label 0 without checking for newer UB's in the other labels. All labels must
> be scanned as a more recent block may be found in the other labels in certain
> circumstances. This fixes a case where Grub would be unable to access a ZFS
> system unless the pool was scrubbed first (but ZFS itself saw no problems &
> scrub reported zero errors).
>
> * Verify that uberblock txg is greater than that of the label before
> accepting the UB so that partially written UB's are not considered.
>
> * Verify that underlying data pointed by uberblock ub_rootbp has valid
> checksums before accepting the UB. This gives some possibility of booting
> from a technically invalid pool and effectively falls back to older
> uberblocks in a case where the correct uberblock is damaged.
>
> * Store pool uuid found during zfs_mount to grub_zfs_data in order to prevent
> duplicated logic in zfs_uuid.
>
>
> Data integrity:
>
> * Verify checksum in grub_read_data before accepting block. Attempt to read
> backup DVA's if checksum on first fails. Previously backup DVA's were
> checked only if the physical read failed, not if bad data was read without
> error. This resulted in pools which were valid and readable by ZFS (and
> transparently healed by ZFS' read behavior) being rejected by Grub.
>
>
> Logging/Debugging:
>
> * Provide more debugging output describing inconsistencies found in pool.
>
> * Remove superflous debugging output to reduce bootup time in verbose mode to
> a mangeable level (~30min down to ~5min to boot w/ 'debug=all' in grub.cfg)
>
>
>
> I do apologies for the monolithic patch. I took a second look at the changes
> to try to pull certain elements apart, but I ended up in several situations
> where the test pools I have snapshotted in VM's each hit two or more of the
> issues above making testing in isolation difficult. The total change set is
> able to probe and boot from all of the odd (as well as all of the normal)
> pools that I have on hand.
>
> If remixing this patch in some way would help in integrating it, I'd welcome
> any feedback on how to better present the changes.
>
> Best regards,
> Zac Bedell
>
>
> grub-zfs-ashift-label.patch
>
>
> diff --git a/grub-core/fs/zfs/zfs.c b/grub-core/fs/zfs/zfs.c
> index 1eea13b..d03e19b 100644
> --- a/grub-core/fs/zfs/zfs.c
> +++ b/grub-core/fs/zfs/zfs.c
> @@ -75,9 +75,21 @@ GRUB_MOD_LICENSE ("GPLv3+");
> static grub_dl_t my_mod;
> #endif
>
> +/*
> + * Macros to get fields in a bp or DVA.
> + */
> #define P2PHASE(x, align) ((x) & ((align) - 1))
> #define DVA_OFFSET_TO_PHYS_SECTOR(offset) \
> ((offset + VDEV_LABEL_START_SIZE) >> SPA_MINBLOCKSHIFT)
> +
> +/*
> + * return x rounded down to an align boundary
> + * eg, P2ALIGN(1200, 1024) == 1024 (1*align)
> + * eg, P2ALIGN(1024, 1024) == 1024 (1*align)
> + * eg, P2ALIGN(0x1234, 0x100) == 0x1200 (0x12*align)
> + * eg, P2ALIGN(0x5600, 0x100) == 0x5600 (0x56*align)
> + */
> +#define P2ALIGN(x, align) ((x) & -(align))
>
We already have such macros, they're called ALIGN_UP and ALIGN_DOWN.
> /*
> * FAT ZAP data structures
> @@ -147,6 +159,14 @@ struct grub_zfs_data
> grub_uint64_t file_start;
> grub_uint64_t file_end;
>
> + /* XXX: ashift is per vdev, not per pool. We currently only ever touch
> + * a single vdev, but when/if raid-z or stripes are supported, this
> + * may need revision.
> + */
> + grub_uint64_t vdev_ashift;
> + grub_uint64_t label_txg;
> + grub_uint64_t pool_guid;
> +
We have multidevice support and we have a special structure for it.
Moreover it seems that you need it only on initial scan and so no need
to permanently store it.
> /* cache for a dnode block */
> dnode_phys_t *dnode_buf;
> dnode_phys_t *dnode_mdn;
> @@ -192,7 +212,11 @@ static decomp_entry_t
> decomp_table[ZIO_COMPRESS_FUNCTIONS] = {
>
> static grub_err_t zio_read_data (blkptr_t * bp, grub_zfs_endian_t endian,
> void *buf, struct grub_zfs_data *data);
> -
> +
> +static grub_err_t
> +zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void **buf,
> + grub_size_t *size, struct grub_zfs_data *data);
> +
> /*
> * Our own version of log2(). Same thing as highbit()-1.
> */
> @@ -271,6 +295,7 @@ zio_checksum_verify (zio_cksum_t zc, grub_uint32_t
> checksum,
> || (actual_cksum.zc_word[2] != zc.zc_word[2])
> || (actual_cksum.zc_word[3] != zc.zc_word[3]))
> {
> + /*
> grub_dprintf ("zfs", "checksum %d verification failed\n", checksum);
> grub_dprintf ("zfs", "actual checksum %16llx %16llx %16llx %16llx\n",
> (unsigned long long) actual_cksum.zc_word[0],
> @@ -282,6 +307,7 @@ zio_checksum_verify (zio_cksum_t zc, grub_uint32_t
> checksum,
> (unsigned long long) zc.zc_word[1],
> (unsigned long long) zc.zc_word[2],
> (unsigned long long) zc.zc_word[3]);
> + */
Please don't comment out the code. This particular one is very useful
actually.
> return grub_error (GRUB_ERR_BAD_FS, "checksum verification failed");
> }
>
> @@ -332,17 +358,22 @@ vdev_uberblock_compare (uberblock_t * ub1, uberblock_t
> * ub2)
> * Three pieces of information are needed to verify an uberblock: the magic
> * number, the version number, and the checksum.
> *
> - * Currently Implemented: version number, magic number
> + * Currently Implemented: version number, magic number, label txg
> * Need to Implement: checksum
> *
> */
> static grub_err_t
> -uberblock_verify (uberblock_phys_t * ub, int offset)
> +uberblock_verify (uberblock_t * uber, int offset, struct grub_zfs_data *data)
> {
> - uberblock_t *uber = &ub->ubp_uberblock;
> grub_err_t err;
> grub_zfs_endian_t endian = UNKNOWN_ENDIAN;
> zio_cksum_t zc;
> + void *osp = NULL;
> + grub_size_t ospsize;
> +
> + if (uber->ub_txg < data->label_txg)
> + return grub_error (GRUB_ERR_BAD_FS,
> + "ignoring partially written label: uber_txg < label_txg");
>
> if (grub_zfs_to_cpu64 (uber->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
> && grub_zfs_to_cpu64 (uber->ub_version, LITTLE_ENDIAN) > 0
> @@ -358,10 +389,21 @@ uberblock_verify (uberblock_phys_t * ub, int offset)
> return grub_error (GRUB_ERR_BAD_FS, "invalid uberblock magic");
>
> grub_memset (&zc, 0, sizeof (zc));
> -
> zc.zc_word[0] = grub_cpu_to_zfs64 (offset, endian);
> err = zio_checksum_verify (zc, ZIO_CHECKSUM_LABEL, endian,
> - (char *) ub, UBERBLOCK_SIZE);
> + (char *) uber, UBERBLOCK_SIZE(data->vdev_ashift));
> +
> + if (!err)
> + {
> + // Check that the data pointed by the rootbp is usable.
> + void *osp = NULL;
> + grub_size_t ospsize;
> + err = zio_read (&uber->ub_rootbp, endian, &osp, &ospsize, data);
> + grub_free (osp);
> +
> + if (!err && ospsize < OBJSET_PHYS_SIZE_V14)
> + return grub_error (GRUB_ERR_BAD_FS, "uberblock rootbp points to
> invalid data");
> + }
>
This should be if (err) return err; ....
> return err;
> }
> @@ -372,32 +414,40 @@ uberblock_verify (uberblock_phys_t * ub, int offset)
> * Success - Pointer to the best uberblock.
> * Failure - NULL
> */
> -static uberblock_phys_t *
> -find_bestub (uberblock_phys_t * ub_array, grub_disk_addr_t sector)
> +static uberblock_t *
> +find_bestub (char *ub_array, struct grub_zfs_data *data)
> {
> - uberblock_phys_t *ubbest = NULL;
> - int i;
> - grub_disk_addr_t offset;
> + const grub_disk_addr_t sector = data->vdev_phys_sector;
> + uberblock_t *ubbest = NULL;
> + uberblock_t *ubnext;
> + unsigned int i, offset, pickedub = 0;
> grub_err_t err = GRUB_ERR_NONE;
>
> - for (i = 0; i < (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT); i++)
> + const unsigned int UBCOUNT = UBERBLOCK_COUNT (data->vdev_ashift);
> + const grub_uint64_t UBBYTES = UBERBLOCK_SIZE (data->vdev_ashift);
> +
> + for (i = 0; i < UBCOUNT; i++)
> {
> - offset = (sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE
> - + (i << VDEV_UBERBLOCK_SHIFT);
> + ubnext = (uberblock_t *) (i * UBBYTES + ub_array);
> + offset = (sector << SPA_MINBLOCKSHIFT) + VDEV_PHYS_SIZE + (i *
> UBBYTES);
>
> - err = uberblock_verify (&ub_array[i], offset);
> + err = uberblock_verify (ubnext, offset, data);
> if (err)
> - {
> - grub_errno = GRUB_ERR_NONE;
> - continue;
> - }
> - if (ubbest == NULL
> - || vdev_uberblock_compare (&(ub_array[i].ubp_uberblock),
> - &(ubbest->ubp_uberblock)) > 0)
> - ubbest = &ub_array[i];
> + {
> + grub_errno = GRUB_ERR_NONE;
> + continue;
> + }
> + if (ubbest == NULL || vdev_uberblock_compare (ubnext, ubbest) > 0)
> + {
> + ubbest = ubnext;
> + pickedub = i;
> + }
> }
> if (!ubbest)
> grub_errno = err;
> + else
> + grub_dprintf ("zfs", "Found best uberblock at idx %d, txg %llu\n",
> + pickedub, (unsigned long long) ubbest->ub_txg);
>
Again the same error. Don't put such if's.
> return (ubbest);
> }
> @@ -412,9 +462,6 @@ get_psize (blkptr_t * bp, grub_zfs_endian_t endian)
> static grub_uint64_t
> dva_get_offset (dva_t * dva, grub_zfs_endian_t endian)
> {
> - grub_dprintf ("zfs", "dva=%llx, %llx\n",
> - (unsigned long long) dva->dva_word[0],
> - (unsigned long long) dva->dva_word[1]);
> return grub_zfs_to_cpu64 ((dva)->dva_word[1],
> endian) << SPA_MINBLOCKSHIFT;
> }
Please don't mix dprintf removals with actual changes. Just makes more
difficult to read.
> @@ -440,11 +487,9 @@ zio_read_gang (blkptr_t * bp, grub_zfs_endian_t endian,
> dva_t * dva, void *buf,
> zio_gb = grub_malloc (SPA_GANGBLOCKSIZE);
> if (!zio_gb)
> return grub_errno;
> - grub_dprintf ("zfs", endian == LITTLE_ENDIAN ? "little-endian gang\n"
> - :"big-endian gang\n");
> +
> offset = dva_get_offset (dva, endian);
> sector = DVA_OFFSET_TO_PHYS_SECTOR (offset);
> - grub_dprintf ("zfs", "offset=%llx\n", (unsigned long long) offset);
>
> /* read in the gang block header */
> err = grub_disk_read (data->disk, sector, 0, SPA_GANGBLOCKSIZE,
> @@ -515,10 +560,20 @@ zio_read_data (blkptr_t * bp, grub_zfs_endian_t endian,
> void *buf,
> sector = DVA_OFFSET_TO_PHYS_SECTOR (offset);
> err = grub_disk_read (data->disk, sector, 0, psize, buf);
> }
> - if (!err)
> - return GRUB_ERR_NONE;
> - grub_errno = GRUB_ERR_NONE;
> - }
> +
> + if (!err)
> + {
> + // Check the underlying checksum before we rule this DVA as
> "good"
> + grub_uint32_t checkalgo = (grub_zfs_to_cpu64 ((bp)->blk_prop,
> endian) >> 40) & 0xff;
> +
> + err = zio_checksum_verify (bp->blk_cksum, checkalgo, endian,
> buf, psize);
> + if (!err)
> + return GRUB_ERR_NONE;
> + }
> +
> + // If read failed or checksum bad, reset the error. Hopefully we've
> got some more DVA's to try.
You need to correctly handle the case when you don't. See e.g. the
mirror reading code. Also we don't use C++ style comments. I fail to see
the structure and need of this checksum move.
> + grub_errno = GRUB_ERR_NONE;
> + }
>
> if (!err)
> err = grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid DVA");
> @@ -539,12 +594,9 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void
> **buf,
> unsigned int comp;
> char *compbuf = NULL;
> grub_err_t err;
> - zio_cksum_t zc = bp->blk_cksum;
> - grub_uint32_t checksum;
>
> *buf = NULL;
>
> - checksum = (grub_zfs_to_cpu64((bp)->blk_prop, endian) >> 40) & 0xff;
> comp = (grub_zfs_to_cpu64((bp)->blk_prop, endian)>>32) & 0xff;
> lsize = (BP_IS_HOLE(bp) ? 0 :
> (((grub_zfs_to_cpu64 ((bp)->blk_prop, endian) & 0xffff) + 1)
> @@ -571,7 +623,6 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void
> **buf,
> else
> compbuf = *buf = grub_malloc (lsize);
>
> - grub_dprintf ("zfs", "endian = %d\n", endian);
> err = zio_read_data (bp, endian, compbuf, data);
> if (err)
> {
> @@ -580,15 +631,6 @@ zio_read (blkptr_t * bp, grub_zfs_endian_t endian, void
> **buf,
> return err;
> }
>
> - err = zio_checksum_verify (zc, checksum, endian, compbuf, psize);
> - if (err)
> - {
> - grub_dprintf ("zfs", "incorrect checksum\n");
> - grub_free (compbuf);
> - *buf = NULL;
> - return err;
> - }
> -
> if (comp != ZIO_COMPRESS_OFF)
> {
> *buf = grub_malloc (lsize);
> @@ -635,7 +677,6 @@ dmu_read (dnode_end_t * dn, grub_uint64_t blkid, void
> **buf,
> endian = dn->endian;
> for (level = dn->dn.dn_nlevels - 1; level >= 0; level--)
> {
> - grub_dprintf ("zfs", "endian = %d\n", endian);
> idx = (blkid >> (epbs * level)) & ((1 << epbs) - 1);
> *bp = bp_array[idx];
> if (bp_array != dn->dn.dn_blkptr)
> @@ -661,12 +702,10 @@ dmu_read (dnode_end_t * dn, grub_uint64_t blkid, void
> **buf,
> }
> if (level == 0)
> {
> - grub_dprintf ("zfs", "endian = %d\n", endian);
> err = zio_read (bp, endian, buf, 0, data);
> endian = (grub_zfs_to_cpu64 (bp->blk_prop, endian) >> 63) & 1;
> break;
> }
> - grub_dprintf ("zfs", "endian = %d\n", endian);
> err = zio_read (bp, endian, &tmpbuf, 0, data);
> endian = (grub_zfs_to_cpu64 (bp->blk_prop, endian) >> 63) & 1;
> if (err)
> @@ -717,9 +756,6 @@ mzap_iterate (mzap_phys_t * zapobj, grub_zfs_endian_t
> endian, int objsize,
> chunks = objsize / MZAP_ENT_LEN - 1;
> for (i = 0; i < chunks; i++)
> {
> - grub_dprintf ("zfs", "zap: name = %s, value = %llx, cd = %x\n",
> - mzap_ent[i].mze_name, (long long)mzap_ent[i].mze_value,
> - (int)mzap_ent[i].mze_cd);
> if (hook (mzap_ent[i].mze_name,
> grub_zfs_to_cpu64 (mzap_ent[i].mze_value, endian)))
> return 1;
> @@ -849,8 +885,6 @@ zap_leaf_lookup (zap_leaf_phys_t * l, grub_zfs_endian_t
> endian,
> if (grub_zfs_to_cpu64 (le->le_hash,endian) != h)
> continue;
>
> - grub_dprintf ("zfs", "fzap: length %d\n", (int) le->le_name_length);
> -
> if (zap_leaf_array_equal (l, endian, blksft,
> grub_zfs_to_cpu16 (le->le_name_chunk,endian),
> grub_zfs_to_cpu16 (le->le_name_length, endian),
> @@ -1050,22 +1084,16 @@ zap_lookup (dnode_end_t * zap_dnode, char *name,
> grub_uint64_t * val,
> return err;
> block_type = grub_zfs_to_cpu64 (*((grub_uint64_t *) zapbuf), endian);
>
> - grub_dprintf ("zfs", "zap read\n");
> -
> if (block_type == ZBT_MICRO)
> {
> - grub_dprintf ("zfs", "micro zap\n");
> err = (mzap_lookup (zapbuf, endian, size, name, val));
> - grub_dprintf ("zfs", "returned %d\n", err);
> grub_free (zapbuf);
> return err;
> }
> else if (block_type == ZBT_HEADER)
> {
> - grub_dprintf ("zfs", "fat zap\n");
> /* this is a fat zap */
> err = (fzap_lookup (zap_dnode, zapbuf, name, val, data));
> - grub_dprintf ("zfs", "returned %d\n", err);
> grub_free (zapbuf);
> return err;
> }
> @@ -1092,18 +1120,14 @@ zap_iterate (dnode_end_t * zap_dnode,
> return 0;
> block_type = grub_zfs_to_cpu64 (*((grub_uint64_t *) zapbuf), endian);
>
> - grub_dprintf ("zfs", "zap read\n");
> -
> if (block_type == ZBT_MICRO)
> {
> - grub_dprintf ("zfs", "micro zap\n");
> ret = mzap_iterate (zapbuf, endian, size, hook);
> grub_free (zapbuf);
> return ret;
> }
> else if (block_type == ZBT_HEADER)
> {
> - grub_dprintf ("zfs", "fat zap\n");
> /* this is a fat zap */
> ret = fzap_iterate (zap_dnode, zapbuf, hook, data);
> grub_free (zapbuf);
> @@ -1150,12 +1174,9 @@ dnode_get (dnode_end_t * mdn, grub_uint64_t objnum,
> grub_uint8_t type,
> return GRUB_ERR_NONE;
> }
>
> - grub_dprintf ("zfs", "endian = %d, blkid=%llx\n", mdn->endian,
> - (unsigned long long) blkid);
> err = dmu_read (mdn, blkid, &dnbuf, &endian, data);
> if (err)
> return err;
> - grub_dprintf ("zfs", "alive\n");
>
> grub_free (data->dnode_buf);
> grub_free (data->dnode_mdn);
> @@ -1426,27 +1447,19 @@ get_filesystem_dnode (dnode_end_t * mosmdn, char
> *fsname,
> grub_uint64_t objnum;
> grub_err_t err;
>
> - grub_dprintf ("zfs", "endian = %d\n", mosmdn->endian);
> -
> err = dnode_get (mosmdn, DMU_POOL_DIRECTORY_OBJECT,
> DMU_OT_OBJECT_DIRECTORY, mdn, data);
> if (err)
> return err;
>
> - grub_dprintf ("zfs", "alive\n");
> -
> err = zap_lookup (mdn, DMU_POOL_ROOT_DATASET, &objnum, data);
> if (err)
> return err;
>
> - grub_dprintf ("zfs", "alive\n");
> -
> err = dnode_get (mosmdn, objnum, DMU_OT_DSL_DIR, mdn, data);
> if (err)
> return err;
>
> - grub_dprintf ("zfs", "alive\n");
> -
> while (*fsname)
> {
> grub_uint64_t childobj;
> @@ -1491,8 +1504,6 @@ make_mdn (dnode_end_t * mdn, struct grub_zfs_data *data)
> grub_size_t ospsize;
> grub_err_t err;
>
> - grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
> -
> bp = &(((dsl_dataset_phys_t *) DN_BONUS (&mdn->dn))->ds_bp);
> err = zio_read (bp, mdn->endian, &osp, &ospsize, data);
> if (err)
> @@ -1558,7 +1569,6 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t *
> mdn,
> grub_dprintf ("zfs", "fsname = '%s' snapname='%s' filename = '%s'\n",
> fsname, snapname, filename);
> }
> - grub_dprintf ("zfs", "alive\n");
> err = get_filesystem_dnode (&(data->mos), fsname, dn, data);
> if (err)
> {
> @@ -1567,12 +1577,8 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t
> * mdn,
> return err;
> }
>
> - grub_dprintf ("zfs", "alive\n");
> -
> headobj = grub_zfs_to_cpu64 (((dsl_dir_phys_t *) DN_BONUS
> (&dn->dn))->dd_head_dataset_obj, dn->endian);
>
> - grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
> -
> err = dnode_get (&(data->mos), headobj, DMU_OT_DSL_DATASET, mdn, data);
> if (err)
> {
> @@ -1580,7 +1586,6 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t *
> mdn,
> grub_free (snapname);
> return err;
> }
> - grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
>
> if (snapname)
> {
> @@ -1606,8 +1611,6 @@ dnode_get_fullpath (const char *fullpath, dnode_end_t *
> mdn,
> *mdnobj = headobj;
>
> make_mdn (mdn, data);
> -
> - grub_dprintf ("zfs", "endian = %d\n", mdn->endian);
>
> if (*isfs)
> {
> @@ -1864,11 +1867,9 @@ zfs_fetch_nvlist (struct grub_zfs_data * data, char
> **nvlist)
> static grub_err_t
> check_pool_label (struct grub_zfs_data *data)
> {
> - grub_uint64_t pool_state, txg = 0;
> - char *nvlist;
> -#if 0
> - char *nv;
> -#endif
> + grub_uint64_t pool_state;
> + char *nvlist; // for the pool
> + char *vdevnvlist; // for the vdev
> grub_uint64_t diskguid;
> grub_uint64_t version;
> int found;
> @@ -1878,8 +1879,6 @@ check_pool_label (struct grub_zfs_data *data)
> if (err)
> return err;
>
> - grub_dprintf ("zfs", "check 2 passed\n");
> -
> found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_STATE,
> &pool_state);
> if (! found)
> @@ -1889,16 +1888,16 @@ check_pool_label (struct grub_zfs_data *data)
> grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_STATE " not found");
> return grub_errno;
> }
> - grub_dprintf ("zfs", "check 3 passed\n");
>
> if (pool_state == POOL_STATE_DESTROYED)
> {
> grub_free (nvlist);
> return grub_error (GRUB_ERR_BAD_FS, "zpool is marked as destroyed");
> }
> - grub_dprintf ("zfs", "check 4 passed\n");
>
> - found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG,
> &txg);
> + data->label_txg = 0;
> + found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_TXG,
> + &data->label_txg);
> if (!found)
> {
> grub_free (nvlist);
> @@ -1906,15 +1905,13 @@ check_pool_label (struct grub_zfs_data *data)
> grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_TXG " not found");
> return grub_errno;
> }
> - grub_dprintf ("zfs", "check 6 passed\n");
>
> /* not an active device */
> - if (txg == 0)
> + if (data->label_txg == 0)
> {
> grub_free (nvlist);
> return grub_error (GRUB_ERR_BAD_FS, "zpool isn't active");
> }
> - grub_dprintf ("zfs", "check 7 passed\n");
>
> found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_VERSION,
> &version);
> @@ -1925,42 +1922,79 @@ check_pool_label (struct grub_zfs_data *data)
> grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VERSION " not found");
> return grub_errno;
> }
> - grub_dprintf ("zfs", "check 8 passed\n");
>
> if (version > SPA_VERSION)
> {
> grub_free (nvlist);
> return grub_error (GRUB_ERR_NOT_IMPLEMENTED_YET,
> - "too new version %llu > %llu",
> + "SPA version too new %llu > %llu",
> (unsigned long long) version,
> (unsigned long long) SPA_VERSION);
> }
> - grub_dprintf ("zfs", "check 9 passed\n");
> -#if 0
> - if (nvlist_lookup_value (nvlist, ZPOOL_CONFIG_VDEV_TREE, &nv,
> - DATA_TYPE_NVLIST, NULL))
> +
> + vdevnvlist = grub_zfs_nvlist_lookup_nvlist (nvlist,
> ZPOOL_CONFIG_VDEV_TREE);
> + if (!vdevnvlist)
> {
> - grub_free (vdev);
> - return (GRUB_ERR_BAD_FS);
> + grub_free (nvlist);
> + if (!grub_errno)
> + grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_VDEV_TREE " not found");
> + return grub_errno;
> + }
> +
> + found = grub_zfs_nvlist_lookup_uint64 (vdevnvlist, ZPOOL_CONFIG_ASHIFT,
> + &data->vdev_ashift);
> + grub_free (vdevnvlist);
> + if (!found)
> + {
> + grub_free (nvlist);
> + if (!grub_errno)
> + grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_ASHIFT " not found");
> + return grub_errno;
> }
> - grub_dprintf ("zfs", "check 10 passed\n");
> -#endif
>
> found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_GUID,
> &diskguid);
> if (! found)
> {
> grub_free (nvlist);
> if (! grub_errno)
> - grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_GUID " not found");
> + grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_GUID " not found");
> + return grub_errno;
> + }
> +
> + found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID,
> &data->pool_guid);
> + if (! found)
> + {
> + grub_free (nvlist);
> + if (! grub_errno)
> + grub_error (GRUB_ERR_BAD_FS, ZPOOL_CONFIG_POOL_GUID " not found");
> return grub_errno;
> }
> - grub_dprintf ("zfs", "check 11 passed\n");
>
> grub_free (nvlist);
>
> + grub_dprintf ("zfs", "ZFS Pool GUID: %llu (%016llx) Label: GUID: %llu
> (%016llx), txg: %llu, SPA v%llu, ashift: %llu\n",
> + (unsigned long long) data->pool_guid,
> + (unsigned long long) data->pool_guid,
> + (unsigned long long) diskguid,
> + (unsigned long long) diskguid,
> + (unsigned long long) data->label_txg,
> + (unsigned long long) version,
> + (unsigned long long) data->vdev_ashift);
> +
> return GRUB_ERR_NONE;
> }
>
> +/*
> + * vdev_label_start returns the physical disk offset (in bytes) of
> + * label "l".
> + */
> +static grub_uint64_t vdev_label_start (grub_uint64_t psize, int l)
> +{
> + return (l * sizeof (vdev_label_t) + (l < VDEV_LABELS / 2 ?
> + 0 : psize -
> + VDEV_LABELS * sizeof (vdev_label_t)));
> +}
> +
> static void
> zfs_unmount (struct grub_zfs_data *data)
> {
> @@ -1979,13 +2013,13 @@ static struct grub_zfs_data *
> zfs_mount (grub_device_t dev)
> {
> struct grub_zfs_data *data = 0;
> - int label = 0;
> - uberblock_phys_t *ub_array, *ubbest = NULL;
> - vdev_boot_header_t *bh;
> + int label = 0, bestlabel = -1;
> + char *ub_array;
> + uberblock_t *ubbest;
> + uberblock_t *ubcur = NULL;
> void *osp = 0;
> grub_size_t ospsize;
> grub_err_t err;
> - int vdevnum;
>
> if (! dev->disk)
> {
> @@ -1997,11 +2031,6 @@ zfs_mount (grub_device_t dev)
> if (!data)
> return 0;
> grub_memset (data, 0, sizeof (*data));
> -#if 0
> - /* if it's our first time here, zero the best uberblock out */
> - if (data->best_drive == 0 && data->best_part == 0 && find_best_root)
> - grub_memset (¤t_uberblock, 0, sizeof (uberblock_t));
> -#endif
>
> data->disk = dev->disk;
>
> @@ -2012,100 +2041,129 @@ zfs_mount (grub_device_t dev)
> return 0;
> }
>
> - bh = grub_malloc (VDEV_BOOT_HEADER_SIZE);
> - if (!bh)
> + ubbest = grub_malloc (sizeof (*ubbest));
> + if (!ubbest)
> {
> zfs_unmount (data);
> - grub_free (ub_array);
> return 0;
> }
> + grub_memset (ubbest, 0, sizeof (*ubbest));
>
> - vdevnum = VDEV_LABELS;
> + // Establish some constants for where things are on the device:
>
> - /* Don't check back labels on CDROM. */
> - if (grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN)
> - vdevnum = VDEV_LABELS / 2;
> + /*
> + * some eltorito stacks don't give us a size and
> + * we end up setting the size to MAXUINT, further
> + * some of these devices stop working once a single
> + * read past the end has been issued. Checking
> + * for a maximum part_length and skipping the backup
> + * labels at the end of the slice/partition/device
> + * avoids breaking down on such devices.
> + */
> + const int vdevnum =
> + grub_disk_get_size (dev->disk) == GRUB_DISK_SIZE_UNKNOWN ?
> + VDEV_LABELS / 2 : VDEV_LABELS;
> +
> + // Size in bytes of the device (disk or partition) aligned to label size
> + grub_uint64_t device_size =
> + grub_disk_get_size (dev->disk) << GRUB_DISK_SECTOR_BITS;
> + const grub_uint64_t alignedbytes =
> + P2ALIGN (device_size, (grub_uint64_t) sizeof (vdev_label_t));
This is better to be moved to vdev_label_start.
>
> - for (label = 0; ubbest == NULL && label < vdevnum; label++)
> + for (label = 0; label < vdevnum; label++)
> {
> - grub_zfs_endian_t ub_endian = UNKNOWN_ENDIAN;
> - grub_dprintf ("zfs", "label %d\n", label);
> + grub_uint64_t labelstartbytes = vdev_label_start (alignedbytes, label);
> + grub_uint64_t labelstart = labelstartbytes >> GRUB_DISK_SECTOR_BITS;
>
> - data->vdev_phys_sector
> - = label * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT)
> - + ((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE) >> SPA_MINBLOCKSHIFT)
> - + (label < VDEV_LABELS / 2 ? 0 : grub_disk_get_size (dev->disk)
> - - VDEV_LABELS * (sizeof (vdev_label_t) >> SPA_MINBLOCKSHIFT));
> + grub_dprintf ("zfs", "reading label %d at sector %llu (byte %llu)\n",
> + label, (unsigned long long) labelstart,
> + (unsigned long long) labelstartbytes);
>
> - /* Read in the uberblock ring (128K). */
> - err = grub_disk_read (data->disk, data->vdev_phys_sector
> - + (VDEV_PHYS_SIZE >> SPA_MINBLOCKSHIFT),
> - 0, VDEV_UBERBLOCK_RING, (char *) ub_array);
> + data->vdev_phys_sector = labelstart +
> + ((VDEV_SKIP_SIZE + VDEV_BOOT_HEADER_SIZE) >>
> GRUB_DISK_SECTOR_BITS);
> +
> + err = check_pool_label (data);
> if (err)
> - {
> - grub_errno = GRUB_ERR_NONE;
> - continue;
> - }
> - grub_dprintf ("zfs", "label ok %d\n", label);
> + {
> + grub_dprintf ("zfs", "error checking label %d\n", label);
> + grub_errno = GRUB_ERR_NONE;
> + continue;
> + }
>
> - ubbest = find_bestub (ub_array, data->vdev_phys_sector);
> - if (!ubbest)
> - {
> - grub_dprintf ("zfs", "No uberblock found\n");
> - grub_errno = GRUB_ERR_NONE;
> - continue;
> - }
> - ub_endian = (grub_zfs_to_cpu64 (ubbest->ubp_uberblock.ub_magic,
> - LITTLE_ENDIAN) == UBERBLOCK_MAGIC
> - ? LITTLE_ENDIAN : BIG_ENDIAN);
> - err = zio_read (&ubbest->ubp_uberblock.ub_rootbp,
> - ub_endian,
> - &osp, &ospsize, data);
> + /* Read in the uberblock ring (128K). */
> + err = grub_disk_read (data->disk,
> + data->vdev_phys_sector +
> + (VDEV_PHYS_SIZE >> GRUB_DISK_SECTOR_BITS),
> + 0, VDEV_UBERBLOCK_RING, ub_array);
> if (err)
> - {
> - grub_dprintf ("zfs", "couldn't zio_read\n");
> - grub_errno = GRUB_ERR_NONE;
> - continue;
> - }
> + {
> + grub_dprintf ("zfs", "error reading uberblock ring for label
> %d\n", label);
> + grub_errno = GRUB_ERR_NONE;
> + continue;
> + }
>
> - if (ospsize < OBJSET_PHYS_SIZE_V14)
> - {
> - grub_dprintf ("zfs", "osp too small\n");
> - grub_free (osp);
> - continue;
> - }
> - grub_dprintf ("zfs", "ubbest %p\n", ubbest);
> + ubcur = find_bestub (ub_array, data);
> + if (!ubcur)
> + {
> + grub_dprintf ("zfs", "No good uberblocks found in label %d\n",
> label);
> + grub_errno = GRUB_ERR_NONE;
> + continue;
> + }
>
> - err = check_pool_label (data);
> - if (err)
> + if (vdev_uberblock_compare (ubcur, ubbest) > 0)
> + {
> + // Looks like the block is good, so use it.
> + grub_memcpy (ubbest, ubcur, sizeof (*ubbest));
> + bestlabel = label;
> + grub_dprintf ("zfs", "Current best uberblock found in label
> %d\n", label);
> + }
> + }
> + grub_free (ub_array);
> +
> + // We zero'd the structure to begin with. If we never assigned to it,
> + // magic will still be zero.
> + if (!ubbest->ub_magic)
> + {
> + grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid ZFS label");
> + zfs_unmount (data);
> + grub_free (ubbest);
> + return 0;
> + }
> +
> + grub_dprintf ("zfs", "ubbest %p in label %d\n", ubbest, bestlabel);
> +
> + grub_zfs_endian_t ub_endian =
> + grub_zfs_to_cpu64 (ubbest->ub_magic, LITTLE_ENDIAN) == UBERBLOCK_MAGIC
> + ? LITTLE_ENDIAN : BIG_ENDIAN;
> + err = zio_read (&ubbest->ub_rootbp, ub_endian, &osp, &ospsize, data);
> +
> + if (err)
> {
> - grub_errno = GRUB_ERR_NONE;
> - continue;
> + grub_error (GRUB_ERR_BAD_FS, "couldn't zio_read object directory");
> + zfs_unmount (data);
> + grub_free (ubbest);
> + return 0;
> }
> -#if 0
> - if (find_best_root &&
> - vdev_uberblock_compare (&ubbest->ubp_uberblock,
> - &(current_uberblock)) <= 0)
> - continue;
> -#endif
> - /* Got the MOS. Save it at the memory addr MOS. */
> - grub_memmove (&(data->mos.dn), &((objset_phys_t *) osp)->os_meta_dnode,
> - DNODE_SIZE);
> - data->mos.endian = (grub_zfs_to_cpu64
> (ubbest->ubp_uberblock.ub_rootbp.blk_prop, ub_endian) >> 63) & 1;
> - grub_memmove (&(data->current_uberblock),
> - &ubbest->ubp_uberblock, sizeof (uberblock_t));
> - grub_free (ub_array);
> - grub_free (bh);
> - grub_free (osp);
> - return data;
> +
> + if (ospsize < OBJSET_PHYS_SIZE_V14)
> + {
> + grub_error (GRUB_ERR_BAD_FS, "osp too small");
> + zfs_unmount (data);
> + grub_free (osp);
> + grub_free (ubbest);
> + return 0;
> }
> - grub_error (GRUB_ERR_BAD_FS, "couldn't find a valid label");
> - zfs_unmount (data);
> - grub_free (ub_array);
> - grub_free (bh);
> - grub_free (osp);
>
> - return 0;
> + /* Got the MOS. Save it at the memory addr MOS. */
> + grub_memmove (&(data->mos.dn), &((objset_phys_t *) osp)->os_meta_dnode,
> DNODE_SIZE);
> + data->mos.endian =
> + (grub_zfs_to_cpu64 (ubbest->ub_rootbp.blk_prop, ub_endian) >>
> 63) & 1;
> + grub_memmove (&(data->current_uberblock), ubbest, sizeof (uberblock_t));
> +
> + grub_free (osp);
> + grub_free (ubbest);
> +
> + return (data);
> }
>
> grub_err_t
> @@ -2149,33 +2207,18 @@ zfs_label (grub_device_t device, char **label)
> static grub_err_t
> zfs_uuid (grub_device_t device, char **uuid)
> {
> - char *nvlist;
> - int found;
> struct grub_zfs_data *data;
> - grub_uint64_t guid;
> - grub_err_t err;
> -
> - *uuid = 0;
>
> data = zfs_mount (device);
> if (! data)
> return grub_errno;
>
> - err = zfs_fetch_nvlist (data, &nvlist);
> - if (err)
> - {
> - zfs_unmount (data);
> - return err;
> - }
> -
> - found = grub_zfs_nvlist_lookup_uint64 (nvlist, ZPOOL_CONFIG_POOL_GUID,
> &guid);
> - if (! found)
> - return grub_errno;
> - grub_free (nvlist);
> - *uuid = grub_xasprintf ("%016llx", (long long unsigned) guid);
> + *uuid = grub_xasprintf ("%016llx", (long long unsigned) data->pool_guid);
> zfs_unmount (data);
> +
> if (! *uuid)
> return grub_errno;
> +
> return GRUB_ERR_NONE;
> }
>
> diff --git a/include/grub/zfs/uberblock_impl.h
> b/include/grub/zfs/uberblock_impl.h
> index 1bf7f2b..94b5ad7 100644
> --- a/include/grub/zfs/uberblock_impl.h
> +++ b/include/grub/zfs/uberblock_impl.h
> @@ -23,6 +23,8 @@
> #ifndef _SYS_UBERBLOCK_IMPL_H
> #define _SYS_UBERBLOCK_IMPL_H
>
> +#define UBMAX(a,b) ((a) > (b) ? (a) : (b))
> +
> /*
> * The uberblock version is incremented whenever an incompatible on-disk
> * format change is made to the SPA, DMU, or ZAP.
> @@ -45,16 +47,10 @@ typedef struct uberblock {
> blkptr_t ub_rootbp; /* MOS objset_phys_t */
> } uberblock_t;
>
> -#define UBERBLOCK_SIZE (1ULL << UBERBLOCK_SHIFT)
> -#define VDEV_UBERBLOCK_SHIFT UBERBLOCK_SHIFT
> -
> -/* XXX Uberblock_phys_t is no longer in the kernel zfs */
> -typedef struct uberblock_phys {
> - uberblock_t ubp_uberblock;
> - char ubp_pad[UBERBLOCK_SIZE - sizeof (uberblock_t) -
> - sizeof (zio_eck_t)];
> - zio_eck_t ubp_zec;
> -} uberblock_phys_t;
> -
> +#define VDEV_UBERBLOCK_SHIFT(as) UBMAX(as, UBERBLOCK_SHIFT)
> +#define UBERBLOCK_SIZE(as) (1ULL <<
> VDEV_UBERBLOCK_SHIFT(as))
> +
> +// Number of uberblocks that can fit in the ring at a given ashift
> +#define UBERBLOCK_COUNT(as) (VDEV_UBERBLOCK_RING >> VDEV_UBERBLOCK_SHIFT(as))
>
> #endif /* _SYS_UBERBLOCK_IMPL_H */
>
>
> _______________________________________________
> Grub-devel mailing list
> address@hidden
> https://lists.gnu.org/mailman/listinfo/grub-devel
--
Regards
Vladimir 'φ-coder/phcoder' Serbinenko
signature.asc
Description: OpenPGP digital signature
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- Re: [Patch] Robustly search for ZFS labels & uberblocks,
Vladimir 'φ-coder/phcoder' Serbinenko <=