grub-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID


From: Daniel Kiper
Subject: Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile.
Date: Mon, 3 Sep 2018 14:52:45 +0200
User-agent: Mutt/1.3.28i

On Wed, Jul 18, 2018 at 08:24:50AM +0200, Goffredo Baroncelli wrote:
> On 07/12/2018 03:46 PM, Daniel Kiper wrote:
> > On Tue, Jun 19, 2018 at 07:39:48PM +0200, Goffredo Baroncelli wrote:
> >> Signed-off-by: Goffredo Baroncelli <address@hidden>
> >> ---
> >>  grub-core/fs/btrfs.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 70 insertions(+)
> >>
> >> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
> >> index be195448d..fbabaebbe 100644
> >> --- a/grub-core/fs/btrfs.c
> >> +++ b/grub-core/fs/btrfs.c
> >> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
> >>  #define GRUB_BTRFS_CHUNK_TYPE_RAID1         0x10
> >>  #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED    0x20
> >>  #define GRUB_BTRFS_CHUNK_TYPE_RAID10        0x40
> >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5         0x80
> >> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6         0x100
> >>    grub_uint8_t dummy2[0xc];
> >>    grub_uint16_t nstripes;
> >>    grub_uint16_t nsubstripes;
> >> @@ -764,6 +766,74 @@ grub_btrfs_read_logical (struct grub_btrfs_data 
> >> *data, grub_disk_addr_t addr,
> >>          stripe_offset = low + chunk_stripe_length
> >>            * high;
> >>          csize = chunk_stripe_length - low;
> >> +        break;
> >> +      }
> >> +    case GRUB_BTRFS_CHUNK_TYPE_RAID5:
> >> +    case GRUB_BTRFS_CHUNK_TYPE_RAID6:
> >> +      {
> >> +        grub_uint64_t nparities, stripe_nr, high, low;
> >> +
> >> +        redundancy = 1;   /* no redundancy for now */
> >> +
> >> +        if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
> >> +          {
> >> +            grub_dprintf ("btrfs", "RAID5\n");
> >> +            nparities = 1;
> >> +          }
> >> +        else
> >> +          {
> >> +            grub_dprintf ("btrfs", "RAID6\n");
> >> +            nparities = 2;
> >> +          }
> >> +
> >> +        /*
> >> +         * Below is an example of a RAID 6 layout and the meaning of the
> >> +         * variables. The same applies to RAID 5. The only differences is
> >> +         * that there is only one parity disk instead of two.
> >> +         *
> >> +         * A RAID 6 layout consists of several stripes spread
> >> +         * on the disks, following a layout like the one below
> >> +         *
> >> +         *   Disk0  Disk1  Disk2  Ddisk3
> >
> > s/Ddisk3/Disk3/
> >
> >> +         *
> >> +         *    A1     B1     P1     Q1
> >> +         *    Q2     A2     B2     P2
> >> +         *    P3     Q3     A3     B3
> >> +         *  [...]
> >> +         *
> >> +         *  Note that the placement of the parities depends on row index.
> >> +         *  In the code below:
> >> +         *  - stripe_nr is the stripe number not considering the parities
> >> +         *    (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
> >> +         *  - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
> >> +         *  - stripen is the column number (or disk number),
> >> +         *  - off is the logical address to read (from the beginning of
> >> +         *    the chunk space),
> >
> > What do you mean by chunk?
>
> Is a specific btrfs data structure (see
> https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID).
> Most of the BTRFS internal structures (e.g. where the file data is
> stored) are in terms of *logical address*. At this level there is no
> concept of redundancy or raid nor disks. You can see the logical
> address as a flat space, large as the sum of the disks sizes (minus
> the space used by the parities or mirroring).
>
> The "logical space" is divided in "slices" called chunk. A chunk
> typically has a size in terms of hundred of megabytes (IIRC up to 1GB)
>
> A chunk (or block group) is a mapping. A chunk maps a "logical
> address" to a "physical address" on the disks.
>
> You can think a chunk as a structure like:
>
>       u64 profile
>       u64 logical_start, logical_end
>       u64 disk1_id, disk1_start, disk1_end
>       u64 disk2_id, disk2_start, disk2_end
>       u64 disk3_id, disk3_start, disk3_end
>       [...]
>
> In order to translate a "logical_address" to the pair of "disk" and "physical 
> address:
> - find the chunk which maps the logical address (looking to the 
> "logical_start", "logical_end")
> - then on the basis of "profile" value you can have the following cases:
>       profile == SINGLE:
>               is the simplest one: the logical address is mapped to the range
>               (disk1_start...disk1_end) of the disk "disk_id1"
>
>       profile == RAID1
>               like the previous one; however if the disk1 is not available,
>               you can find the data in the range
>               (disk2_start...disk2_end) of the disk "disk_id2". NOTE: the two 
> ranges
>               must have the same sizes, but may have different starts
>
>       profile == RAID0
>               the data is without redundancy and it is spread on different 
> disk each
>               "chunk_stripe_length" bytes; so:
>                       off = logical_address - logical_start // this is what I 
> call
>                                                             // address from 
> the
>                                                             // beginning of 
> the chunk
>
>
>
>
>
>
>       and so on....
>
>
>
> >
> >> +         *  - chunk_stripe_length is the size of a stripe (typically 64k),
> >> +         *  - nstripes is the number of disks,
> >> +         *  - low is the offset of the data inside a stripe,
> >> +         *  - stripe_offset is the disk offset from the beginning
> >> +         *    of the disk chunk mapping start,
> >
> > Err... I am confused here...
>
> As described above, the chunk data structure translate from "logical
> address" to the pair (disk, physical address on the disk). Each chunk
> translate a small portion of the space (typically 1GB); so it is more
> correct to say that a chunk maps a range in the logical address to a
> range in the disks. 'off' is the offset from the beginning of the
> range in the logical address. 'stripe_offset' is the offset from the
> beginning of the range in the physical space
>
> >> +         *  - csize is the "potential" data to read. It will be reduced to
> >> +         *    size if the latter is smaller.
> >> +         */
> >> +        stripe_nr = grub_divmod64 (off, chunk_stripe_length, &low);
> >
> > Here it seems to me that off is from the beginning of RAID volume space.
> > So, it does not agree with the comment above...
>
> RAID volume space == chunk logical address space
>
> >> +        /*
> >> +         * stripen is evaluated without considering
> >> +         * the parities (0 for A1, A2, A3... 1 for B1, B2...).
> >> +         */
> >> +        high = grub_divmod64 (stripe_nr, nstripes - nparities, &stripen);
> >
> > This looks good.
> >
> >> +        /*
> >> +         * stripen now considers also the parities (0 for A1, 1 for A2,
> >> +         * 2 for A3....). The math is performed modulo number of disks.
> >> +         */
> >> +        grub_divmod64 (high + stripen, nstripes, &stripen);
> >
> > Ditto.
> >
> >> +        stripe_offset = low + chunk_stripe_length * high;
> >
> > And here I am confused. Why "* high"? If chunk_stripe_length is stripe
> > length then stripe_offset leads to nowhere. Am I missing something?
> > I have a feeling that comments does not agree with the code somehow.
>
> The "physical range" is divided in small range called(?) "stripe".
> These stripe are spread on the disks. 'high' is the offset from the
> beginning of the physical chunk range in unit of "chunk_stripe_length"
>
> I found quite confuse the btrfs terminology around the "stripe" word.
> And I have to point out that these variables were named before my
> patch.
>
> I am afraid that my comment doesn't help to "de-obfuscate" my code. I

It does. Please extend the code comment accordingly. You can add a link
to the BTRFS Wiki page. Though please try to use terms only from BRFS
specs/docs. Do not introduce new ones if it is not really needed. This
whole thing will ease reading new and old code.

> know that my English is far to be perfect. However this code is not so

Do not worry about that. Many people here learn English. Just keep doing it.

> different that the previous one (i.e. handling the btrfs
> raid1/dup/raid0....). So I am thinking to remove all the comments
> because it seems to me that the my comments increase the confusion
> instead of reducing it.

No, please do not do that. We needed it. If you take into account my
comments then everything should be OK.

Anyway, the code starts looking good. I think that we need 1-3
iterations and I will be able to get it into GRUB2 git repo.
So, please keep working on this feature.

Daniel



reply via email to

[Prev in Thread] Current Thread [Next in Thread]