grub-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or


From: Goffredo Baroncelli
Subject: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile.
Date: Wed, 18 Jul 2018 08:24:50 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 07/12/2018 03:46 PM, Daniel Kiper wrote:
> On Tue, Jun 19, 2018 at 07:39:48PM +0200, Goffredo Baroncelli wrote:
>> Signed-off-by: Goffredo Baroncelli <address@hidden>
>> ---
>>  grub-core/fs/btrfs.c | 70 ++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 70 insertions(+)
>>
>> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
>> index be195448d..fbabaebbe 100644
>> --- a/grub-core/fs/btrfs.c
>> +++ b/grub-core/fs/btrfs.c
>> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
>>  #define GRUB_BTRFS_CHUNK_TYPE_RAID1         0x10
>>  #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED    0x20
>>  #define GRUB_BTRFS_CHUNK_TYPE_RAID10        0x40
>> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5         0x80
>> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6         0x100
>>    grub_uint8_t dummy2[0xc];
>>    grub_uint16_t nstripes;
>>    grub_uint16_t nsubstripes;
>> @@ -764,6 +766,74 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, 
>> grub_disk_addr_t addr,
>>            stripe_offset = low + chunk_stripe_length
>>              * high;
>>            csize = chunk_stripe_length - low;
>> +          break;
>> +        }
>> +      case GRUB_BTRFS_CHUNK_TYPE_RAID5:
>> +      case GRUB_BTRFS_CHUNK_TYPE_RAID6:
>> +        {
>> +          grub_uint64_t nparities, stripe_nr, high, low;
>> +
>> +          redundancy = 1;   /* no redundancy for now */
>> +
>> +          if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
>> +            {
>> +              grub_dprintf ("btrfs", "RAID5\n");
>> +              nparities = 1;
>> +            }
>> +          else
>> +            {
>> +              grub_dprintf ("btrfs", "RAID6\n");
>> +              nparities = 2;
>> +            }
>> +
>> +          /*
>> +           * Below is an example of a RAID 6 layout and the meaning of the
>> +           * variables. The same applies to RAID 5. The only differences is
>> +           * that there is only one parity disk instead of two.
>> +           *
>> +           * A RAID 6 layout consists of several stripes spread
>> +           * on the disks, following a layout like the one below
>> +           *
>> +           *   Disk0  Disk1  Disk2  Ddisk3
> 
> s/Ddisk3/Disk3/
> 
>> +           *
>> +           *    A1     B1     P1     Q1
>> +           *    Q2     A2     B2     P2
>> +           *    P3     Q3     A3     B3
>> +           *  [...]
>> +           *
>> +           *  Note that the placement of the parities depends on row index.
>> +           *  In the code below:
>> +           *  - stripe_nr is the stripe number not considering the parities
>> +           *    (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
>> +           *  - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
>> +           *  - stripen is the column number (or disk number),
>> +           *  - off is the logical address to read (from the beginning of
>> +           *    the chunk space),
> 
> What do you mean by chunk?

Is a specific btrfs data structure (see 
https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID).
 Most of the BTRFS internal structures (e.g. where the file data is stored) are 
in terms of *logical address*. At this level there is no concept of redundancy 
or raid nor disks. You can see the logical address as a flat space, large as 
the sum of the disks sizes (minus the space used by the parities or mirroring).

The "logical space" is divided in "slices" called chunk. A chunk typically has 
a size in terms of hundred of megabytes (IIRC up to 1GB)

A chunk (or block group) is a mapping. A chunk maps a "logical address" to a 
"physical address" on the disks.

You can think a chunk as a structure like:

        u64 profile
        u64 logical_start, logical_end
        u64 disk1_id, disk1_start, disk1_end
        u64 disk2_id, disk2_start, disk2_end
        u64 disk3_id, disk3_start, disk3_end
        [...]

In order to translate a "logical_address" to the pair of "disk" and "physical 
address:
- find the chunk which maps the logical address (looking to the 
"logical_start", "logical_end")
- then on the basis of "profile" value you can have the following cases:
        profile == SINGLE:
                is the simplest one: the logical address is mapped to the range
                (disk1_start...disk1_end) of the disk "disk_id1"

        profile == RAID1
                like the previous one; however if the disk1 is not available, 
                you can find the data in the range 
                (disk2_start...disk2_end) of the disk "disk_id2". NOTE: the two 
ranges
                must have the same sizes, but may have different starts

        profile == RAID0
                the data is without redundancy and it is spread on different 
disk each
                "chunk_stripe_length" bytes; so:
                        off = logical_address - logical_start // this is what I 
call
                                                              // address from 
the
                                                              // beginning of 
the chunk

                        
                                                             
                        
                

        and so on....



> 
>> +           *  - chunk_stripe_length is the size of a stripe (typically 64k),
>> +           *  - nstripes is the number of disks,
>> +           *  - low is the offset of the data inside a stripe,
>> +           *  - stripe_offset is the disk offset from the beginning
>> +           *    of the disk chunk mapping start,
> 
> Err... I am confused here...

As described above, the chunk data structure translate from "logical address" 
to the pair (disk, physical address on the disk). Each chunk translate a small 
portion of the space (typically 1GB); so it is more correct to say that a chunk 
maps a range in the logical address to a range in the disks. 'off' is the 
offset from the beginning of the range in the logical address. 'stripe_offset' 
is the offset from the beginning of the range in the physical space



> 
>> +           *  - csize is the "potential" data to read. It will be reduced to
>> +           *    size if the latter is smaller.
>> +           */
>> +          stripe_nr = grub_divmod64 (off, chunk_stripe_length, &low);
> 
> Here it seems to me that off is from the beginning of RAID volume space.
> So, it does not agree with the comment above...

RAID volume space == chunk logical address space

> 
>> +          /*
>> +           * stripen is evaluated without considering
>> +           * the parities (0 for A1, A2, A3... 1 for B1, B2...).
>> +           */
>> +          high = grub_divmod64 (stripe_nr, nstripes - nparities, &stripen);
> 
> This looks good.
> 
>> +          /*
>> +           * stripen now considers also the parities (0 for A1, 1 for A2,
>> +           * 2 for A3....). The math is performed modulo number of disks.
>> +           */
>> +          grub_divmod64 (high + stripen, nstripes, &stripen);
> 
> Ditto.
> 
>> +          stripe_offset = low + chunk_stripe_length * high;
> 
> And here I am confused. Why "* high"? If chunk_stripe_length is stripe
> length then stripe_offset leads to nowhere. Am I missing something?
> I have a feeling that comments does not agree with the code somehow.

The "physical range" is divided in small range called(?) "stripe". These stripe 
are spread on the disks. 'high' is the offset from the beginning of the 
physical chunk range in unit of "chunk_stripe_length"

I found quite confuse the btrfs terminology around the "stripe" word. And I 
have to point out that these variables were named before my patch.

I am afraid that my comment doesn't help to "de-obfuscate" my code. I know that 
my English is far to be perfect. However this code is not so different that the 
previous one (i.e. handling the btrfs raid1/dup/raid0....). So I am thinking to 
remove all the comments because it seems to me that the my comments increase 
the confusion instead of reducing it.

Please, let me know your opinion


> 
> Daniel
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5



reply via email to

[Prev in Thread] Current Thread [Next in Thread]