grub-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID


From: Goffredo Baroncelli
Subject: Re: [PATCH 1/9] btrfs: Add support for reading a filesystem with a RAID 5 or RAID 6 profile.
Date: Wed, 26 Sep 2018 22:40:32 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.0

On 25/09/2018 17.31, Daniel Kiper wrote:
> On Wed, Sep 19, 2018 at 08:40:32PM +0200, Goffredo Baroncelli wrote:
>> From: Goffredo Baroncelli <address@hidden>
>>
>> Signed-off-by: Goffredo Baroncelli <address@hidden>
>> ---
>>  grub-core/fs/btrfs.c | 66 ++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 66 insertions(+)
>>
>> diff --git a/grub-core/fs/btrfs.c b/grub-core/fs/btrfs.c
>> index be195448d..56c42746d 100644
>> --- a/grub-core/fs/btrfs.c
>> +++ b/grub-core/fs/btrfs.c
>> @@ -119,6 +119,8 @@ struct grub_btrfs_chunk_item
>>  #define GRUB_BTRFS_CHUNK_TYPE_RAID1         0x10
>>  #define GRUB_BTRFS_CHUNK_TYPE_DUPLICATED    0x20
>>  #define GRUB_BTRFS_CHUNK_TYPE_RAID10        0x40
>> +#define GRUB_BTRFS_CHUNK_TYPE_RAID5         0x80
>> +#define GRUB_BTRFS_CHUNK_TYPE_RAID6         0x100
>>    grub_uint8_t dummy2[0xc];
>>    grub_uint16_t nstripes;
>>    grub_uint16_t nsubstripes;
>> @@ -764,6 +766,70 @@ grub_btrfs_read_logical (struct grub_btrfs_data *data, 
>> grub_disk_addr_t addr,
>>            stripe_offset = low + chunk_stripe_length
>>              * high;
>>            csize = chunk_stripe_length - low;
>> +          break;
>> +        }
>> +      case GRUB_BTRFS_CHUNK_TYPE_RAID5:
>> +      case GRUB_BTRFS_CHUNK_TYPE_RAID6:
>> +        {
>> +          grub_uint64_t nparities, block_nr, high, low;
>> +
>> +          redundancy = 1;   /* no redundancy for now */
>> +
>> +          if (grub_le_to_cpu64 (chunk->type) & GRUB_BTRFS_CHUNK_TYPE_RAID5)
>> +            {
>> +              grub_dprintf ("btrfs", "RAID5\n");
>> +              nparities = 1;
>> +            }
>> +          else
>> +            {
>> +              grub_dprintf ("btrfs", "RAID6\n");
>> +              nparities = 2;
>> +            }
>> +
>> +          /*
>> +           * A RAID 6 layout consists of several blocks spread on the disks.
>> +           * The raid terminology is used to call all the blocks of a row
>> +           * "stripe". Unfortunately the BTRFS terminology confuses block
> 
> Stripe is data set or parity (parity stripe) on one disk. Block has
> different meaning. Please stick to btrfs terminology and say it clearly
> in the comment. And even add a link to btrfs wiki page to ease reading.
> 
> I think about this one:
>   
> https://btrfs.wiki.kernel.org/index.php/Manpage/mkfs.btrfs#BLOCK_GROUPS.2C_CHUNKS.2C_RAID
> 
>> +           * and stripe.
> 
> I do not think so. Or at least not so much...

Trust me, generally speaking stripe is the "row" in the disks (without the 
parity); looking at the ext3 man page:

....
                   stride=stride-size
                          Configure  the  filesystem  for  a  RAID  array with
                          stride-size filesystem blocks. This is the number of
                          blocks  read or written to disk before moving to the
                          next disk, which is sometimes  referred  to  as  the
                          chunk   size.   This  mostly  affects  placement  of
                          filesystem metadata like bitmaps at mke2fs  time  to
                          avoid  placing them on a single disk, which can hurt
                          performance.  It may also be used by the block allo‐
                          cator.

                   stripe_width=stripe-width
                          Configure  the  filesystem  for  a  RAID  array with
                          stripe-width filesystem blocks per stripe.  This  is
                          typically  stride-size * N, where N is the number of
                          data-bearing disks in the  RAID  (e.g.  for  RAID  5
                          there is one parity disk, so N will be the number of
                          disks in the array minus 1).  This allows the  block
                          allocator to prevent read-modify-write of the parity
                          in a RAID stripe if possible when the data is  writ‐
                          ten.

....
Looking at the RAID5 wikipedia page, it seems that the term "stripe" is 
coherent with the ext3 man page.

I suspect that the confusion is due to the fact that in RAID1 a stripe is in 
ONE disk (because the others are like parities). In BTRFS the RAID5/6 code uses 
the structure of RAID1 with some minimal extensions...

To be clear, I don't have problem to be adherent to the BTRFS terminology. 
However I found this very confusing because I was used to a different 
terminology. I am bit worried about the fact that grub uses both MD/DM code and 
BTRFS code; a quick look to the code (eg ./grub-core/disk/diskfilter.c) shows 
that the stripe_size field seems to be related to a disks row without parities.

And... yes in BTRFS "chunk" is a completely different beast than what it is 
reported in the ext3 man page :-)


> 
>> +           *
>> +           *   Disk0  Disk1  Disk2  Disk3
>> +           *
>> +           *    A1     B1     P1     Q1
>> +           *    Q2     A2     B2     P2
>> +           *    P3     Q3     A3     B3
>> +           *  [...]
>> +           *
>> +           *  Note that the placement of the parities depends on row index.
>> +           *  In the code below:
>> +           *  - block_nr is the block number without the parities
> 
> Well, it seems to me that the btrfs code introduces confusion not the
> spec itself. I would leave code as is but s/block number/stripe number/.

Ok I will replace the two terms. However I have to put a warning that this is a 
"BTRFS" terminology :-)
> 
>> +           *    (A1 = 0, B1 = 1, A2 = 2, B2 = 3, ...),
>> +           *  - high is the row number (0 for A1...Q1, 1 for Q2...P2, ...),
>> +           *  - stripen is the disk number (0 for A1,Q2,P3, 1 for B1...),
> 
> s/disk number/disk number in a row/

This value doesn't depend by the row. So "number of disk" is more correct
> 
>> +           *  - off is the logical address to read
>> +           *  - chunk_stripe_length is the size of a block (typically 64k),
> 
> s/a block/a stripe/
> 
>> +           *  - nstripes is the number of disks,
> 
> s/number of disks/number of disks in a row/
ditto


> 
> I miss the description of nparities here...

Right:
+              *  - nparities is the number of parities (1 for RAID5, 2 for 
RAID6);
+              *    used only in RAID5/6 code.

> 
>> +           *  - low is the offset of the data inside a stripe,
>> +           *  - stripe_offset is the disk offset,
> 
> s/the disk offset/the data offset in an array/?

Yes

> 
>> +           *  - csize is the "potential" data to read. It will be reduced to
>> +           *    size if the latter is smaller.
>> +           */
>> +          block_nr = grub_divmod64 (off, chunk_stripe_length, &low);
>> +
>> +          /*
>> +           * stripen is computed without the parities (0 for A1, A2, A3...
>> +           * 1 for B1, B2...).
>> +           */
>> +          high = grub_divmod64 (block_nr, nstripes - nparities, &stripen);
> 
> This is clear...
> 
>> +          /*
>> +           * stripen is recomputed considering the parities (0 for A1, 1 for
>> +           * A2, 2 for A3....).
>> +           */
>> +          grub_divmod64 (high + stripen, nstripes, &stripen);
> 
> ... but this looks strange... You add disk number to row number. Hmmm...
> It looks that it works but this is not obvious at first sight. Could you
> explain that?

What about
+             /*
+              * stripen is recomputed considering the parities: different row 
have
+              * a different offset, we have to add to stripen the number of 
row ("high") in 
+              * modulo nstripes (0 for A1, 1 for A2, 2 for A3....).
+              */

> 
>> +          stripe_offset = low + chunk_stripe_length * high;
>> +          csize = chunk_stripe_length - low;
>> +
>>            break;
>>          }
>>        default:
> 
> Daniel
> 


-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5



reply via email to

[Prev in Thread] Current Thread [Next in Thread]