grub-devel

Re: grub mishandles corrupt/missing primary GPT


From: Lennart Sorensen
Subject: Re: grub mishandles corrupt/missing primary GPT
Date: Thu, 24 Oct 2013 09:39:02 -0400
User-agent: Mutt/1.5.20 (2009-06-14)

On Wed, Oct 23, 2013 at 09:07:21PM -0600, Chris Murphy wrote:
> While technically a violation of the UEFI spec, I think this can be worked 
> around by considering the disk GPT if the first entry in the MBR is type 
> 0xEE. I don't know of a hybrid MBR implementation where an entry other than 
> the first is 0xEE. 

Well, everyone other than Microsoft seems to understand how useful support
for hybrid setups can be, and hence supports them.

> But if there is no 0xEE entry at all, this is identical to a formerly GPT 
> disk repartitioned as MBR by a utility that doesn't know anything about GPT, 
> and thus doesn't erase the stale GPT data - and therefore must be treated as 
> MBR.

That is true.  That does not mean there must ONLY be a 0xEE entry.
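Here is a minimal sketch of the detection rule being argued over, written in Python for brevity (GRUB itself is C). It assumes a 512-byte LBA 0 and the classic MBR layout: four 16-byte entries at offset 446 and signature bytes 0x55 0xAA at offset 510. The function names and the "gpt"/"mbr"/"invalid"/"unknown" labels are hypothetical. Following Len's point, it looks for 0xEE in any slot rather than only the first. Before calling the disk MBR, it also applies the "rational entries" test that Chris paraphrases from the UEFI spec further down.

```python
import struct

MBR_SIGNATURE = 0xAA55   # bytes 0x55, 0xAA at offsets 510-511 (little-endian)
PROTECTIVE_TYPE = 0xEE   # GPT protective partition type

def mbr_slots(sector: bytes):
    """Yield (type, start_lba, num_sectors) for the four primary MBR entries."""
    for i in range(4):
        off = 446 + 16 * i
        start, size = struct.unpack_from("<II", sector, off + 8)
        yield sector[off + 4], start, size

def classify_disk(sector: bytes, disk_sectors: int) -> str:
    """Hypothetical classifier for LBA 0; disk_sectors is the disk size in LBAs."""
    if len(sector) < 512 or struct.unpack_from("<H", sector, 510)[0] != MBR_SIGNATURE:
        return "unknown"
    slots = list(mbr_slots(sector))
    # Check every slot, not only the first: a hybrid MBR may put 0xEE anywhere.
    if any(ptype == PROTECTIVE_TYPE for ptype, _, _ in slots):
        return "gpt"
    used = sorted((start, start + size)
                  for ptype, start, size in slots if ptype != 0 and size != 0)
    for (_, end1), (start2, _) in zip(used, used[1:]):
        if end1 > start2:
            return "invalid"  # overlapping entries
    if any(end > disk_sectors for _, end in used):
        return "invalid"      # an entry runs past the end of the disk
    # Signature intact, no 0xEE, rational entries: any leftover GPT is stale.
    return "mbr"
```

Consider a hybrid MBR with type 0x83 in slot 0 and 0xEE in slot 1. This sketch classifies it as "gpt", whereas a check of the first entry alone would call it MBR.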

> So perhaps this test is difficult because it's GPT on BIOS, with a limited 
> space BIOS boot partition. However, I think on UEFI computers this should 
> still work with one valid GPT, rather than not boot at all. There's a lot 
> more space for this there.

Certainly if using the BIOS boot partition, there really isn't much of
a space excuse anymore, unless you run into limitations on how much RAM
you can use in early boot.

> Both primary and backup GPTs are preserved in this case since the primary is 
> in LBA 1 and 2, and only LBA 0 is overwritten with the new MBR.
> 
> UEFI spec says if the MBR signature of 0xaa55 is intact, and there isn't an 
> 0xEE entry, and the partition entries are rational (physically on disk and 
> don't overlap), then the two GPTs are considered stale and the disk is MBR.
> 
> The primary header contains the location of the backup GPT. If the header is 
> sufficiently corrupt, and the backup GPT can't be located, then that's the 
> same as an invalid backup GPT, and in that case fail.
> 
> My point is we shouldn't fail when there is a valid locatable backup GPT. The 
> whole point of having a second GPT is obviated with the current behavior.

Sometimes backups are designed in and never used.  I don't recall ever
seeing any indication that Microsoft used the second copy of the FAT
for anything other than filesystem repair tools.
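The sketch below makes the "don't fail while a valid backup exists" argument concrete: it validates the GPT header and falls back to the backup copy when the primary fails. It assumes 512-byte sectors and a disk image held in memory as bytes. `find_usable_gpt`, `parse_gpt_header`, and `read_lba` are hypothetical names, not GRUB functions. The checks follow the GPT header layout in the UEFI spec:

- the "EFI PART" signature,
- a header CRC32 computed with its own field zeroed,
- MyLBA matching the LBA the header was read from,
- the partition entry array CRC32.

A primary that fails its CRC cannot be trusted to report AlternateLBA. For that reason, the sketch looks for the backup header at the disk's last LBA, where UEFI places it.

```python
import struct
import zlib

SECTOR = 512  # assumption: 512-byte logical sectors

def read_lba(disk: bytes, lba: int, count: int = 1) -> bytes:
    return disk[lba * SECTOR:(lba + count) * SECTOR]

def parse_gpt_header(disk: bytes, lba: int):
    """Return header fields if the GPT at `lba` passes basic checks, else None."""
    raw = read_lba(disk, lba)
    if len(raw) < 92 or raw[:8] != b"EFI PART":
        return None
    header_size, stored_crc = struct.unpack_from("<II", raw, 12)
    if not 92 <= header_size <= SECTOR:
        return None
    hdr = bytearray(raw[:header_size])
    hdr[16:20] = bytes(4)  # the header CRC is computed with its own field zeroed
    if zlib.crc32(hdr) != stored_crc:
        return None
    my_lba, alternate_lba = struct.unpack_from("<QQ", raw, 24)
    if my_lba != lba:      # a header must describe the LBA it was read from
        return None
    entry_lba, = struct.unpack_from("<Q", raw, 72)
    n_entries, entry_size, entries_crc = struct.unpack_from("<III", raw, 80)
    length = n_entries * entry_size
    entries = disk[entry_lba * SECTOR:entry_lba * SECTOR + length]
    if len(entries) != length or zlib.crc32(entries) != entries_crc:
        return None        # e.g. a single corrupted byte in the entry array
    return {"my_lba": my_lba, "alternate_lba": alternate_lba,
            "entry_lba": entry_lba, "n_entries": n_entries,
            "entry_size": entry_size}

def find_usable_gpt(disk: bytes):
    """Prefer the primary GPT; fall back to the backup instead of giving up."""
    last_lba = len(disk) // SECTOR - 1
    for name, lba in (("primary", 1), ("backup", last_lba)):
        header = parse_gpt_header(disk, lba)
        if header is not None:
            return name, header
    return None, None
```

The function also returns which copy was used. A caller can then warn that the primary needs repair instead of refusing to boot.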

> I don't think we can work around this kind of hardware vendor sabotage. If it 
> looks like a valid GPT, but is actually stale, if it's used and contains 
> incorrect information, then boot fails. Better to try than not try at all.
> 
> It's certainly uncommon. A Google search for corrupt "primary gpt" turns up only 
> 1900 results. But it is possible.
> 
> And this isn't the only mishandling I'm finding, so it's not like GRUB is 
> unique. In fact just now by changing only a single byte in the primary GPT 
> table (I changed the E to an F in the BIOS boot partition type UUID), the 
> kernel suddenly has no idea what disklabel the disk is, and fails to mount 
> rootfs. So I need to track that down too, but it seems like it knows the 
> primary GPT table is corrupt, but then fails to use the backup GPT for some 
> reason.
> 
> An argument against GRUB doing all of this work: maybe the bootloader should 
> be able to blindly trust the primary GPT table with no validity checks? And 
> instead rely on (presently non-existent) checks by the underlying OS to fix 
> this problem? Something like an fsck_gpt, seeing as nothing else is in a good 
> position to both check and fix such GPTs other than a partition tool.

Perhaps.  Certainly simpler.
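Chris's one-byte experiment can be reproduced in miniature. The sketch below builds a toy partition entry array of 128 entries of 128 bytes each; that is a common layout, but here it is only an assumption. The first entry carries the BIOS boot partition type GUID 21686148-6449-6E6F-744E-656564454649. Stored in GPT's mixed-endian byte order, that GUID spells "Hah!IdontNeedEFI". Changing its ASCII 'E' to 'F' is one plausible reading of his edit, not a confirmed one. Any single-byte change makes the array disagree with the CRC32 recorded in the header. That mismatch is exactly the condition under which a reader should fall back to the backup GPT. `build_entry_array` and `entries_match` are hypothetical helpers.

```python
import uuid
import zlib

ENTRY_SIZE = 128   # assumption: typical SizeOfPartitionEntry
NUM_ENTRIES = 128  # assumption: typical NumberOfPartitionEntries
BIOS_BOOT = uuid.UUID("21686148-6449-6E6F-744E-656564454649")

def build_entry_array(type_guid: uuid.UUID) -> bytearray:
    """Toy entry array: only entry 0 is populated, and only its type GUID."""
    array = bytearray(ENTRY_SIZE * NUM_ENTRIES)
    array[0:16] = type_guid.bytes_le  # GPT stores GUIDs in mixed-endian order
    return array

def entries_match(array: bytes, stored_crc: int) -> bool:
    """True if the array still matches the CRC32 the header recorded."""
    return zlib.crc32(array) == stored_crc

array = build_entry_array(BIOS_BOOT)
stored_crc = zlib.crc32(array)           # what the header would record
print(bytes(array[0:16]))                # b'Hah!IdontNeedEFI'
array[13] = ord("F")                     # change the 'E' of "EFI" to 'F'
print(entries_match(array, stored_crc))  # False
```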

I do wonder how Windows handles booting with a corrupt primary GPT.
Would you happen to know? (A quick google search didn't find an answer
to the question unfortunately).

> The UEFI spec says "Software should ask a user for confirmation before 
> restoring the primary GPT" and yet it also requires the unspecified software 
> fix the primary GPT if corrupt. The spec actually uses the word "must". So 
> per usual, the spec has rather lofty demands.

So it must fix it after asking the user for confirmation?
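One way to read the spec's "ask, then fix" wording is sketched below in Python. It rests on several assumptions:

- 512-byte sectors,
- an in-memory disk image,
- a backup GPT that has already been validated,
- a primary entry array that belongs at LBA 2.

`restore_primary`, `rebuild_primary_header`, and the `confirm` callback are hypothetical, not part of any real tool. The primary header is derived from the backup in three steps: swap MyLBA and AlternateLBA, point PartitionEntryLBA at the primary array, and recompute the header CRC32. The entry array itself is copied unchanged, so its CRC32 still holds.

```python
import struct
import zlib

SECTOR = 512  # assumption: 512-byte logical sectors

def rebuild_primary_header(backup: bytes, primary_entry_lba: int = 2) -> bytes:
    """Derive a primary GPT header from an already-validated backup header."""
    header_size, = struct.unpack_from("<I", backup, 12)
    hdr = bytearray(backup[:header_size])
    my_lba, alternate_lba = struct.unpack_from("<QQ", hdr, 24)
    # The primary sits where the backup's AlternateLBA points (normally LBA 1),
    # and its own AlternateLBA points back at the backup.
    struct.pack_into("<QQ", hdr, 24, alternate_lba, my_lba)
    # Assumption: the primary entry array starts at LBA 2, as on most disks.
    struct.pack_into("<Q", hdr, 72, primary_entry_lba)
    # Recompute the header CRC32 with the CRC field zeroed.
    struct.pack_into("<I", hdr, 16, 0)
    struct.pack_into("<I", hdr, 16, zlib.crc32(hdr))
    return bytes(hdr)

def restore_primary(disk: bytearray, confirm) -> bool:
    """Copy the backup GPT over the primary, but only after confirm() agrees."""
    last_lba = len(disk) // SECTOR - 1
    backup = bytes(disk[last_lba * SECTOR:(last_lba + 1) * SECTOR])
    if not confirm("Primary GPT is corrupt. Restore it from the backup?"):
        return False  # the user declined; leave the disk untouched
    entry_lba, = struct.unpack_from("<Q", backup, 72)
    n_entries, entry_size = struct.unpack_from("<II", backup, 80)
    length = n_entries * entry_size
    # Both copies hold the same entry array, so its CRC32 stays valid.
    entries = bytes(disk[entry_lba * SECTOR:entry_lba * SECTOR + length])
    primary = rebuild_primary_header(backup)
    disk[2 * SECTOR:2 * SECTOR + length] = entries
    # Zero the rest of LBA 1 past the header, as the spec reserves it.
    disk[SECTOR:2 * SECTOR] = primary.ljust(SECTOR, b"\x00")
    return True
```

For interactive behavior, pass `confirm=lambda msg: input(msg + " [y/N] ").lower() == "y"`. A bootloader with no user at hand could instead pass a callback that returns False and simply boot from the backup.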

-- 
Len Sorensen


