grub-devel

Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10


From: Török Edwin
Subject: Re: Bug#478238: grub-probe: fails to find drive for /dev/sda10
Date: Sun, 11 May 2008 14:35:41 +0300
User-agent: Mozilla-Thunderbird 2.0.0.12 (X11/20080420)

[sending to grub-devel@ as requested]

Robert Millan wrote:
> On Sun, May 04, 2008 at 05:01:32PM +0300, Török Edwin wrote:
>   
>>>>    Device Boot      Start         End      Blocks   Id  System
>>>> /dev/sda1   *           1        1275    10241406    7  HPFS/NTFS
>>>> /dev/sda2            1276        2248     7815622+  a6  OpenBSD
>>>> /dev/sda3            2249        5289    24426832+   f  W95 Ext'd (LBA)
>>>> /dev/sda4            6080        7296     9775552+  bf  Solaris
>>>> /dev/sda5            2249        2371      987966   82  Linux swap / Solaris
>>>> /dev/sda6            2372        3587     9767488+  83  Linux
>>>> /dev/sda7            3588        3600      104391   83  Linux
>>>> /dev/sda8            3601        4863    10145016   8e  Linux LVM
>>>> /dev/sda9            4864        5228     2931831   a6  OpenBSD
>>>> /dev/sda10           5229        5289      489951   83  Linux
>>>>         
>> [...]
>> grub> ls (hd0,10)
>> error: unknown device
>> grub> ls (hd0,11)
>> error: unknown device
>> grub>
>>     
>
> I tried reproducing your setup, but I can't hit the same bug.  This starts to
> look really nasty.  Just spotted this:
>
>   /build/buildd/grub2-1.96+20080426/partmap/pc.c:141: partition 0: flag 0x80, type 0x7, start 0x3f, len 0x1388afc
>   [...]
>   /build/buildd/grub2-1.96+20080426/partmap/pc.c:141: partition 0: flag 0x0, type 0x82, start 0x2270f07, len 0x1e267c
>
> for which I can't find any explanation other than memory corruption.  Also,
> due to a missing fflush() call the output is somewhat scrambled, which makes
> it harder to track (I fixed this already in upstream).
>
> Could you:
>
>   - Apply the attached patch & run grub-probe again (this time output
>     will be a bit more readable)
>   

There was no patch attached; however, I ran 'cvs diff -u -D2008-04-30'
and applied the resulting patch.
I found the problem, and it also explains why you couldn't
reproduce it.

/dev/sda9 is not a valid OpenBSD partition, and in partmap/pc.c:176 the
iteration returns with the error "invalid disk label magic 0x%x".
If I replace that return with a continue, it works.

The problem is that grub2 stops looking for more partitions as soon as
it encounters the invalid partition.
grub 0.97 worked perfectly, so I never noticed that the partition had
the wrong type!

Also, if I change the partition type to 83 (as it should be), an unpatched
grub-probe finds that /boot is on /dev/sda10:
# grub-probe -t device /boot
/dev/sda10

I think grub2 should handle errors more gracefully: possibly mark the
partition as invalid, and keep going.
grub-probe was looking for /dev/sda10, and it shouldn't be affected by
/dev/sda9 being corrupted or invalid.
Think of it this way: if a partition gets corrupted, that shouldn't
prevent the system from booting, assuming the boot and root partitions
are still ok.
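
To illustrate the difference between returning and continuing, here is a
minimal sketch in C. It is not GRUB's code: struct part_entry,
entry_is_valid() and find_partition() are hypothetical names made up for
this example, and the validity check is reduced to comparing one field
against the BSD disklabel magic. The only point is that the skip_invalid
flag decides whether one bad entry hides every entry after it.

```c
#include <stddef.h>

/* Hypothetical, simplified partition record (not GRUB's real structure). */
struct part_entry {
    int number;            /* e.g. 9 for /dev/sda9 */
    unsigned type;         /* MBR type byte, e.g. 0xa6 for OpenBSD */
    unsigned label_magic;  /* magic value read from the partition's disklabel */
};

enum { PART_TYPE_BSD = 0xa6 };
#define BSD_LABEL_MAGIC 0x82564557u

/* An entry typed as BSD must carry a valid disklabel magic;
   other types are accepted as-is in this sketch. */
static int entry_is_valid(const struct part_entry *p)
{
    if (p->type == PART_TYPE_BSD)
        return p->label_magic == BSD_LABEL_MAGIC;
    return 1;
}

/* Returns 1 if partition `wanted` is reached, 0 otherwise.
   skip_invalid == 0: stop at the first invalid entry (the "return" behaviour).
   skip_invalid != 0: ignore the invalid entry and keep going ("continue"). */
static int find_partition(const struct part_entry *tab, size_t n,
                          int wanted, int skip_invalid)
{
    for (size_t i = 0; i < n; i++) {
        if (!entry_is_valid(&tab[i])) {
            if (skip_invalid)
                continue;
            return 0;
        }
        if (tab[i].number == wanted)
            return 1;
    }
    return 0;
}
```

With a table holding sda8 (type 8e), sda9 (type a6 but no valid magic) and
sda10 (type 83), looking for partition 10 fails when skip_invalid is 0 and
succeeds when it is 1, which matches what I see with and without the
return-to-continue change.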

Compare what grub-emu says when sda9 has the wrong type:

grub> ls (hd0,10)
error: unknown device

And this is what it says when sda9 has the correct type:
grub> ls (hd0,10)
      Partition hd0,10: Filesystem type ext2, Label debian_BOOT



>   - Send it to address@hidden
>   
Done
>   ?
>
> Maybe someone there has an idea, but if it's memory corruption and we can't
> reproduce it, tracing the problem remotely isn't going to work very well.
>   

It wasn't memory corruption; however, I ran valgrind and it showed
some leaks, plus a call to stat() with a NULL parameter.
The attached patch fixes some of the valgrind warnings. Some leaks still
remain; I attached the new valgrind log.
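
As a side note on the stat() call, here is a small sketch of a more
defensive check. It is only an illustration, not part of the attached
patch: is_regular_file() is a hypothetical helper name. It assumes a POSIX
system, refuses a NULL path, checks the return value of stat(), and uses
the S_ISREG() macro, since st_mode also carries the permission bits and a
direct comparison with S_IFREG (which the patch leaves as it was) would
normally not match.

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if `path` names a regular file,
   0 otherwise (NULL path, stat() failure, or any other file type). */
static int is_regular_file(const char *path)
{
    struct stat st;

    if (path == NULL)
        return 0;            /* never hand NULL to stat() */

    if (stat(path, &st) != 0)
        return 0;            /* st is not filled in on failure */

    /* S_ISREG masks out the permission bits before testing the type. */
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```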

P.S.: grub2 seems to work now; I am able to boot with it using the
text-mode menu. The default graphics mode doesn't work, so I will open a
separate bug about that.

Best regards,
--Edwin


diff -ur grub2-1.96+20080429/kern/disk.c ../grub2-1.96+20080429/kern/disk.c
--- grub2-1.96+20080429/kern/disk.c     2008-02-08 14:22:51.000000000 +0200
+++ ../grub2-1.96+20080429/kern/disk.c  2008-05-11 13:58:02.270673755 +0300
@@ -317,7 +317,10 @@
   /* Reset the timer.  */
   grub_last_time = grub_get_rtc ();
 
-  grub_free (disk->partition);
+  if(disk->partition) {
+         grub_free (disk->partition->data);
+         grub_free (disk->partition);
+  }
   grub_free ((void *) disk->name);
   grub_free (disk);
 }
diff -ur grub2-1.96+20080429/util/grub-probe.c ../grub2-1.96+20080429/util/grub-probe.c
--- grub2-1.96+20080429/util/grub-probe.c       2008-05-11 13:59:14.934811935 +0300
+++ ../grub2-1.96+20080429/util/grub-probe.c    2008-05-11 13:46:21.729236855 +0300
@@ -190,9 +190,10 @@
       struct stat st;
       grub_fs_t fs;
 
-      stat (path, &st);
+      if(path)
+             stat (path, &st);
 
-      if (st.st_mode == S_IFREG)
+      if (path && st.st_mode == S_IFREG)
        {
          /* Regular file.  Verify that we can read it properly.  */
 

==25071== Memcheck, a memory error detector.
==25071== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==25071== Using LibVEX rev 1804, a library for dynamic binary translation.
==25071== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==25071== Using valgrind-3.3.0-Debian, a dynamic binary instrumentation framework.
==25071== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==25071== For more details, rerun with: -v
==25071== 
==25071== My PID = 25071, parent PID = 5663.  Prog and args are:
==25071==    ./grub-probe
==25071==    -d
==25071==    /dev/sda10
==25071== 
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071==    This could cause spurious value errors to appear.
==25071==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071==    This could cause spurious value errors to appear.
==25071==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071== Warning: noted but unhandled ioctl 0x1261 with no size/direction hints
==25071==    This could cause spurious value errors to appear.
==25071==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==25071== 
==25071== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 10 from 1)
==25071== malloc/free: in use at exit: 611,077 bytes in 176 blocks.
==25071== malloc/free: 901 allocs, 725 frees, 2,397,201 bytes allocated.
==25071== For counts of detected errors, rerun with: -v
==25071== searching for pointers to 176 not-freed blocks.
==25071== checked 662,256 bytes.
==25071== 
==25071== 4,096 bytes in 1 blocks are possibly lost in loss record 3 of 5
==25071==    at 0x4006AB8: malloc (vg_replace_malloc.c:207)
==25071==    by 0x804AFE4: xmalloc (misc.c:81)
==25071==    by 0x804B41A: grub_malloc (misc.c:222)
==25071==    by 0x804C3EB: grub_disk_cache_store (disk.c:162)
==25071==    by 0x804CDC1: grub_disk_read (disk.c:461)
==25071==    by 0x8069A72: grub_lvm_scan_device (lvm.c:288)
==25071==    by 0x804C014: iterate_partition.2134 (device.c:132)
==25071==    by 0x8066C9C: pc_partition_map_iterate (pc.c:153)
==25071==    by 0x804F3AD: grub_partition_iterate (partition.c:126)
==25071==    by 0x804C09D: iterate_disk.2131 (device.c:101)
==25071==    by 0x80498FA: call_hook (biosdisk.c:132)
==25071==    by 0x804992B: grub_util_biosdisk_iterate (biosdisk.c:141)
==25071== 
==25071== 
==25071== 41,136 (41,132 direct, 4 indirect) bytes in 12 blocks are definitely lost in loss record 4 of 5
==25071==    at 0x4006AB8: malloc (vg_replace_malloc.c:207)
==25071==    by 0x804AFE4: xmalloc (misc.c:81)
==25071==    by 0x804B41A: grub_malloc (misc.c:222)
==25071==    by 0x804C3EB: grub_disk_cache_store (disk.c:162)
==25071==    by 0x804CDC1: grub_disk_read (disk.c:461)
==25071==    by 0x8066D4E: pc_partition_map_iterate (pc.c:165)
==25071==    by 0x804F3AD: grub_partition_iterate (partition.c:126)
==25071==    by 0x804C09D: iterate_disk.2131 (device.c:101)
==25071==    by 0x80498FA: call_hook (biosdisk.c:132)
==25071==    by 0x804992B: grub_util_biosdisk_iterate (biosdisk.c:141)
==25071==    by 0x804C4CC: grub_disk_dev_iterate (disk.c:205)
==25071==    by 0x804BF63: grub_device_iterate (device.c:138)
==25071== 
==25071== LEAK SUMMARY:
==25071==    definitely lost: 41,132 bytes in 12 blocks.
==25071==    indirectly lost: 4 bytes in 1 blocks.
==25071==      possibly lost: 4,096 bytes in 1 blocks.
==25071==    still reachable: 565,845 bytes in 162 blocks.
==25071==         suppressed: 0 bytes in 0 blocks.
==25071== Reachable blocks (those to which a pointer was found) are not shown.
==25071== To see them, rerun with: --leak-check=full --show-reachable=yes
