Strange behavior with grub (and disk failure)
From: Laurent Michel
Subject: Strange behavior with grub (and disk failure)
Date: 06 Jul 2001 12:24:02 -0400
User-agent: Gnus/5.0808 (Gnus v5.8.8) Emacs/20.7
Hi!
I am having trouble installing grub on my hard disk. I think I ran
into bugs. I ran grub under gdb to see what is going on, and I collected
quite a bit of information.
First, here is the system description:
Hardware: MB: Abit KT7-Raid
CPU: Athlon TB 1.1 GHz
RAM: 256
kernel: 2.4.4
The machine has 3 HD, 2 of them on ide0 (hda/hdb). The system normally
boots from hda. The third disk is on an ATARAID interface, i.e., hde
on ide2.
The Linux root partition is on hde1.
I set up grub on a floppy first, and this works great. Right now,
I am trying to install it on the hard drive.
Here is the partition layout:
hda(hda1) : W2K
hdb(hdb1,hdb5): hdb1: reiserfs, hdb5 ext2
hde(hde1,hde2): hde1: reiserfs, hde2 reiserfs (hd2 in grub)
Here is how I tried to set up grub:
root (hd2,0) ;; root fs is on hde1
setup (hd0) ;; install grub in MBR of hda
When I run grub, I get a disk write error, as this output shows:
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/reiserfs_stage1_5" exists... yes
Running "embed /boot/grub/reiserfs_stage1_5 (hd0)"... 18 sectors are embedded.
succeeded
Running "install /boot/grub/stage1 d (hd0) (hd0)1+18 p (hd2,0)/boot/grub/stage2"... failed
Error 29: Disk write error
I then went into gdb to debug it. The failure happens when the
system tries to write to hde1.
Here is the stack trace:
#0 write_to_partition (map=0x806b150, drive=130, partition=65535,
sector=1472473, size=1, buf=0x401dce00 "êp\202") at device.c:589
#1 0x805451f in install_func (arg=0x401ccd78 "/boot/grub/stage1 d (hd0) (hd0)1+18 p (hd2,0)/boot/grub/stage2", flags=1)
at builtins.c:1919
#2 0x8055f04 in setup_func (arg=0x4017bc4e "(hd0)", flags=1) at builtins.c:3581
#3 0x80566b5 in enter_cmdline (heap=0x4017bc48 "setup (hd0)", forever=1) at cmdline.c:168
#4 0x80527ca in cmain () at stage2.c:907
#5 0x804a606 in init_bios_info () at common.c:282
#6 0x80495da in doit () at asmstub.c:120
#7 0x80497d2 in grub_stage2 () at asmstub.c:176
#8 0x8049596 in main (argc=1, argv=0xbffffc2c) at main.c:238
#9 0x4007bbcc in __libc_start_main () from /lib/libc.so.6
I stepped into write_to_partition to see what was going on. I stumbled
upon this piece of code:
577 #else
578 {
579 off_t offset = (off_t) sector * (off_t) SECTOR_SIZE;
580
581 if (lseek (fd, offset, SEEK_SET) != offset)
582 {
583 errnum = ERR_DEV_VALUES;
584 return 0;
(gdb) l
585 }
586 }
587 #endif
588
589 if (write (fd, buf, size * SECTOR_SIZE) != (size * SECTOR_SIZE))
590 {
591 close (fd);
592 errnum = ERR_WRITE;
593 return 0;
594 }
The routine is acting funny right away.
At the call site (of write_to_partition) the value passed in is
saved_sector - part_start
(gdb) p saved_sector - part_start
$3 = 1472473
And this is consistent with what we see on the stack.
However, inside the routine, the first thing I see is:
(gdb) p sector
$1 = 65535
Even if I assume that the debugger is somehow confused, the interesting
part starts at line 579. sector is an int (4 bytes)
and is equal to
(gdb) p sector
$16 = 32768
(gdb) whatis sector
type = int
(gdb) p sizeof(sector)
$17 = 4
SECTOR_SIZE is #define'd to 0x200 (512-byte sectors).
So the next excerpt from gdb is interesting:
579 off_t offset = (off_t) sector * (off_t) SECTOR_SIZE;
(gdb) n
581 if (lseek (fd, offset, SEEK_SET) != offset)
(gdb) p offset
$4 = 4619790794267537920
which is surprising, as the actual product (1472473 * 512) ought to be 753906176.
Now, take a look at this:
(gdb) whatis offset
type = off_t
(gdb) p sizeof(offset)
$5 = 8
(gdb) p sizeof(off_t)
$6 = 8
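This took me by surprise, so I checked with a small C program and got
an off_t with size 4:
#include <stdio.h>
#include <sys/types.h> /* off_t */
#define SECTOR_SIZE 0x200
int main(void)
{
int sector = 1472473;
off_t offset = (off_t) sector * (off_t) SECTOR_SIZE;
/* Cast so the format is right whether off_t is 4 or 8 bytes. */
printf("sizeof(off_t) = %lu, offset = %lld\n",
(unsigned long) sizeof(off_t), (long long) offset);
return 0;
}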
So I am a little confused here. Note that the seek actually succeeds,
but the write call fails, returning -1, and errno is 9 (EBADF, "Bad file number").
Note that the argument to the open call was:
(gdb) p dev
$10 =
"/dev/address@hidden@address@hidden@address@hidden&@address@hidden&@address@hidden&@"
Note that the file is opened O_RDONLY and we are trying to write! So
this code has me completely confused.
Would you be so kind as to tell me what exactly is going on?
BTW, I am using gdb 5.0. I compiled grub (0.5.96) myself from source,
as follows:
./configure --prefix=/usr
make;make install
gcc is the following version:
thorgal:/usr/local/src/grub-0.5.96/grub# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.2/specs
gcc version 2.95.2 20000220 (Debian GNU/Linux)
from the standard Debian stable (potato 2.2r3) distribution.
hde1 and hde2 are both mounted when I execute the grub
commands. Here is the output of fdisk on /dev/hde (that may help):
Command (m for help): p
Disk /dev/hde: 16 heads, 63 sectors, 39870 cylinders
Units = cylinders of 1008 * 512 bytes
Device     Boot  Start    End    Blocks    Id  System
/dev/hde1  *         1  10159   5120104+   83  Linux
/dev/hde2        10160  39870  14974344    83  Linux
Same thing for hda
Command (m for help): p
Disk /dev/hda: 128 heads, 63 sectors, 935 cylinders
Units = cylinders of 8064 * 512 bytes
Device     Boot  Start    End    Blocks    Id  System
/dev/hda1  *         1    935   3769888+    b  Win95 FAT32
My kernel is 2.4.4:
thorgal:/usr/local/src/grub-0.5.96/grub# uname -a
Linux thorgal 2.4.4 #19 Tue Jul 3 18:45:47 EDT 2001 i686 unknown
And the following modules are loaded:
thorgal:/usr/local/src/grub-0.5.96/grub# lsmod
Module           Size  Used by
ide-floppy       9584   0  (autoclean)
ipt_state         960   3  (autoclean)
ipt_limit        1200  29  (autoclean)
iptable_filter   2080   0  (autoclean) (unused)
iptable_mangle   2048   0  (unused)
ipt_LOG          3472   1
ipt_MIRROR       1312   0  (unused)
ipt_MASQUERADE   1488   1
ipt_TOS          1248   0  (unused)
ipt_REDIRECT     1088   0  (unused)
iptable_nat     15184   0  [ipt_MASQUERADE ipt_REDIRECT]
ipt_REJECT       3328   0  (unused)
ip_tables       10432  13  [ipt_state ipt_limit iptable_filter iptable_mangle ipt_LOG ipt_MIRROR ipt_MASQUERADE ipt_TOS ipt_REDIRECT iptable_nat ipt_REJECT]
ip_conntrack    14240   2  [ipt_state ipt_MASQUERADE ipt_REDIRECT iptable_nat]
via686a          8160   0  (unused)
eeprom           3216   0  (unused)
adm1021          5600   0  (unused)
sensors          6144   0  [via686a eeprom adm1021]
i2c-isa          1200   0  (unused)
i2c-viapro       3936   0  (unused)
i2c-core        13072   0  [via686a eeprom adm1021 sensors i2c-isa i2c-viapro]
rtc              5376   0  (autoclean)
nls_iso8859-1    2864   0  (unused)
nls_cp437        4384   0  (unused)
vfat             9104   0  (unused)
fat             31488   0  [vfat]
I would appreciate any form of feedback.
Thanks a lot,
--
Laurent