bug-hurd

Re: Fwd: Hurd shutdown problems


From: Brent W. Baccala
Subject: Re: Fwd: Hurd shutdown problems
Date: Tue, 9 Aug 2016 15:37:00 -1000

On Mon, Aug 8, 2016 at 9:32 PM, Justus Winter <justus@gnupg.org> wrote:
Hello,

"Brent W. Baccala" <cosine@freesoft.org> writes:

> I don't have to swapoff to have "symptoms".  The kernel debugger normally
> shows symbolic names, i.e:
>
> Stopped  at  machine_idle+0xe:   leave
> machine_idle(0,81a2c630,3806f64,0,9b448b38)+0xe
> idle_thread_continue(9fcbdde0,81028b50,9c0c7fe4,0,9c3d5548)+0x2a
>
> Once I've got enough swap in use, though, it stops doing this.  Now I see:
>
> Stopped       at  0x810000be: leave
> 0x810000be(0,0,9fcc5990,0,9fb90b30)
> 0x810293fa(9fcbdde0,81028b50,99526fe4,0,9c3d5548)

Uh :( that is not good.  That sounds like a swap-related corruption in
the kernel.

> When I see a kernel page fault, it's always in strcmp()

strcmp is used in the elf symbol lookup code, so that might explain the
fault.


GDB on the kernel shows a seemingly corrupted ELF symbol table when elf_db_search_symbol() is called.

Here's what the symbol table looks like when the system boots:

(gdb) print self->start
$3 = (Elf32_Sym *) 0x804fb5ec
(gdb) print self->start[0]
$4 = {st_name = 0, st_value = 0, st_size = 0, st_info = 0 '\000', st_other = 0 '\000', st_shndx = 0}
(gdb) print self->start[1]
$5 = {st_name = 0, st_value = 2164260864, st_size = 0, st_info = 3 '\003', st_other = 0 '\000', st_shndx = 1}
(gdb) print self->start[2]
$6 = {st_name = 0, st_value = 2165125376, st_size = 0, st_info = 3 '\003', st_other = 0 '\000', st_shndx = 2}
(gdb) print self->start[3]
$7 = {st_name = 0, st_value = 2165262992, st_size = 0, st_info = 3 '\003', st_other = 0 '\000', st_shndx = 3}
(gdb) print self->start[4]
$8 = {st_name = 0, st_value = 2165395456, st_size = 0, st_info = 3 '\003', st_other = 0 '\000', st_shndx = 4}
(gdb) print self->start[5]
$9 = {st_name = 0, st_value = 2165452800, st_size = 0, st_info = 3 '\003', st_other = 0 '\000', st_shndx = 5}
(gdb) print self->start[6]
$10 = {st_name = 0, st_value = 0, st_size = 0, st_info = 3 '\003', st_other = 0 '\000', st_shndx = 6}
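
As a reading aid (not part of the original message), the boot-time entries above can be decoded with a few lines of Python. This is only a sketch: `decode_sym` is a made-up helper name, and the bit split follows the ELF specification's ELF32_ST_BIND / ELF32_ST_TYPE macros.

```python
# Illustrative only: decode the Elf32_Sym fields printed by GDB above.
STB_LOCAL = 0     # local binding, per the ELF spec
STT_SECTION = 3   # symbol type used for section symbols, per the ELF spec

def decode_sym(st_value, st_info, st_shndx):
    """Return (hex address, binding, type, section index) for one entry."""
    bind = st_info >> 4    # ELF32_ST_BIND
    typ = st_info & 0xF    # ELF32_ST_TYPE
    return hex(st_value), bind, typ, st_shndx

# (st_value, st_info, st_shndx) for start[1]..start[5], copied from the GDB output
boot_entries = [
    (2164260864, 3, 1),
    (2165125376, 3, 2),
    (2165262992, 3, 3),
    (2165395456, 3, 4),
    (2165452800, 3, 5),
]

for value, info, shndx in boot_entries:
    print(decode_sym(value, info, shndx))
```

Each of these entries comes out as a local symbol of type STT_SECTION whose section index matches its position, and start[1] has the address 0x81000000. That is the usual shape of the first few entries in an ELF symbol table.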

After I run a certain compile (just make, g++, ld), here's what it looks like:

(gdb) print self->start
$15 = (Elf32_Sym *) 0x804fb5ec
(gdb) print self->start[0]
$16 = {st_name = 22, st_value = 0, st_size = 0, st_info = 13 '\r', st_other = 26 '\032', st_shndx = 0}
(gdb) print self->start[1]
$17 = {st_name = 0, st_value = 562210328, st_size = 562101944, st_info = 0 '\000', st_other = 0 '\000', st_shndx = 0}
(gdb) print self->start[2]
$18 = {st_name = 0, st_value = 0, st_size = 0, st_info = 0 '\000', st_other = 0 '\000', st_shndx = 0}
(gdb) print self->start[3]
$19 = {st_name = 0, st_value = 0, st_size = 0, st_info = 3 '\003', st_other = 0 '\000', st_shndx = 0}
(gdb) print self->start[4]
$20 = {st_name = 23, st_value = 0, st_size = 0, st_info = 13 '\r', st_other = 26 '\032', st_shndx = 0}
(gdb) print self->start[5]
$22 = {st_name = 0, st_value = 562210352, st_size = 562210400, st_info = 0 '\000', st_other = 0 '\000', st_shndx = 0}
(gdb) print self->start[6]
$23 = {st_name = 0, st_value = 0, st_size = 0, st_info = 0 '\000', st_other = 0 '\000', st_shndx = 0}
(gdb) print self->start[7]
$24 = {st_name = 0, st_value = 0, st_size = 0, st_info = 3 '\003', st_other = 0 '\000', st_shndx = 0}
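
One quick way to tell the two dumps apart: the ELF specification reserves symbol table index 0 as the undefined symbol, with every field zero. The sketch below (illustrative Python, with the hypothetical helper name `is_null_symbol`) applies that check to start[0] from each dump.

```python
# Illustrative only: check the ELF rule that symbol table entry 0 is all zeros.
FIELDS = ("st_name", "st_value", "st_size", "st_info", "st_other", "st_shndx")

def is_null_symbol(sym):
    """True if every field of the entry is zero."""
    return all(sym[field] == 0 for field in FIELDS)

# start[0] at boot and after the compile, copied from the GDB output
boot_sym0 = dict(zip(FIELDS, (0, 0, 0, 0, 0, 0)))
later_sym0 = dict(zip(FIELDS, (22, 0, 0, 13, 26, 0)))

print(is_null_symbol(boot_sym0))   # True
print(is_null_symbol(later_sym0))  # False
```

In the second dump the values also seem to repeat every four entries (start[0] resembles start[4], start[1] resembles start[5], and so on). Since an Elf32_Sym is 16 bytes, that would correspond to 64-byte records, though this is only an observation about the numbers shown, not a diagnosis.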


Both GDB dumps were taken with the kernel halted near the beginning of elf_db_search_symbol(), called from the kernel debugger:

(gdb) where
#0  elf_db_search_symbol (stab=0x81127b00 <db_symtabs>, off=2164261054, strategy=2, diffp=0x81124ea0 <int_stack+3744>)
    at ../ddb/db_elf.c:159
#1  0x810132e7 in db_search_in_task_symbol (val=2164261054, strategy=2, offp=0x81124f10 <int_stack+3856>, task=0x0)
    at ../ddb/db_sym.c:354
#2  0x8101342a in db_search_task_symbol (val=2164261054, strategy=2, offp=0x81124f10 <int_stack+3856>, task=0x0)
    at ../ddb/db_sym.c:315
#3  0x810135dd in db_task_printsym (off=2164261054, strategy=2, task=0x0) at ../ddb/db_sym.c:458
#4  0x8100f377 in db_print_loc_and_inst (loc=2164261054, task=0x0) at ../ddb/db_examine.c:328
#5  0x8104fe9d in db_task_trap (type=-1, code=0, user_space=0) at ../ddb/db_trap.c:92
#6  0x81045d61 in kdb_kentry (int_regs=0x81124fe8 <int_stack+4072>) at ../i386/i386/db_interface.c:392
#7  0x810082ac in kdb_from_iret () at ../i386/i386/locore.S:864
#8  0x942dff6c in ?? ()
#9  0x81146610 in default_pset ()
#10 0x00000000 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
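
The decimal `off` argument in the backtrace is easier to relate to the kernel debugger output once converted to hex. A minimal sketch in Python, where the variable names are illustrative and the comparison address is simply the st_value of start[1] from the boot-time dump:

```python
# Illustrative only: relate the backtrace arguments to the ddb output.
off = 2164261054                 # 'off' passed to elf_db_search_symbol (frame #0)
first_section_addr = 2164260864  # st_value of start[1] in the boot-time dump

print(hex(off))                       # 0x810000be
print(hex(off - first_section_addr))  # 0xbe
```

So the address being looked up is the same 0x810000be that the kernel debugger printed without a symbolic name, 0xbe bytes past the first section symbol.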

Any chance the symbol table could have been swapped out?  Any idea how to debug it?

> I'm just learning Hurd.  Any ideas?

Keep at it, the Hurd is an interesting system to learn from.  But you
might want to start with a simpler problem.


I wouldn't mind a simpler problem, but I want to get my system cleanly booting and shutting down!

I hate this kind of "recursion", but hopefully the result will be a better system.

    agape
    brent
 
