grub-devel

HP root-cause analysis for GRUB "Red screen of death" on DL120/DL360 G7


From: Iain Barker
Subject: HP root-cause analysis for GRUB "Red screen of death" on DL120/DL360 G7 servers
Date: Thu, 8 Dec 2011 19:39:56 +0000

I am posting the following information with permission from HP support, in the 
hope that it may be useful for future GRUB developer reference.

Summary:
When using GRUB to chain-load from one device to another device (e.g. USB to 
HDD), the HP BIOS used in DL120/DL360 and other G7 servers reports "Illegal 
Opcode" and a red crashdump screen.  This failure did not occur on previous 
generation (G6) servers of the same models, which used AMI/Phoenix BIOS.

References:
Acme Packet opened HP support case 4635415916 for additional clarification in 
reference to the public HP customer advisory number c02695572

http://bizsupport1.austin.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c02695572&lang=en&cc=us&taskId=101&prodSeriesId=4091408&prodTypeId=15351

Root cause analysis:
HP Level-3 engineering identified the root cause as follows:

_start_quoted_text_

HP Level-3 engineering have found that the HP BIOS on the DL120 G7 is not 
causing the red screen.  GRUB installs its own INT13 handler in the interrupt 
vector table, so it now intercepts all INT13 calls.  Some time after it 
does that, GRUB does some type of memory copy operation which overwrites the 
data at the address where GRUB stores the INT13 handler code.  As a result, on 
the next INT13 call in GRUB, the interrupt handler is no longer there, so the 
processor just starts to execute whatever data overwrote the INT13 
handler code.
 
Here is how the red screen happens: When the processor executes an illegal 
instruction (as when it tries to execute whatever is in the overwritten INT13 
handler), the processor raises an interrupt, which the BIOS then handles by 
printing the red screen with the register dump and the message.  So our BIOS 
just prints out the red screen, but the cause of the red screen is GRUB.
 
The specific scenario which leads to this is identified as follows:

1) GRUB installs its own INT13 handler.
2) Near the end of the chain-loading process, GRUB loads an image of the Linux 
kernel into memory, which wipes out its INT13 handler.
3) Right before GRUB transfers control to the kernel to boot, GRUB calls 
a function to turn off the floppy drive.
4) The floppy code then makes an INT13 call to the handler, which 
has been overwritten by the kernel, and this results in the red screen.
 
The problem seems to be that GRUB made assumptions about the memory layout in 
our system which are not accurate.  HP systems that use HP-developed BIOSes 
instead of outsourced (AMI) BIOSes use more of a memory area called the EBDA 
(Extended BIOS Data Area) than a typical system does.  As a result, GRUB 
assumes there is memory that it can safely use instead of properly calculating 
an area of safe memory to use.  That's probably why GRUB worked on the other 
systems and fails on G7.

_end_quoted_text_
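
For illustration only, the kind of calculation HP describes as missing can be 
sketched in C as below.  This is not GRUB's code and was not supplied by HP. 
It relies on the conventional BIOS Data Area fields (the EBDA segment word at 
physical address 0x40E and the base memory size in KiB at 0x413) and reads 
them from a simulated memory buffer so that it can run on any host.  The 
helper names (low_mem_limit, region_is_safe, regions_overlap, make_bda) are 
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BDA_IMAGE_SIZE 0x500u    /* simulated copy of the first 0x500 bytes of RAM */
#define LOW_MEM_TOP    0xA0000u  /* 640 KiB: upper bound of conventional memory */
#define BDA_EBDA_SEG   0x40Eu    /* BDA word: real-mode segment of the EBDA */
#define BDA_BASE_KB    0x413u    /* BDA word: usable base memory in KiB */

/* Read a little-endian 16-bit word from the simulated memory image. */
static uint16_t read_le16(const uint8_t *mem, size_t off)
{
    assert(mem != NULL && off + 1 < BDA_IMAGE_SIZE);
    return (uint16_t)(mem[off] | (mem[off + 1] << 8));
}

/* Lowest physical address reserved by the firmware at the top of low memory.
   Takes the smaller of the two BDA-derived values; a zero field is ignored. */
static uint32_t low_mem_limit(const uint8_t *mem)
{
    uint32_t ebda  = (uint32_t)read_le16(mem, BDA_EBDA_SEG) << 4;
    uint32_t base  = (uint32_t)read_le16(mem, BDA_BASE_KB) * 1024u;
    uint32_t limit = LOW_MEM_TOP;

    if (ebda != 0 && ebda < limit)
        limit = ebda;
    if (base != 0 && base < limit)
        limit = base;
    return limit;
}

/* True if [start, start + len) lies entirely below the reserved area. */
static bool region_is_safe(const uint8_t *mem, uint32_t start, uint32_t len)
{
    uint32_t limit = low_mem_limit(mem);
    return start <= limit && len <= limit - start;
}

/* True if two half-open ranges share at least one byte, e.g. a resident
   interrupt handler and the destination of a later image copy. */
static bool regions_overlap(uint32_t a, uint32_t alen,
                            uint32_t b, uint32_t blen)
{
    return a < b + blen && b < a + alen;
}

/* Fill a simulated BDA with the given EBDA segment and base memory size.
   The values passed in are illustrative, not measured on HP hardware. */
static void make_bda(uint8_t *mem, uint16_t ebda_seg, uint16_t base_kb)
{
    memset(mem, 0, BDA_IMAGE_SIZE);
    mem[BDA_EBDA_SEG]     = (uint8_t)(ebda_seg & 0xFF);
    mem[BDA_EBDA_SEG + 1] = (uint8_t)(ebda_seg >> 8);
    mem[BDA_BASE_KB]      = (uint8_t)(base_kb & 0xFF);
    mem[BDA_BASE_KB + 1]  = (uint8_t)(base_kb >> 8);
}
```

With a 1 KiB EBDA (segment 0x9FC0, 639 KiB base memory), a buffer at 0x90000 
of length 0x8000 passes region_is_safe.  With a hypothetical 64 KiB EBDA 
(segment 0x9000, 576 KiB base memory), the same buffer is rejected.  A loader 
that hard-codes the first layout would therefore misbehave on the second, 
which matches the failure mode HP describes.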

Regards,
Iain Barker - Platform Engineering, Acme Packet.
address@hidden



