From: Michel Hermier
Subject: Re: [PATCH] Better ahci error handling
Date: Tue, 2 Jan 2018 11:22:48 +0100

Hi,
On a Fujitsu E744 laptop we sometimes see a very long delay (up to
several minutes) when booting from hard disk.
It seems that accesses to the DVD drive (which has no disc inserted)
sometimes fail with errors that grub does not handle correctly, so
each failed access stalls until the 20 s timeout triggers. This seems
to happen while grub is trying to read filesystem/partition data.
Example debug output:
disk/ahci.c:1020: AHCI command scheduled
disk/ahci.c:1022: AHCI tfd = 50
disk/ahci.c:1024: AHCI inten = 7dc000ff
disk/ahci.c:1026: AHCI intstatus = 3
disk/ahci.c:1031: AHCI inten = 7dc000ff
disk/ahci.c:1033: AHCI tfd = 50
disk/ahci.c:1036: AHCI sig = eb140101
disk/ahci.c:1038: AHCI tfd = 2051
disk/ahci.c:1049: AHCI status <1 1 40000001 2051>
disk/ahci.c:926: AHCI tfd = 2051
disk/ahci.c:932: AHCI tfd = 2051
disk/ahci.c:936: AHCI tfd = 2051
disk/ahci.c:942: grub_ahci_read (size=0, cmdsize = 0)
disk/ahci.c:957: AHCI tfd = 2051, CL=0xda455000
disk/ahci.c:968: AHCI tfd = 2051
disk/ahci.c:980: AHCI tfd = 2051
disk/ahci.c:987: AHCI tfd = 2051
disk/ahci.c:998: cfis: 27 80 08 00 00 00 00 00
disk/ahci.c:1003: cfis: 00 00 00 00 00 00 00 00
disk/ahci.c:1015: PRDT = da453000, 0, ffffffff (128)
The problem is that the command_issue bit checked in the polling loop
is only reset if the "HBA receives a FIS which clears the BSY, DRQ, and
ERR bits for the command". Here the ERR bit is never cleared (tfd stays
at 2051 in the output above), so the loop spins until the timeout.
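
To make the tfd values in the debug output easier to read, here is a minimal sketch that splits a tfd value into its status and error bytes and tests the three bits named above. It assumes the usual task file layout (status in bits 7:0, error in bits 15:8, with BSY = 0x80, DRQ = 0x08, ERR = 0x01). The names tfd_decode and tfd_all_clear are made up for illustration and are not part of grub.

```c
#include <stdint.h>

/* Status bits in the low byte of tfd (assumed standard ATA layout). */
#define TFD_STS_ERR 0x01u
#define TFD_STS_DRQ 0x08u
#define TFD_STS_BSY 0x80u

/* Hypothetical helper type: the two bytes of interest in tfd. */
struct tfd_fields
{
  uint8_t status; /* bits 7:0  */
  uint8_t error;  /* bits 15:8 */
};

/* Split a raw tfd value into its status and error bytes. */
static struct tfd_fields
tfd_decode (uint32_t tfd)
{
  struct tfd_fields f;
  f.status = (uint8_t) (tfd & 0xffu);
  f.error = (uint8_t) ((tfd >> 8) & 0xffu);
  return f;
}

/* Return 1 if BSY, DRQ and ERR are all clear, 0 otherwise. */
static int
tfd_all_clear (uint32_t tfd)
{
  uint8_t status = tfd_decode (tfd).status;
  return (status & (TFD_STS_BSY | TFD_STS_DRQ | TFD_STS_ERR)) == 0;
}
```

With the values from the log, tfd = 50 has all three bits clear, while tfd = 2051 decodes to status 0x51 (ERR set) and error 0x20, which is the state in which command_issue is never reset.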
The patch below seems to fix the issue by checking the error bits in
the interrupt status register. According to the AHCI 1.2 spec,
"Interrupt sources that are disabled (‘0’) are still reflected in the
status registers.", so this should work even though grub uses polling.
The relevant bit in our case is the Task File Error Status (TFES),
which corresponds to the ERR bit (bit 0) in tfd. The patch below also
checks the other error bits, except for the "Interface non-fatal error
status" bit.
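
The decision the patched loop makes on each iteration can be sketched outside of grub as a small pure function. This is only an illustration under assumptions: the bit positions are copied from the patch, but poll_step, enum poll_result and the PORT_IS_* names are hypothetical and do not exist in grub.

```c
#include <stdint.h>

/* Fatal error bits of the port interrupt status register,
   same positions as in the patch. */
#define PORT_IS_IFS  (UINT32_C (1) << 27)
#define PORT_IS_HBDS (UINT32_C (1) << 28)
#define PORT_IS_HBFS (UINT32_C (1) << 29)
#define PORT_IS_TFES (UINT32_C (1) << 30)
#define PORT_IS_FATAL \
  (PORT_IS_IFS | PORT_IS_HBDS | PORT_IS_HBFS | PORT_IS_TFES)

/* Hypothetical result type for one polling step. */
enum poll_result
{
  POLL_PENDING, /* command still running, keep polling */
  POLL_DONE,    /* command_issue bit 0 was cleared */
  POLL_ERROR,   /* a fatal error bit is set in intstatus */
  POLL_TIMEOUT  /* deadline passed without completion */
};

/* Classify one iteration of the polling loop from register snapshots. */
static enum poll_result
poll_step (uint32_t command_issue, uint32_t intstatus,
           uint64_t now_ms, uint64_t endtime_ms)
{
  if (!(command_issue & 1))
    return POLL_DONE;
  if (intstatus & PORT_IS_FATAL)
    return POLL_ERROR;
  if (now_ms > endtime_ms)
    return POLL_TIMEOUT;
  return POLL_PENDING;
}
```

Fed with the failing state from the debug output (command_issue = 1, intstatus = 40000001), this returns POLL_ERROR immediately instead of waiting for the timeout, because bit 30 (TFES) is set. With intstatus = 3 it keeps polling until the deadline.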
Cheers,
Stefan
diff --git a/grub-core/disk/ahci.c b/grub-core/disk/ahci.c
--- a/grub-core/disk/ahci.c
+++ b/grub-core/disk/ahci.c
@@ -82,6 +82,19 @@ enum grub_ahci_hba_port_command
GRUB_AHCI_HBA_PORT_CMD_FR = 0x4000,
};
+enum grub_ahci_hba_port_int_status
+ {
+ GRUB_AHCI_HBA_PORT_IS_IFS = (1UL << 27),
+ GRUB_AHCI_HBA_PORT_IS_HBDS = (1UL << 28),
+ GRUB_AHCI_HBA_PORT_IS_HBFS = (1UL << 29),
+ GRUB_AHCI_HBA_PORT_IS_TFES = (1UL << 30),
+ };
+#define IS_FATAL_ERROR (\
+ GRUB_AHCI_HBA_PORT_IS_IFS| \
+ GRUB_AHCI_HBA_PORT_IS_HBDS|\
+ GRUB_AHCI_HBA_PORT_IS_HBFS|\
+ GRUB_AHCI_HBA_PORT_IS_TFES)
+
struct grub_ahci_hba
{
grub_uint32_t cap;
@@ -1026,7 +1039,8 @@ grub_ahci_readwrite_real (struct grub_ahci_device *dev,
endtime = grub_get_time_ms () + (spinup ? 20000 : 20000);
while ((dev->hba->ports[dev->port].command_issue & 1))
- if (grub_get_time_ms () > endtime)
+ if (grub_get_time_ms () > endtime ||
+ (dev->hba->ports[dev->port].intstatus & IS_FATAL_ERROR))
{
grub_dprintf ("ahci", "AHCI status <%x %x %x %x>\n",
dev->hba->ports[dev->port].command_issue,
@@ -1034,7 +1048,10 @@ grub_ahci_readwrite_real (struct grub_ahci_device *dev,
dev->hba->ports[dev->port].intstatus,
dev->hba->ports[dev->port].task_file_data);
dev->hba->ports[dev->port].command_issue = 0;
- err = grub_error (GRUB_ERR_IO, "AHCI transfer timed out");
+ if (dev->hba->ports[dev->port].intstatus & IS_FATAL_ERROR)
+ err = grub_error (GRUB_ERR_IO, "AHCI transfer error");
+ else
+ err = grub_error (GRUB_ERR_IO, "AHCI transfer timed out");
if (!reset)
grub_ahci_reset_port (dev, 1);
break;