Message-ID: <4A1B164B.1010108@gmail.com>
Date: Tue, 26 May 2009 00:06:03 +0200
From: Niel Lambrechts <niel.lambrechts@...il.com>
To: Alan Cox <alan@...rguk.ukuu.org.uk>
CC: Tejun Heo <tj@...nel.org>,
"linux.kernel" <linux-kernel@...r.kernel.org>
Subject: Re: 2.6.29 regression: ATA bus errors on resume
On 05/25/2009 10:15 AM, Alan Cox wrote:
>> something to the cdrom which is attached to ata2. Something very
>> fishy is going on there. Sounds like an electric or some sort of
>> interference problem but I'm not sure. :-(
>>
> I'm just wondering what happens on a resume if a phy change event is seen
> by the phy before or during the time registers (such as the port mappings
> on an ICH) are being restored (or not restored even). That seems to be one
> way you could get an event on the wrong port
I've retested all of the kernels I have, since 2.6.29.4 also came out
recently. For each kernel I did a hibernate/resume from the console,
repeated the same in X, then moved on to the next kernel.
The 2.6.29.4 log is much larger, since some other badness happened there:
it contains a large kernel trace, as my first X hibernation attempt
failed and came back to X after a few seconds. The system seemed
functional and did not keep generating kernel messages. When I then
retried a hibernate it worked, along with the resume. Another unrelated
bug perhaps?
As for "hard resetting link" messages, they seemed to always happen
under X the times I tried it.
Kernel       EXT4 errors?  Console: ata1 reset?  Console: ata2 reset?  X: ata1 reset?  X: ata2 reset?
2.6.28.10    No            no                    yes                   yes             no
2.6.29.4*    No            no                    no                    no              no
2.6.29.4**   No            -                     -                     yes             no
2.6.30-rc6   Yes           -                     -                     yes             no
2.6.30-rc6   No            no                    no                    yes             no
*  Xorg hibernation attempt failed.
** Xorg second hibernation attempt (no extra reboot).
I also did a side by side comparison of the messages I have for
2.6.30-rc6, the one with EXT4 errors I reported on yesterday, and
another one that worked just fine tonight. I stripped all time-stamps
and some pulseaudio messages from the bad one and attached them here,
and also saved the full messages for each kernel to
http://bugzilla.kernel.org/show_bug.cgi?id=13017 .
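For anyone wanting to repeat the side-by-side comparison, here is a minimal sketch of the timestamp stripping and diffing step. It assumes the classic syslog "Mon DD HH:MM:SS" prefix, which may not match every setup; the function names are made up for illustration.

```python
import difflib
import re

# Assumption: lines start with a classic syslog timestamp such as
# "May 25 23:41:07 ". Adjust the pattern if your syslog format differs.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+")


def strip_timestamps(lines):
    """Return the lines without trailing newlines or the leading timestamp."""
    return [TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def diff_logs(good_lines, bad_lines):
    """Unified diff of two logs after timestamps have been removed."""
    good = strip_timestamps(good_lines)
    bad = strip_timestamps(bad_lines)
    return list(
        difflib.unified_diff(good, bad, fromfile="A", tofile="B", lineterm="")
    )
```

Feeding it the lines of the two message files (for example via `open(path).readlines()`) yields only the lines that really differ, not ones that differ by time alone.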
Since analysing the code-path is still a bit beyond me, I'll leave you
with a little summary of the differences I notice.
A = 2.6.30-rc6 (EXT4 clean)
B = 2.6.30-rc6 (EXT4 errors triggered)
# B first does an ata2 ACPI cmd, A starts with ata1. Only a slight
sequence difference, output is the same.
B: linux-7vph kernel: ata2.00: ACPI cmd e3/00:1f:00:00:00:a0 succeeded
# The main difference appears:
A:linux-7vph kernel: Restarting tasks ... done.
B:linux-7vph kernel: Restarting tasks ... <3>ata1.00: exception Emask 0x10 SAct 0x1f SErr 0x50000 action 0xe frozen
# A first shows "done", and only afterwards the frozen message, with a
different SAct value:
A:linux-7vph kernel: ata1.00: exception Emask 0x10 SAct 0x1ff SErr 0x50000 action 0xe frozen
# Both then have:
linux-7vph kernel: ata1.00: irq_stat 0x00400008, PHY RDY changed
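To make the SAct and SErr values above easier to read, here is a small sketch that decodes them. It assumes SAct is the bitmask of outstanding NCQ tags and uses the SError diagnostic bit positions (16 to 26) as I understand them from the SATA register layout; the descriptive names and helper functions are mine, not kernel identifiers.

```python
# Assumed SError diagnostic bits (16-26); the names are descriptive labels,
# not the exact strings the kernel prints.
SERR_DIAG_BITS = {
    16: "PHY ready change",
    17: "PHY internal error",
    18: "comm wake",
    19: "10b to 8b decode error",
    20: "disparity error",
    21: "CRC error",
    22: "handshake error",
    23: "link sequence error",
    24: "transport state transition error",
    25: "unrecognized FIS",
    26: "device exchanged",
}


def active_tags(sact):
    """List the NCQ tags whose bit is set in an SAct value."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


def decode_serr(serr):
    """List the diagnostic conditions set in an SErr value."""
    return [name for bit, name in SERR_DIAG_BITS.items() if (serr >> bit) & 1]


print(active_tags(0x1F))     # tags outstanding in B
print(active_tags(0x1FF))    # tags outstanding in A
print(decode_serr(0x50000))  # conditions flagged in both logs
```

Under these assumptions B had five commands in flight (tags 0 to 4) and A had nine (tags 0 to 8) when the port was frozen, and SErr 0x50000 is a PHY ready change plus comm wake, which matches the "PHY RDY changed" line.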
# from there, A seems to have a little extra sequence:
linux-7vph kernel: ata1.00: cmd 60/08:00:ef:fc:48/00:00:0b:00:00/40 tag 0 ncq 4096 in
linux-7vph kernel: res 50/00:38:97:7f:e6/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
linux-7vph kernel: ata1.00: status: { DRDY }
linux-7vph kernel: ata1.00: cmd 60/08:08:97:d5:69/00:00:0e:00:00/40 tag 1 ncq 4096 in
linux-7vph kernel: res 50/00:38:97:7f:e6/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
linux-7vph kernel: ata1.00: status: { DRDY }
linux-7vph kernel: ata1.00: cmd 60/30:10:8f:5f:56/00:00:0f:00:00/40 tag 2 ncq 24576 in
linux-7vph kernel: res 50/00:38:97:7f:e6/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
linux-7vph kernel: ata1.00: status: { DRDY }
linux-7vph kernel: ata1.00: cmd 60/10:18:c7:5f:56/00:00:0f:00:00/40 tag 3 ncq 8192 in
linux-7vph kernel: res 50/00:38:97:7f:e6/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
linux-7vph kernel: ata1.00: status: { DRDY }
linux-7vph kernel: ata1.00: cmd 60/10:20:ff:5f:56/00:00:0f:00:00/40 tag 4 ncq 8192 in
linux-7vph kernel: res 50/00:38:97:7f:e6/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
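The "cmd" fields in these lines can be unpacked with a short sketch like the one below. It assumes the usual libata taskfile order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device), that for NCQ commands the sector count sits in the feature fields and the tag in bits 7:3 of nsect, and 512-byte sectors. The function name is hypothetical, and a count of 0 (which means 65536 sectors) is not handled.

```python
import re

# Twelve hex bytes separated by "/" or ":" following the word "cmd".
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_ncq_cmd(line):
    """Decode the taskfile of an NCQ 'cmd' log line, or return None."""
    match = CMD_RE.search(line)
    if match is None:
        return None
    fields = [int(x, 16) for x in re.split(r"[/:]", match.group(1))]
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = fields
    # 48-bit LBA: the "hob" (high order byte) fields carry the upper 24 bits.
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    # Assumption: NCQ layout, count in feature, tag in nsect bits 7:3.
    sectors = (hob_feature << 8) | feature
    return {
        "command": command,
        "tag": nsect >> 3,
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte logical sectors
    }


info = decode_ncq_cmd("ata1.00: cmd 60/30:10:8f:5f:56/00:00:0f:00:00/40 tag")
print(info["tag"], info["sectors"], info["bytes"], hex(info["lba"]))
```

With these assumptions the decoded tag and byte count agree with the "tag N ncq NNNN" text the kernel prints on the same line, which is a quick sanity check on the field order.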
# B only then shows:
B:linux-7vph kernel: done.
# soon after, B spirals into errors:
linux-7vph kernel: XXX scsi_eh_flush_done_q: online=1(2) noretry=2 retries=0 allowed=5
linux-7vph kernel: scsi_eh_0: flush finish cmd: f6838ec0
linux-7vph kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
linux-7vph kernel: sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor]
Hope this helps.
Niel
View attachment "fmt.messages.2.6.30-rc6-pae.ext4-errors.txt" of type "text/plain" (15439 bytes)
View attachment "fmt.messages.2.6.30-rc6-pae.txt" of type "text/plain" (27602 bytes)