Message-ID: <4A1B164B.1010108@gmail.com>
Date:	Tue, 26 May 2009 00:06:03 +0200
From:	Niel Lambrechts <niel.lambrechts@...il.com>
To:	Alan Cox <alan@...rguk.ukuu.org.uk>
CC:	Tejun Heo <tj@...nel.org>,
	"linux.kernel" <linux-kernel@...r.kernel.org>
Subject: Re: 2.6.29 regression: ATA bus errors on resume

On 05/25/2009 10:15 AM, Alan Cox wrote:
>> something to the cdrom which is attached to ata2.  Something very
>> fishy is going on there.  Sounds like an electric or some sort of
>> interference problem but I'm not sure.  :-(
>>      
> I'm just wondering what happens on a resume if a phy change event is seen
> by the phy before or during the time registers (such as the port mappings
> on an ICH) are being restored (or not restored even). That seems to be one
> way you could get an event on the wrong port

I've retested all of the kernels I have, since 2.6.29.4 also came out
recently. For each one I did a hibernate/resume from the console, then
repeated the same in X, then moved on to the next kernel.

The 2.6.29.4 log is much larger because something else went wrong
there: it contains a large kernel trace, as my first hibernation attempt
from X failed and dropped back to X after a few seconds. The system
seemed functional and did not keep generating kernel messages. When I
then retried the hibernate it worked, as did the resume. Another,
unrelated bug perhaps?

As for the "hard resetting link" messages, they seemed to happen under X
nearly every time I tried it.

Kernel       EXT4 errors?  Console: ata1 reset?  Console: ata2 reset?  X: ata1 reset?  X: ata2 reset?
2.6.28.10    No            no                    yes                   yes             no
2.6.29.4*    No            no                    no                    no              no
2.6.29.4**   No            -                     -                     yes             no
2.6.30-rc6   Yes           -                     -                     yes             no
2.6.30-rc6   No            no                    no                    yes             no

* Xorg hibernation attempt failed.
** Second Xorg hibernation attempt (no extra reboot).

I also did a side by side comparison of the messages I have for 
2.6.30-rc6, the one with EXT4 errors I reported on yesterday, and 
another one that worked just fine tonight. I stripped all time-stamps 
and some pulseaudio messages from the bad one and attached them here, 
and also saved the full messages for each kernel to 
http://bugzilla.kernel.org/show_bug.cgi?id=13017 .
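
A comparison like the one described above can be scripted. The following is
only a minimal sketch: the timestamp pattern assumes the usual syslog prefix
(for example "May 25 23:41:07 "), and the names normalize and compare are
hypothetical helpers, not the tooling actually used for the attached files.

```python
import difflib
import re

# Assumed syslog timestamp prefix such as "May 25 23:41:07 ".
# This format is an assumption; adjust it to the real log layout.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+")


def normalize(lines, drop=("pulseaudio",)):
    """Strip the leading timestamp and skip lines from noisy sources."""
    out = []
    for line in lines:
        line = TIMESTAMP.sub("", line.rstrip("\n"))
        if any(word in line for word in drop):
            continue
        out.append(line)
    return out


def compare(good, bad):
    """Return unified-diff lines between two normalized message logs."""
    return list(difflib.unified_diff(normalize(good), normalize(bad),
                                     "A (EXT4 clean)", "B (EXT4 errors)",
                                     lineterm=""))
```

Calling compare() on the lines of the two message files would list only the
lines that differ once timestamps are out of the way.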

Since analysing the code-path is still a bit beyond me, I'll leave you 
with a little summary of the differences I notice.

A = 2.6.30-rc6 (EXT4 clean)
B = 2.6.30-rc6 (EXT4 errors triggered)

# B issues an ata2 ACPI cmd first, while A starts with ata1. Only the
# order differs slightly; the output is the same.
B: linux-7vph kernel: ata2.00: ACPI cmd e3/00:1f:00:00:00:a0 succeeded

# The main difference appears:
A:linux-7vph kernel: Restarting tasks ... done.
B:linux-7vph kernel: Restarting tasks ... <3>ata1.00: exception Emask 0x10 SAct 0x1f SErr 0x50000 action 0xe frozen

# A first shows "done", and only then the frozen message, but with a
# different SAct value:
A:linux-7vph kernel: ata1.00: exception Emask 0x10 SAct 0x1ff SErr 0x50000 action 0xe frozen
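
To make the SAct difference concrete, here is a small sketch. It assumes that
SAct is the SATA SActive register value, a bitmask with one bit per
outstanding NCQ tag; active_tags is a hypothetical helper name.

```python
def active_tags(sact):
    """Return the NCQ tag numbers whose bits are set in an SAct value."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


# B reported SAct 0x1f, A reported SAct 0x1ff.
print(active_tags(0x1f))   # tags 0 through 4
print(active_tags(0x1ff))  # tags 0 through 8
```

Under that assumption, B had five commands in flight when the port froze and
A had nine.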

# Both then have:
linux-7vph kernel: ata1.00: irq_stat 0x00400008, PHY RDY changed

# from there, A seems to have a little extra sequence:
linux-7vph kernel: ata1.00: cmd 60/08:00:ef:fc:48/00:00:0b:00:00/40 tag 0 ncq 4096 in
linux-7vph kernel:          res 50/00:38:97:7f:e6/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
linux-7vph kernel: ata1.00: status: { DRDY }
linux-7vph kernel: ata1.00: cmd 60/08:08:97:d5:69/00:00:0e:00:00/40 tag 1 ncq 4096 in
linux-7vph kernel:          res 50/00:38:97:7f:e6/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
linux-7vph kernel: ata1.00: status: { DRDY }
linux-7vph kernel: ata1.00: cmd 60/30:10:8f:5f:56/00:00:0f:00:00/40 tag 2 ncq 24576 in
linux-7vph kernel:          res 50/00:38:97:7f:e6/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
linux-7vph kernel: ata1.00: status: { DRDY }
linux-7vph kernel: ata1.00: cmd 60/10:18:c7:5f:56/00:00:0f:00:00/40 tag 3 ncq 8192 in
linux-7vph kernel:          res 50/00:38:97:7f:e6/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
linux-7vph kernel: ata1.00: status: { DRDY }
linux-7vph kernel: ata1.00: cmd 60/10:20:ff:5f:56/00:00:0f:00:00/40 tag 4 ncq 8192 in
linux-7vph kernel:          res 50/00:38:97:7f:e6/00:00:0b:00:00/40 Emask 0x10 (ATA bus error)
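
For anyone reading the cmd lines above, a rough decoder sketch follows. It
assumes the field order is
command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device,
and that for NCQ commands the sector count sits in the feature fields while
nsect carries the tag shifted left by three. That layout is an assumption,
though it agrees with the "tag N ncq BYTES" values printed on the same lines.
decode_ncq_cmd is a hypothetical helper.

```python
import re

# Matches "cmd CC/aa:bb:cc:dd:ee/ff:gg:hh:ii:jj/DD" (lower-case hex bytes).
CMD = re.compile(r"cmd ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
                 r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})")


def decode_ncq_cmd(line):
    """Decode one 'cmd ...' line under the assumed taskfile layout."""
    m = CMD.search(line)
    if m is None:
        return None
    feature, nsect, lbal, lbam, lbah = (
        int(x, 16) for x in m.group(2).split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m.group(3).split(":"))
    sectors = (hob_feature << 8) | feature
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return {
        "command": int(m.group(1), 16),  # 0x60 in every line above
        "tag": nsect >> 3,               # assumed NCQ tag encoding
        "sectors": sectors,
        "bytes": sectors * 512,          # assumes 512-byte sectors
        "lba": lba,
    }


info = decode_ncq_cmd("ata1.00: cmd 60/30:10:8f:5f:56/00:00:0f:00:00/40")
print(info["tag"], info["bytes"], hex(info["lba"]))
```

With these assumptions the decoded tag and byte count for each command can be
checked against the "tag" and "ncq" values the kernel printed.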

# B only then shows:
B:linux-7vph kernel: done.

# soon after, B spirals into errors:
linux-7vph kernel: XXX scsi_eh_flush_done_q: online=1(2) noretry=2 retries=0 allowed=5
linux-7vph kernel: scsi_eh_0: flush finish cmd: f6838ec0
linux-7vph kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
linux-7vph kernel: sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor]

Hope this helps.

Niel





View attachment "fmt.messages.2.6.30-rc6-pae.ext4-errors.txt" of type "text/plain" (15439 bytes)

View attachment "fmt.messages.2.6.30-rc6-pae.txt" of type "text/plain" (27602 bytes)
