linux-kernel - Re: 2.6.29 regression: ATA bus errors on resume

Message-ID: <4A1B8193.1010703@gmail.com>
Date:	Tue, 26 May 2009 07:43:47 +0200
From:	Niel Lambrechts <niel.lambrechts@...il.com>
To:	Tejun Heo <tj@...nel.org>
CC:	Alan Cox <alan@...rguk.ukuu.org.uk>,
	"linux.kernel" <linux-kernel@...r.kernel.org>,
	Theodore Tso <tytso@....edu>
Subject: Re: 2.6.29 regression: ATA bus errors on resume

On 05/26/2009 06:58 AM, Tejun Heo wrote:
> Hello, Niel.
>
> Niel Lambrechts wrote:
>    
>> I've tested all of the kernels I have again since 2.6.29.4 also came out
>> just recently. I did a hibernate/resume for each in the console, then
>> repeated the same in X, then continued to the next kernel.
>>
>> The 2.6.29.4 log is much larger, since some other badness happened there
>> - there is a large kernel trace in there as my first X hibernation
>> attempt failed and came back to X after a few seconds. The system seemed
>> functional, it did not keep generating kernel messages - when I then
>> retried a hibernate it worked, along with the resume. Another unrelated
>> bug perhaps?
>>
>> As for "hard resetting link" messages, they seemed to always happen
>> under X the times I tried it.
>>
>> Kernel       EXT4 errors?   Console: ata1 reset?   Console: ata2 reset?   X: ata1 reset?   X: ata2 reset?
>> 2.6.28.10    No             no                     yes                    yes              no
>> 2.6.29.4*    No             no                     no                     no               no
>> 2.6.29.4**   No             -                      -                      yes              no
>> 2.6.30-rc6   Yes            -                      -                      yes              no
>> 2.6.30-rc6   No             no                     no                     yes              no
>>
>> *  Xorg hibernation attempt failed.
>> ** Second Xorg hibernation attempt (no extra reboot).
>>
>> I also did a side by side comparison of the messages I have for
>> 2.6.30-rc6, the one with EXT4 errors I reported on yesterday, and
>> another one that worked just fine tonight. I stripped all time-stamps
>> and some pulseaudio messages from the bad one and attached them here,
>> and also saved the full messages for each kernel to
>> http://bugzilla.kernel.org/show_bug.cgi?id=13017 .
>>
>> Since analysing the code-path is still a bit beyond me, I'll leave you
>> with a little summary of the differences I notice.
>>
>> A = 2.6.30-rc6 (EXT4 clean)
>> B = 2.6.30-rc6 (EXT4 errors triggered)
>>      
> Duplicate PHY events are likely to be dependent on timing and
> non-deterministic.  The ext4 corrupting or not depends on whether a
> request with failfast set was in-flight at the time of the second PHY
> event, which again is dependent on timing.  At any rate, this looks
> like a problem of ext4 (or something between ext4 and the driver).  It
> either shouldn't issue failfast commands or should take appropriate
> recovery action if it does.  It would be really nice if you could give
> ext3 a shot.

Urgh. My root file-system is mounted with extents on, so I would have to 
re-install entirely.

I'm wondering why no one else is complaining, or whether the problem is 
limited to ICH9M/M-E controllers with EXT4 or to a certain type of 
hard drive. The laptop is a Lenovo W500 (fairly similar to the T500), so 
maybe not many people with this type of controller are using EXT4 yet.

Anyhow, I think Theodore may have already ruled this out as an EXT4 
problem (I first copied him), so I'm not sure what to do now. It will 
take some strong will (and even more time) for me to re-install with 
EXT3. I just shouldn't have to, dammit. :-p

Regards,
Niel