Date: Tue, 26 May 2009 07:43:47 +0200
From: Niel Lambrechts <niel.lambrechts@...il.com>
To: Tejun Heo <tj@...nel.org>
CC: Alan Cox <alan@...rguk.ukuu.org.uk>,
    "linux.kernel" <linux-kernel@...r.kernel.org>,
    Theodore Tso <tytso@....edu>
Subject: Re: 2.6.29 regression: ATA bus errors on resume

On 05/26/2009 06:58 AM, Tejun Heo wrote:
> Hello, Niel.
>
> Niel Lambrechts wrote:
>
>> I've tested all of the kernels I have again since 2.6.29.4 also came out
>> just recently. I did a hibernate/resume for each in the console, then
>> repeated the same in X, then continued to the next kernel.
>>
>> The 2.6.29.4 log is much larger, since some other badness happened there
>> - there is a large kernel trace in there as my first X hibernation
>> attempt failed and came back to X after a few seconds. The system seemed
>> functional, it did not keep generating kernel messages - when I then
>> retried a hibernate it worked, along with the resume. Another unrelated
>> bug perhaps?
>>
>> As for "hard resetting link" messages, they seemed to always happen
>> under X the times I tried it.
>>
>> Kernel      EXT4-errors?  Console:ata1 reset?  Console:ata2-reset?  X:ata1 reset?  X:ata2 reset?
>> 2.6.28.10   No            no                   yes                  yes            no
>> 2.6.29.4*   No            no                   no                   no             no
>> 2.6.29.4**  No            -                    -                    yes            no
>> 2.6.30-rc6  Yes           -                    -                    yes            no
>> 2.6.30-rc6  No            no                   no                   yes            no
>>
>> * Xorg hibernation attempt failed.
>> * Xorg Second hibernation attempt (no extra reboot)
>>
>> I also did a side by side comparison of the messages I have for
>> 2.6.30-rc6, the one with EXT4 errors I reported on yesterday, and
>> another one that worked just fine tonight. I stripped all time-stamps
>> and some pulseaudio messages from the bad one and attached them here,
>> and also saved the full messages for each kernel to
>> http://bugzilla.kernel.org/show_bug.cgi?id=13017 .
>>
>> Since analysing the code-path is still a bit beyond me, I'll leave you
>> with a little summary of the differences I notice.
>>
>> A = 2.6.30-rc6 (EXT4 clean)
>> B = 2.6.30-rc6 (EXT4 errors triggered)
>>
> Duplicate PHY events are likely to be dependent on timing and
> non-deterministic. The ext4 corrupting or not depends on whether a
> request with failfast set was in-flight at the time of the second PHY
> event, which again is dependent on timing. At any rate, this looks
> like a problem of ext4 (or something between ext4 and the driver). It
> either shouldn't issue failfast command or should take appropriate
> recovery action if it does. It would be really nice if you can give a
> shot at ext3.

Urgh. My root file-system is mounted with extents on, I would have to
re-install entirely.

I'm wondering why no one else is complaining, or whether the problem is
limited to ICH9M/M-E controllers with EXT4 or a certain type of
hard-drive. The laptop is a Lenovo W500 (fairly similar to T500), so
maybe not a lot of people with this type of controller is using EXT4 yet.

Anyhow, I think Theodore may have ruled this out as a EXT4 problem
already (I first copied him) so I'm not sure what to do now, it will
take some strong will (and even more time) for me to re-install EXT3. I
just shouldn't have to, dammit. :-p

Regards,
Niel
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/