linux-kernel - Re: 2.6.29 regression: ATA bus errors on resume

Message-ID: <4A4396E9.1030509@gmail.com>
Date:	Thu, 25 Jun 2009 17:25:29 +0200
From:	Niel Lambrechts <niel.lambrechts@...il.com>
To:	Tejun Heo <htejun@...il.com>
CC:	Alan Cox <alan@...rguk.ukuu.org.uk>,
	"linux.kernel" <linux-kernel@...r.kernel.org>,
	Theodore Tso <tytso@....edu>
Subject: Re: 2.6.29 regression: ATA bus errors on resume

On 06/25/2009 02:57 PM, Tejun Heo wrote:
> Sorry about the long delay.
>
> The result is perfectly good and yeah dump_stack() on the issue path
> would help but the problem is that block IO requests are processed
> asynchronously so by the time we find out which request failed, the
> requester stack is long gone.  We can either record the stack trace
> with each request or trace it back one step at a time by chasing down
> the completion callbacks.  The first requires more coding, so... :-)
>
> Looks like the request gotta be coming from __breadahead().  The only
> place this is used in ext4 is in __ext4_get_inode_loc().  Ah.. it also
> contains the matching error message.  I still don't see how the READA
> buffer reads can affect the synchronous path.  They're doing proper
> exclusion via buffer lock.  Maybe they're getting merged?  Yeap, looks
> like block code is merging READAs and regular READs.
>
> Can you please try the attached patch and reproduce the problem and
> report the kernel log?  Hopefully, this will be the last debug run.
>    
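
As an illustration of the first option in the quoted message (recording a stack trace with each request), here is a minimal userspace sketch in Python. It is only an analogy and not kernel code: the Request class, the worker thread, and read_inode_block() are hypothetical names invented for the example. It shows why the trace has to be captured at submission time, since the failure is noticed on a different thread after the submitter has returned.

```python
import queue
import threading
import traceback


class Request:
    """Toy I/O request that remembers who submitted it (hypothetical)."""

    def __init__(self, sector, fail=False):
        self.sector = sector
        self.fail = fail
        # Capture the submitter's stack now; by completion time it is gone.
        # The last frame is this __init__, so drop it.
        self.submit_stack = traceback.extract_stack()[:-1]


def worker(pending, failures):
    """Complete requests asynchronously, away from the submitter's stack."""
    while True:
        req = pending.get()
        if req is None:  # sentinel: no more requests
            break
        if req.fail:
            failures.append(req)


def read_inode_block(pending, sector):
    """Hypothetical submitter; its name should appear in the saved trace."""
    pending.put(Request(sector, fail=True))


def run_demo():
    pending, failures = queue.Queue(), []
    thread = threading.Thread(target=worker, args=(pending, failures))
    thread.start()
    read_inode_block(pending, 1234)
    pending.put(None)
    thread.join()
    return failures


if __name__ == "__main__":
    for req in run_demo():
        print("failed sector", req.sector, "was submitted from:")
        print("".join(traceback.format_list(req.submit_stack)))
```

The cost is one stack capture per request, whether or not it ever fails, which is the extra work the quoted message alludes to.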

Hi Tejun,

I recently switched my root partition from OpenSUSE 11.1 to Fedora 11,
and I have not seen the issue since. I'm still running vanilla 2.6.30
built with the same .config, and still on EXT4, so I have no idea why I
can no longer reproduce it. I still hibernate and sleep frequently, and
I just checked: I have 5 days of uptime with a mount count of 20, and
the file system is still clean.

The one big difference is that my original partition was an EXT2 ->
EXT3 -> EXT4 upgrade done over a long period of time, and some of the
EXT4 parameters Fedora 11 used on the reformatted root partition differ
from what I had then. Here is a summary of the differences, in case it
matters at all:

Current settings:
Default mount options:    user_xattr acl
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4

Previous settings:
Default mount options:    (none)
Inodes per group:         8176
Inode blocks per group:   511
Default directory hash:   tea
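
The two listings above can also be compared mechanically. The following is a minimal Python sketch, assuming the values are "key: value" lines of the kind printed by a superblock dump; the tool that produced them is not stated here, and the helper names parse(), diff() and inodes_per_block() are made up for the example. It also works out the one ratio that can be derived from these numbers alone.

```python
CURRENT = """\
Default mount options:    user_xattr acl
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4
"""

PREVIOUS = """\
Default mount options:    (none)
Inodes per group:         8176
Inode blocks per group:   511
Default directory hash:   tea
"""


def parse(listing):
    """Turn 'key:   value' lines into a dict, skipping anything else."""
    settings = {}
    for line in listing.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def diff(old, new):
    """Map each differing key to (old value, new value); None means absent."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


def inodes_per_block(settings):
    """Inodes stored per inode-table block, derived from the two counts."""
    return int(settings["Inodes per group"]) // int(settings["Inode blocks per group"])


if __name__ == "__main__":
    old, new = parse(PREVIOUS), parse(CURRENT)
    for key, (before, after) in diff(old, new).items():
        print(f"{key}: {before} -> {after}")
    print("inodes per block:", inodes_per_block(old), "->", inodes_per_block(new))
```

Both layouts come out at 16 inodes per inode-table block (8176 / 511 and 8192 / 512), so the ratio of block size to inode size is the same in both; the group geometry, the flex_bg and extra isize fields, the default mount options, and the directory hash are what changed.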

If I do notice any such errors again I'll apply the debug patch and let 
you know, but it does seem as if the upgrade made this issue disappear...

Regards,
Niel