Date:	Thu, 25 Jun 2009 17:25:29 +0200
From:	Niel Lambrechts <niel.lambrechts@...il.com>
To:	Tejun Heo <htejun@...il.com>
CC:	Alan Cox <alan@...rguk.ukuu.org.uk>,
	"linux.kernel" <linux-kernel@...r.kernel.org>,
	Theodore Tso <tytso@....edu>
Subject: Re: 2.6.29 regression: ATA bus errors on resume

On 06/25/2009 02:57 PM, Tejun Heo wrote:
> Sorry about the long delay.
>
> The result is perfectly good and yeah dump_stack() on the issue path
> would help but the problem is that block IO requests are processed
> asynchronously so by the time we find out which request failed, the
> requester stack is long gone.  We can either record the stack trace
> with each request or trace it back one step at a time by chasing down
> the completion callbacks.  The first requires more coding, so... :-)
>
> Looks like the request gotta be coming from __breadahead().  The only
> place this is used in ext4 is in __ext4_get_inode_loc().  Ah.. it also
> contains the matching error message.  I still don't see how the READA
> buffer reads can affect the synchronous path.  They're doing proper
> exclusion via buffer lock.  Maybe they're getting merged?  Yeap, looks
> like block code is merging READAs and regular READs.
>
> Can you please try the attached patch and reproduce the problem and
> report the kernel log?  Hopefully, this will be the last debug run.
>    
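Tejun's first option, recording the stack trace with each request, can be illustrated with a small user-space Python model. This is only an analogy for the idea, not kernel code: `Request`, `submit()`, `complete_all()` and `read_inode_table()` are hypothetical names. The point it shows is that the submitter's stack has to be captured at submission time, because by the time an asynchronous completion reports a failure, that stack no longer exists.

```python
import traceback
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    """Hypothetical I/O request that remembers who submitted it."""
    sector: int
    # Captured when the request is issued; the submitter's stack is gone
    # by the time the request completes asynchronously.
    submit_stack: list = field(default_factory=list)


pending = deque()


def submit(sector):
    # Drop the last frame (submit itself) so the innermost entry is the caller.
    req = Request(sector, traceback.format_stack()[:-1])
    pending.append(req)
    return req


def complete_all(failing_sectors):
    """Simulate asynchronous completion and report where failed requests came from."""
    reports = []
    while pending:
        req = pending.popleft()
        if req.sector in failing_sectors:
            reports.append(f"I/O error, sector {req.sector}\n" + "".join(req.submit_stack))
    return reports


def read_inode_table(sector):
    # Hypothetical caller standing in for the code path that issued the read.
    submit(sector)


read_inode_table(1234)
read_inode_table(5678)
for report in complete_all({5678}):
    print(report)
```

The cost Tejun mentions is visible even here: every request carries a saved trace whether or not it ever fails, which is why the alternative of chasing completion callbacks one step at a time needs less code.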
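The suspected merging of READA and regular READ requests can be pictured with a toy back-merge. This is a simplified model under stated assumptions, not the block layer's actual merge logic: `Req`, `back_merge()` and the `check_kind` switch are hypothetical, and real merge decisions depend on many more conditions. It only shows that a merge test which ignores the request type lets a readahead and a synchronous read end up in one request, so an error on that request would be seen by both.

```python
from dataclasses import dataclass


@dataclass
class Req:
    start: int   # first sector
    count: int   # number of sectors
    kind: str    # "READ", "READA" (readahead) or "MIXED" after a cross-type merge

    @property
    def end(self):
        return self.start + self.count


def back_merge(queue, new, check_kind):
    """Append `new` to a queued request that ends exactly where `new` starts.

    Returns True if `new` was merged, False if it was queued separately.
    With check_kind=False, requests of different kinds may be merged.
    """
    for req in queue:
        if req.end == new.start and (not check_kind or req.kind == new.kind):
            req.count += new.count
            if req.kind != new.kind:
                req.kind = "MIXED"
            return True
    queue.append(new)
    return False


queue = [Req(100, 8, "READA")]
back_merge(queue, Req(108, 8, "READ"), check_kind=False)
print(queue)  # a single request covering sectors 100-115, marked MIXED
```

Running the same two requests with `check_kind=True` leaves them as separate queue entries, which is the kind of exclusion the thread is questioning.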

Hi Tejun,

I've recently switched my root partition from OpenSUSE 11.1 to Fedora 11, 
and since then I haven't seen the issue again. I'm still using a vanilla 
2.6.30 kernel built with the same .config, and EXT4 as before, so I have no 
idea why I can't reproduce the issue. I still use hibernate and sleep 
frequently, and I just checked: I have 5 days of uptime with a mount count 
of 20, and the filesystem is still clean.

The one big difference is that my original partition was an EXT2 -> EXT3 
-> EXT4 upgrade done over a long period of time, and some of the EXT4 
parameters Fedora 11 used on the reformatted root partition differ 
from what I had before. Here is a summary of the differences, in 
case it matters at all:

Current settings:
Default mount options:    user_xattr acl
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4

Previous settings:
Default mount options:    (none)
Inodes per group:         8176
Inode blocks per group:   511
Default directory hash:   tea
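A summary like the one above can be produced by parsing two superblock dumps and comparing them. The sketch below assumes the input is `Key:   value` text as printed by `dumpe2fs -h`; the helper names are hypothetical. It also derives the inode size from the per-group counts, under the assumption of a 4096-byte block size (the block size is not stated in this message). With that assumption, both layouts work out to 16 inodes per block, i.e. 256-byte inodes, so the inode size itself does not appear to be among the differences.

```python
def parse_superblock(text):
    """Parse 'Key:   value' lines (dumpe2fs -h style) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diff_settings(old, new):
    """Return {key: (old_value, new_value)} for every key that differs."""
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k, "(absent)"), new.get(k, "(absent)"))
            for k in keys if old.get(k) != new.get(k)}


def inode_size(fields, block_size=4096):
    """Bytes per inode implied by the per-group counts; block_size is an assumption."""
    per_block = int(fields["Inodes per group"]) // int(fields["Inode blocks per group"])
    return block_size // per_block


current = parse_superblock("""\
Default mount options:    user_xattr acl
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Required extra isize:     28
Desired extra isize:      28
Default directory hash:   half_md4""")

previous = parse_superblock("""\
Default mount options:    (none)
Inodes per group:         8176
Inode blocks per group:   511
Default directory hash:   tea""")

for key, (old, new) in diff_settings(previous, current).items():
    print(f"{key}: {old} -> {new}")
print(inode_size(previous), inode_size(current))
```

Here "(absent)" only means the field was not listed for that filesystem; it does not by itself say what the kernel would use as a default.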

If I do notice any such errors again I'll apply the debug patch and let 
you know, but it does seem as if the upgrade made this issue disappear...

Regards,
Niel
