Date:	Thu, 25 Jun 2009 21:57:38 +0900
From:	Tejun Heo <htejun@...il.com>
To:	Niel Lambrechts <niel.lambrechts@...il.com>
CC:	Alan Cox <alan@...rguk.ukuu.org.uk>,
	"linux.kernel" <linux-kernel@...r.kernel.org>,
	Theodore Tso <tytso@....edu>
Subject: Re: 2.6.29 regression: ATA bus errors on resume

Sorry about the long delay.

Niel Lambrechts wrote:
> Morning Tejun,
> 
> Tejun Heo wrote:
>> Hello,
>>
>> Can you please do the following?
>>
>> 1. Apply the attached patch, build & boot
>>   
> I chose 2.6.30-rc7...
>> 2. Trigger the problem and record dmesg
>>   
> It took 3 days and quite a few hibernate attempts ... :-)
> 
>> 3. On failed IO, the kernel will print the address of bi_endio.  Run
>>    "nm -n" on the vmlinux in the kernel build root and look up which
>>    function it is and post the dmesg and function name.
> I did not have that specific vmlinux.o file any more, but
> /boot/System.map-2.6.30-rc7-pae shows:
> c01a49fd t end_bio_bh_io_sync

So, it's coming from submit_bh(). end_bio_bh_io_sync() is the bi_end_io
callback that submit_bh() installs.

> Hope this is sufficient to help you. Sorry if this is silly - being so
> inexperienced with the kernel - but I wondered if or why a dump_stack()
> in that debug patch would not be helpful?

The result is perfectly good, and yes, a dump_stack() on the issue path
would help. The problem is that block IO requests are processed
asynchronously, so by the time we find out which request failed, the
requester's stack is long gone.  We can either record the stack trace
with each request or trace it back one step at a time by chasing down
the completion callbacks.  The first requires more coding, so... :-)

It looks like the request must be coming from __breadahead().  The only
place this is used in ext4 is __ext4_get_inode_loc().  Ah, it also
contains the matching error message.  I still don't see how the READA
buffer reads can affect the synchronous path; they're doing proper
exclusion via the buffer lock.  Maybe they're getting merged?  Yep, it
looks like the block code is merging READAs and regular READs.

Can you please try the attached patch and reproduce the problem and
report the kernel log?  Hopefully, this will be the last debug run.

Thanks.

-- 
tejun

View attachment "bio_endio-debug2.patch" of type "text/x-patch" (1341 bytes)
