Message-ID: <4CBFE4E2.7050001@kernel.dk>
Date:	Thu, 21 Oct 2010 08:59:46 +0200
From:	Jens Axboe <axboe@...nel.dk>
To:	Theodore Ts'o <tytso@....edu>
CC:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: What am I doing wrong?  submit_bio() suddenly stops working...

On 2010-10-21 04:00, Theodore Ts'o wrote:
> Hey Jens,
> 
> I've been trying to figure out what I'm doing wrong.  I've been trying
> to convert the data writeback path to use the bio layer.  It mostly
> works --- until all of a sudden all calls to block_bio_queue(), either via
> submit_bh() or via submit_bio(), start turning into no-ops.
> 
> I'm sure I'm doing something wrong, but the bio layer isn't terribly
> well documented, so I'm not sure what it might be.  The patch which
> causes the problem can be found here:
> 
> http://userweb.kernel.org/~tytso/ext4-bio-patches/0006-Ext4-Use-bio-layer-instead-of-buffer-layer-in-mpage_.patch
> 
> Here is an excerpt from an ftrace I've been taking to get to the bottom
> of it.  It's a combination of some trace_printk's, blktrace, and the
> block_bio_queue tracepoint.   The full log can be found at:
> 
> http://userweb.kernel.org/~tytso/ext4-bio-patches/kvm-console
> 
> It shows all of the blktrace events that show up after the block_bio_queue
> tracepoint, but at some point, after jbd2 or ext4 calls submit_bh() or
> submit_bio(), after the block_bio_queue tracepoint, we stop seeing the
> blktrace events, and it looks like the block I/O layer stops answering
> the phone.  No complaints in dmesg, no BUG_ON's, no errors....
> 
> If I back out the ext4 bio patches, things work correctly, and as I
> said, I'm pretty sure the bug is in my code.  But the failure is
> happening deep in the block I/O stack, and I can't figure out why it's
> failing.
> 
> I'm hoping this rings a bell, and perhaps we should consider some of the
> debugging trace_printk's as possible new tracepoints?
> 
> Any help you could give me would be greatly appreciated.  Ideally, you
> or someone can tell me what stupid thing I'm doing.  :-)

I don't see anything immediately wrong with your approach. I suspect
we'll need to see sysrq-t traces of the relevant processes to make a
more educated guess!

-- 
Jens Axboe
