Message-ID: <alpine.LFD.2.02.1105041124080.3005@ionos>
Date:	Wed, 4 May 2011 11:52:44 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Ingo Molnar <mingo@...e.hu>
cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jens Axboe <axboe@...nel.dk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	werner <w.landgraf@...ru>, "H. Peter Anvin" <hpa@...or.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [block IO crash] Re: 2.6.39-rc5-git2 boot crashs

On Wed, 4 May 2011, Ingo Molnar wrote:

> 1415		if (!nr_sectors)
> 1416			return 0;
> 1417	
> 1418		/* Test device or partition size, when known. */
> 1419		maxsector = i_size_read(bio->bi_bdev->bd_inode) >> 9;   <==== [ **CRASH** ]
> 1420		if (maxsector) {
> 1421			sector_t sector = bio->bi_sector;
> 1422	
> 1423			if (maxsector < nr_sectors || maxsector - nr_sectors < sector) {
> 
> bio->bi_bdev has become NULL?
> 
> I do not think the _cond_resched() was called, judging from stack contents. But 
> we just had an IRQ:
> 
>  [<c1d74030>] ? common_interrupt+0x30/0x40
> 
> So we might have raced with block IO IRQ queue-completion/submission activities.
> 
> But maybe it was a reschedule after all, just the stack does not carry any 
> traces of it anymore. IRQs do not clear ->bi_bdev, right? Unless the bio 
> refcounts are wrong and an IRQ's completion actually frees the bio, right?

Looking at the call chain, that's impossible:

generic_make_request
submit_bio
submit_bh

submit_bh does:

	bio = bio_alloc()
	bio_get(bio)
	submit_bio(bio)
	bio_put(bio)

So that bio is not yet known to anything other than the calling
code.
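
To illustrate the reference counting in that sequence, here is a minimal
user-space sketch. It is not the kernel code: struct fake_bio and all
fake_* helpers are hypothetical stand-ins, and the "freed" flag replaces
the real free so the state can be inspected afterwards. It assumes the
allocation returns one reference and that completion drops exactly one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct bio; only the refcount is modelled. */
struct fake_bio {
	int refcnt;	/* models the bio reference count */
	int freed;	/* set instead of freeing, so the demo can look at it */
};

static struct fake_bio *fake_bio_alloc(void)
{
	struct fake_bio *bio = calloc(1, sizeof(*bio));

	if (bio)
		bio->refcnt = 1;	/* allocation hands back one reference */
	return bio;
}

static void fake_bio_get(struct fake_bio *bio)
{
	bio->refcnt++;
}

static void fake_bio_put(struct fake_bio *bio)
{
	assert(bio->refcnt > 0);
	if (--bio->refcnt == 0)
		bio->freed = 1;
}

/* Models I/O completion, which drops the allocation reference. */
static void fake_complete(struct fake_bio *bio)
{
	fake_bio_put(bio);
}

/*
 * Models the get/submit/put sequence. Returns the refcount the submitter
 * sees right after a completion that ran before submission returned.
 */
static int fake_submit_bh(struct fake_bio *bio)
{
	int seen;

	fake_bio_get(bio);	/* extra reference held by the submitter */
	fake_complete(bio);	/* worst case: completion runs immediately */
	seen = bio->refcnt;
	assert(!bio->freed);	/* still alive thanks to the extra reference */
	fake_bio_put(bio);	/* last reference goes away here */
	return seen;
}
```

In this model, even an immediate completion cannot free the object while
the submitter is still inside the submit path; it only goes away at the
final put.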

One possibility is that bh->b_bdev is NULL when submit_bh() is called,
which I think is rather unlikely, but it can be easily verified with

--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2887,6 +2887,7 @@ int submit_bh(int rw, struct buffer_head * bh)
 	BUG_ON(!bh->b_end_io);
 	BUG_ON(buffer_delay(bh));
 	BUG_ON(buffer_unwritten(bh));
+	BUG_ON(!bh->b_bdev);
 
 	/*
 	 * Only clear out a write error when rewriting

But I rather suspect that CONFIG_SLUB=y is the thing we need to look
at. The lockless fastpath cmpxchg comes to mind.

Either we generate broken code with the options that ELAN selects, or
that combo triggers some hidden problem in SLUB.
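
For readers unfamiliar with the idea, a cmpxchg-based lockless fast path
can be sketched in user space with C11 atomics as below. This is not the
SLUB code: as far as I know the real fast path works on per-cpu data and
pairs the freelist pointer with a transaction id in a double-word
cmpxchg. The names struct obj, struct freelist and the fl_* helpers are
made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical free object: the first word links to the next free one. */
struct obj {
	struct obj *next;
};

/* Hypothetical freelist head, updated only by compare-and-exchange. */
struct freelist {
	_Atomic(struct obj *) head;
};

static void fl_init(struct freelist *fl)
{
	atomic_init(&fl->head, NULL);
}

/* Push: retry until the head we linked against is still the head. */
static void fl_push(struct freelist *fl, struct obj *o)
{
	struct obj *old = atomic_load(&fl->head);

	do {
		o->next = old;
	} while (!atomic_compare_exchange_weak(&fl->head, &old, o));
}

/*
 * Pop: read the head, then swing it to head->next only if nobody changed
 * it in between. On failure 'old' is reloaded and the loop retries.
 * A pointer-only compare like this is exposed to ABA; that is why a real
 * allocator would add something like a transaction id.
 */
static struct obj *fl_pop(struct freelist *fl)
{
	struct obj *old = atomic_load(&fl->head);

	while (old &&
	       !atomic_compare_exchange_weak(&fl->head, &old, old->next))
		;
	return old;
}
```

If the compare-and-exchange is miscompiled or not really atomic on a
given configuration, a pop can hand out a stale pointer, and the same
memory ends up owned by two users. That would be one way for an
unrelated object such as a bio to get its fields clobbered.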

Thanks,

	tglx

