Date:	Mon, 9 Jun 2008 17:38:56 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Cornelia Huck <cornelia.huck@...ibm.com>,
	Vegard Nossum <vegard.nossum@...il.com>,
	Adrian Bunk <bunk@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Jens Axboe <jens.axboe@...cle.com>,
	Greg Kroah-Hartman <gregkh@...e.de>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Kay Sievers <kay.sievers@...y.org>, Neil Brown <neilb@...e.de>,
	Mariusz Kozlowski <m.kozlowski@...land.pl>,
	Dave Young <hidave.darkstar@...il.com>
Subject: Re: [bug, 2.6.26-rc4/rc5] sporadic bootup crashes in
	blk_lookup_devt()/prepare_namespace()


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Mon, 9 Jun 2008, Cornelia Huck wrote:
> > 
> > Does this crash happen with the conversion to the class iterator 
> > functions (should be in linux-next) as well? They take the class 
> > mutex...
> 
> I really don't think it's the locking, although I do agree that the 
> locking looks bogus _too_.
> 
> I suspect that the problem is even simpler than that. On the 
> "block_class.devices" list we can have two types of devices: the ones 
> that have been added by the block/genhd.c code (disks: dev->type 
> "disk_type"), and the ones that are added by the class layer for 
> partitions (partitions: dev.type "part_type").
> 
> And *all* the block/genhd.c loops over that device list look like this:
> 
> 	list_for_each_entry(dev, &block_class.devices, node) {
> 		if (dev->type != &disk_type)
> 			continue;
> 		sgp = dev_to_disk(dev);
> 		...
> 
> because you cannot do that "dev_to_disk()" on a partition entry (it 
> won't have a container of type gendisk, it will be of type hd_struct).
> 
> Well, all except one. Guess which one..
> 
> So I suspect that (a) yes, we need to fix the locking, but (b) the fix for 
> this particular bug is probably the trivial one appended.
> 
> And yes, this bug was introduced by commit 30f2f0eb4b ("block: 
> do_mounts - accept root=<non-existant partition>"), so the alternative 
> is to revert it entirely. Kay?

Ah. I suspect that explains the sporadic nature as well: normally there 
is 'some' object at the computed address, just of the wrong type.

The invalid type only becomes visible as a hard crash if, due to 
PAGEALLOC, the structure sizes and kmalloc/slab details cause the 
invalid access to land on a not-yet-allocated page (and then it crashes 
there).

And that in itself is a rather unlikely and fragile condition (it might 
even depend on the timing of various allocations), which is why the bug 
wasn't deterministically reproducible.
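
To make the mechanism concrete, here is a rough user-space sketch of the 
type check and of why skipping it reaches outside the partition object. 
The struct layouts are hypothetical and heavily reduced (the real 
gendisk and hd_struct are much larger), container_of() is a simplified 
stand-in for the kernel macro, and the 'next' pointer only imitates the 
class device list. Nothing here is the actual patch.

```c
#include <stddef.h>

/* User-space stand-in for the kernel's container_of(); no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct device_type { const char *name; };

struct device {
	const struct device_type *type;
	struct device *next;	/* stand-in for the class device list linkage */
};

/* Hypothetical, heavily reduced layouts; the real structs differ. */
struct hd_struct {
	unsigned long start_sect;
	struct device dev;
};

struct gendisk {
	char disk_name[32];
	int major;
	struct device dev;
};

static const struct device_type disk_type = { "disk" };
static const struct device_type part_type = { "partition" };

/* Walk the list the way the checked genhd.c loops do and count the disks. */
int count_disks(struct device *head)
{
	struct device *dev;
	struct gendisk *sgp;
	int n = 0;

	for (dev = head; dev; dev = dev->next) {
		if (dev->type != &disk_type)
			continue;	/* partition: embedded in hd_struct, not gendisk */
		sgp = container_of(dev, struct gendisk, dev);
		if (sgp->major > 0)
			n++;
	}
	return n;
}

/*
 * Without the check, the "gendisk" computed from a partition's dev would
 * start this many bytes before the hd_struct allocation itself.
 */
long bogus_underrun(void)
{
	return (long)offsetof(struct gendisk, dev) -
	       (long)offsetof(struct hd_struct, dev);
}

/* Build one disk with two partitions, mixed on a single list. */
int demo_count(void)
{
	struct gendisk disk = { "sda", 8, { &disk_type, NULL } };
	struct hd_struct p1 = { 63, { &part_type, NULL } };
	struct hd_struct p2 = { 4096, { &part_type, NULL } };

	p1.dev.next = &disk.dev;
	disk.dev.next = &p2.dev;
	return count_disks(&p1.dev);
}
```

With the check in place only the one real disk is counted. Whenever 
bogus_underrun() is positive, as it is with these toy layouts, an 
unchecked dev_to_disk()-style conversion of a partition entry would read 
memory in front of the hd_struct. Usually that is some neighbouring slab 
object; a fault needs it to be an unmapped page, which fits the sporadic 
crashes.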

	Ingo
