Message-ID: <20080610030959.GD6796@suse.de>
Date: Mon, 9 Jun 2008 20:09:59 -0700
From: Greg KH <gregkh@...e.de>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Ingo Molnar <mingo@...e.hu>,
Cornelia Huck <cornelia.huck@...ibm.com>,
Vegard Nossum <vegard.nossum@...il.com>,
Adrian Bunk <bunk@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Jens Axboe <jens.axboe@...cle.com>,
"Rafael J. Wysocki" <rjw@...k.pl>,
Kay Sievers <kay.sievers@...y.org>, Neil Brown <neilb@...e.de>,
Mariusz Kozlowski <m.kozlowski@...land.pl>,
Dave Young <hidave.darkstar@...il.com>
Subject: Re: [bug, 2.6.26-rc4/rc5] sporadic bootup crashes in
blk_lookup_devt()/prepare_namespace()

On Mon, Jun 09, 2008 at 09:15:40AM -0700, Linus Torvalds wrote:
>
>
> On Mon, 9 Jun 2008, Ingo Molnar wrote:
> >
> > ah. I suspect that explains the sporadic nature as well: normally there
> > is 'some' object at the list address, just with an invalid type.
>
> Yes. It could cause two kinds of problems:
>
> - it might end up returning the wrong 'dev_t'. This is unlikely, since we
> only have two cases: the working whole-disk case, and the case where we
> find a partition.
>
> But if we find a partition, we'd still get the right dev_t *most* of
> the time, because we'd first get called with "part=0", and then we have
>
>         if (part < disk->minors)
>                 devt = MKDEV(MAJOR(dev->devt),
>                              MINOR(dev->devt) + part);
>         break;
>
> where we would only fail if that conditional statement would be untrue
> (and then we'd incorrectly return MKDEV(0,0)). Otherwise, 'devt' ends
> up being correct anyway.
>
> So one effect of this bug would be that it would use the random
> "disk->minors" value to either return the right devt, or return one
> that is all zeroes. But if we return the all-zeroes case, then
> init/do_mounts.c will just try again, this time with the numbers
> removed, and now it wouldn't hit the "strcmp()" on any partition, and
> the next time around it would find a disk and work again.
>
> So this is a bug, but it's one that essentially is hidden by the
> caller.
>
> - The other alternative is that the bogus "disk->minors" thing would
> cause a page fault. This would only happen if the partition allocation
> was the first thing in a page, and the previous page was unused, and
> you had DEBUG_PAGEALLOC enabled.
>
> This is obviously the case you saw.
>
> My trivial fix makes it ignore partitions entirely.
>
> We *could* (and perhaps should) do something slightly more involved
> instead, which actually uses a partition if it's there. Like this. That
> would avoid my one nagging worry (that some clever usage makes partitions
> with a different numbering or without a base block device).
>
> And this is all still ignoring the locking issue, of course. It would be
> trivial to just remove the block_class_lock, and change
>
>         mutex_[un]lock(&block_class_lock);
>
> into
>
>         down|up(&block_class.sem);

The locking for struct class has already been converted to a mutex in
the -next tree, but I have left block_class_lock alone for the moment.
Now that I have also cleaned up the places in the /proc files where we
grabbed it, I think it might be safe to remove. I'll poke at that
tomorrow.

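As an aside on the devt computation quoted above, here is a minimal
userspace sketch (not the kernel code) of why a bogus disk->minors value
yields either the right devt or MKDEV(0,0). The struct fake_disk type and
the lookup_devt() function are hypothetical names made up for illustration;
the macros are stand-ins that assume the kernel's 20-bit minor layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's dev_t helpers (20-bit minor field). */
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MAJOR(dev) ((unsigned int)((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int)((dev) & MINORMASK))
#define MKDEV(ma, mi) (((uint32_t)(ma) << MINORBITS) | (mi))

/* Hypothetical record holding only the fields the quoted snippet reads. */
struct fake_disk {
	uint32_t devt;   /* device number of the whole disk */
	int minors;      /* number of minors reserved for the disk */
};

/*
 * Mirrors the quoted conditional: the result stays MKDEV(0, 0) unless
 * 'part' is below 'minors'. If 'minors' is garbage (because the object
 * is not really a disk), the outcome depends on that garbage value.
 */
static uint32_t lookup_devt(const struct fake_disk *disk, int part)
{
	uint32_t devt = MKDEV(0, 0);

	assert(disk != NULL);
	if (part < disk->minors)
		devt = MKDEV(MAJOR(disk->devt), MINOR(disk->devt) + part);
	return devt;
}
```

With part=0 the computed devt equals the object's own devt whenever the
check passes, which is why the wrong object type still gives the right
answer most of the time.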
thanks,
greg k-h