Message-ID: <233671224A0FED4688218FFDBED26E1A5181BF12D1@IRVEXCHCCR01.corp.ad.broadcom.com>
Date: Thu, 4 Mar 2010 11:09:01 -0800
From: "Hugh Daschbach" <hdasch@...adcom.com>
To: "Alan Stern" <stern@...land.harvard.edu>
cc: "Greg KH" <gregkh@...e.de>, "Kay Sievers" <kay.sievers@...y.org>,
"Jan Blunck" <jblunck@...e.de>,
"David Vrabel" <david.vrabel@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
"james Bottomley" <James.Bottomley@...e.de>,
"James Smart" <james.smart@...lex.com>
Subject: RE: System reboot hangs due to race against devices_kset->list
triggered by SCSI FC workqueue
Alan Stern [mailto:stern@...land.harvard.edu] writes:
> On Wed, 3 Mar 2010, Hugh Daschbach wrote:
>
>> Alan Stern [mailto:stern@...land.harvard.edu] writes:
>>
>> > On Wed, 3 Mar 2010, Hugh Daschbach wrote:
>> >
>> >> > Can't we just protect the list? What is wanting to write to the list
>> >> > while shutdown is happening?
>> >>
>> >> Indeed, Alan suggested holding the kset spinlock while iterating the
...
>> > What I meant was that you should hold the spinlock while finding and
>> > unlinking the last device on the list. Clearly you shouldn't hold it
>> > while calling the device shutdown routine.
>>
>> I misunderstood. But I believe insertion and deletion is properly
>> serialized. It looks to me like the list structure is intact. It's the
>> iterator that's been driven off into the weeds.
...
>> Just to be clear, the list we're talking about is "list" in "struct
>> kset". And the nodes of the list are chained by "entry" in "struct
>> kobject".
...
>> At a minimum the change looks something like the patch below.
...
> If you really want to do this then you should remove the lock member
> from struct kset. However this seems like an awful lot of work
> compared to my original suggestion -- something like this (untested,
> and you'll want to add comments):
...
I'm not sure I do want to pursue this. It does seem particularly
invasive at a fundamental level of a core data structure.

Apparently I still don't understand your original suggestion. I'd
prefer to, especially if it leads to a simpler fix. The loop in
device_shutdown() looks something like:
        struct device *dev, *devn;

        list_for_each_entry_safe_reverse(dev, devn, &devices_kset->list,
                                         kobj.entry) {
                if (dev->bus && dev->bus->shutdown) {
                        dev->bus->shutdown(dev);
                } else if (dev->driver && dev->driver->shutdown) {
                        dev->driver->shutdown(dev);
                }
        }
*dev gets delinked by kobj_kset_leave(), which is indirectly called from
dev->*->shutdown(dev). This is protected by the spinlock.

The secondary thread similarly calls kobj_kset_leave(). But when the
secondary thread calls the shutdown routine for the device that devn
points to, devn is left pointing at an entry that is no longer on the
list, and the loop hangs.
Is there some way I can detect that devn no longer points to a valid
device upon return from dev->*->shutdown(dev)? Or, where else can I
look to better understand your suggestion?
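
One possible reading of the suggestion quoted above ("hold the spinlock
while finding and unlinking the last device on the list", then call the
shutdown routine without the lock) is sketched below as a userspace
analogue. It is not the patch from this thread and it is not kernel code.
Every name in it (fake_dev, fake_kset, kset_pop_last, demo_shutdown, and
so on) is hypothetical, a pthread mutex stands in for the kset spinlock,
and the "secondary thread" is simulated by a shutdown routine that also
removes a sibling device. The point is only that the loop never caches a
next pointer across the shutdown call, so nothing can go stale.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modeled loosely on list_head. */
struct node {
	struct node *prev, *next;
};

/* Hypothetical stand-in for struct device. */
struct fake_dev {
	struct node entry;         /* plays the role of kobj.entry */
	struct fake_dev *sibling;  /* device this one's shutdown also removes */
	int shut_down;
};

/* Hypothetical stand-in for struct kset. */
struct fake_kset {
	struct node list;
	pthread_mutex_t lock;      /* stands in for the kset spinlock */
};

#define dev_of(n) \
	((struct fake_dev *)((char *)(n) - offsetof(struct fake_dev, entry)))

static void node_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;     /* self-linked means "not on any list" */
}

static void kset_init(struct fake_kset *k)
{
	k->list.prev = k->list.next = &k->list;
	pthread_mutex_init(&k->lock, NULL);
}

static void kset_add_tail(struct fake_kset *k, struct fake_dev *d)
{
	pthread_mutex_lock(&k->lock);
	d->entry.prev = k->list.prev;
	d->entry.next = &k->list;
	k->list.prev->next = &d->entry;
	k->list.prev = &d->entry;
	pthread_mutex_unlock(&k->lock);
}

/* Analogue of kobj_kset_leave(): harmless if d was already unlinked. */
static void kset_leave(struct fake_kset *k, struct fake_dev *d)
{
	pthread_mutex_lock(&k->lock);
	node_unlink(&d->entry);
	pthread_mutex_unlock(&k->lock);
}

/* Find and unlink the last device under the lock; NULL if list is empty. */
static struct fake_dev *kset_pop_last(struct fake_kset *k)
{
	struct fake_dev *d = NULL;

	pthread_mutex_lock(&k->lock);
	if (k->list.prev != &k->list) {
		d = dev_of(k->list.prev);
		node_unlink(&d->entry);
	}
	pthread_mutex_unlock(&k->lock);
	return d;
}

/* Shutdown routine that removes itself and, optionally, a sibling. */
static void demo_shutdown(struct fake_kset *k, struct fake_dev *d)
{
	d->shut_down = 1;
	kset_leave(k, d);          /* already unlinked by the caller: no-op */
	if (d->sibling)
		kset_leave(k, d->sibling);
}

/* Re-read the tail each pass; the lock is not held during shutdown. */
static int shutdown_all(struct fake_kset *k)
{
	struct fake_dev *d;
	int count = 0;

	while ((d = kset_pop_last(k)) != NULL) {
		demo_shutdown(k, d);
		count++;
	}
	assert(k->list.next == &k->list);  /* list is empty on exit */
	return count;
}
```

With three devices on the list, where shutting down the last one also
removes the middle one, shutdown_all() visits only the last and the first
and then terminates. A kernel version would presumably also need to hold
a reference on the device across the shutdown call, which this sketch
does not model.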
Thanks,
Hugh