Date:	Thu, 4 Mar 2010 11:09:01 -0800
From:	"Hugh Daschbach" <hdasch@...adcom.com>
To:	"Alan Stern" <stern@...land.harvard.edu>
cc:	"Greg KH" <gregkh@...e.de>, "Kay Sievers" <kay.sievers@...y.org>,
	"Jan Blunck" <jblunck@...e.de>,
	"David Vrabel" <david.vrabel@....com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-scsi@...r.kernel.org" <linux-scsi@...r.kernel.org>,
	"james Bottomley" <James.Bottomley@...e.de>,
	"James Smart" <james.smart@...lex.com>
Subject: RE: System reboot hangs due to race against devices_kset->list
 triggered by SCSI FC workqueue

Alan Stern [mailto:stern@...land.harvard.edu] writes:

> On Wed, 3 Mar 2010, Hugh Daschbach wrote:
>
>> Alan Stern [mailto:stern@...land.harvard.edu] writes:
>> 
>> > On Wed, 3 Mar 2010, Hugh Daschbach wrote:
>> >
>> >> > Can't we just protect the list?  What is wanting to write to the list
>> >> > while shutdown is happening?
>> >> 
>> >> Indeed, Alan suggested holding the kset spinlock while iterating the
...
>> > What I meant was that you should hold the spinlock while finding and 
>> > unlinking the last device on the list.  Clearly you shouldn't hold it 
>> > while calling the device shutdown routine.
>> 
>> I misunderstood.  But I believe insertion and deletion are properly
>> serialized.  It looks to me like the list structure is intact.  It's the
>> iterator that's been driven off into the weeds.
...
>> Just to be clear, the list we're talking about is "list" in "struct
>> kset".  And the nodes of the list are chained by "entry" in "struct
>> kobject".
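In this layout, struct kset owns only the list head, and each kobject carries the node ("entry") that links it into that list. The enclosing object is recovered by pointer arithmetic. Here is a userspace sketch of the arrangement. The list_head and container_of below are simplified re-implementations, not the kernel headers. The names model_kset, model_kobject, model_device, entry_to_device and demo_intrusive_list are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace copy of a kernel-style circular list; not <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

/* Link n in just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical stand-ins for struct kset, struct kobject and struct device. */
struct model_kset { struct list_head list; };      /* like kset->list */
struct model_kobject { struct list_head entry; };  /* like kobject->entry */
struct model_device {
	int id;
	struct model_kobject kobj;  /* embedded, not pointed to */
};

/* list node -> enclosing kobject -> enclosing device */
static struct model_device *entry_to_device(struct list_head *e)
{
	struct model_kobject *k = container_of(e, struct model_kobject, entry);

	return container_of(k, struct model_device, kobj);
}

static void demo_intrusive_list(void)
{
	struct model_kset ks;
	struct model_device a = { .id = 1 }, b = { .id = 2 };

	list_init(&ks.list);
	list_add_tail(&a.kobj.entry, &ks.list);
	list_add_tail(&b.kobj.entry, &ks.list);

	/* The head links straight to the embedded nodes; no extra allocation. */
	assert(entry_to_device(ks.list.next) == &a);
	assert(entry_to_device(ks.list.prev) == &b);  /* tail: where a reverse walk starts */
}
```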
...
>> At a minimum the change looks something like the patch below.
...
> If you really want to do this then you should remove the lock member 
> from struct kset.  However this seems like an awful lot of work 
> compared to my original suggestion -- something like this (untested, 
> and you'll want to add comments):
...

I'm not sure I do want to pursue this.  It seems particularly invasive,
since it changes a core data structure at a fundamental level.

Apparently I still don't understand your original suggestion.  I'd
prefer to, especially if it leads to a simpler fix.  The loop in
device_shutdown() looks something like:

        struct device *dev, *devn;

        /* Walk devices_kset->list from tail to head; devn caches the
         * previous entry before dev's shutdown routine runs. */
        list_for_each_entry_safe_reverse(dev, devn, &devices_kset->list,
                                         kobj.entry) {
                if (dev->bus && dev->bus->shutdown) {
                        dev->bus->shutdown(dev);
                } else if (dev->driver && dev->driver->shutdown) {
                        dev->driver->shutdown(dev);
                }
        }

*dev gets unlinked by kobj_kset_leave(), which is called indirectly from
dev->*->shutdown(dev).  The unlink is protected by the spinlock.

The secondary thread (the SCSI FC workqueue) also calls
kobj_kset_leave().  But when the secondary thread calls the shutdown
routine for the device that devn points to while dev's shutdown routine
is running, the loop hangs.
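One mechanism fits this description if we assume the unlink is done with a list_del_init()-style helper, which leaves the removed node pointing at itself. The loop has already cached devn. Once devn is unlinked, the next step sets dev = devn and then reads devn's prev pointer, which now points back at devn. The iterator never reaches the list head again and keeps calling shutdown on the same device.

The single-threaded userspace model below reproduces that. The race is modelled by having one device's shutdown also unlink the cached node. The names model_device, victim, model_shutdown, shutdown_all_cached and demo_stale_cursor, as well as the step bound used to detect the hang, are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace copy of a kernel-style circular list; not <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Assumed to behave like list_del_init(): unlink, then self-link the node. */
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct model_device {
	int id;
	int shutdown_calls;
	struct list_head entry;
	struct model_device *victim;  /* hypothetical: removed by the "other thread" */
};

/* Stand-in for dev->*->shutdown(dev).  The device leaves the list, and the
 * racing thread is modelled by unlinking dev->victim during the same call. */
static void model_shutdown(struct model_device *dev)
{
	dev->shutdown_calls++;
	list_del_init(&dev->entry);
	if (dev->victim) {
		list_del_init(&dev->victim->entry);
		dev->victim = NULL;
	}
}

/* Same shape as list_for_each_entry_safe_reverse(): n caches pos->prev
 * before the body runs.  Returns the iteration count, or -1 once
 * max_steps is exceeded (the loop appears to hang). */
static int shutdown_all_cached(struct list_head *head, int max_steps)
{
	struct list_head *pos, *n;
	int steps = 0;

	for (pos = head->prev, n = pos->prev; pos != head; pos = n, n = n->prev) {
		if (++steps > max_steps)
			return -1;
		model_shutdown(container_of(pos, struct model_device, entry));
	}
	return steps;
}

static void demo_stale_cursor(void)
{
	struct list_head head;
	struct model_device a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };

	list_init(&head);
	list_add_tail(&a.entry, &head);
	list_add_tail(&b.entry, &head);
	list_add_tail(&c.entry, &head);
	c.victim = &b;  /* b is exactly the node the loop cached in n */

	assert(shutdown_all_cached(&head, 100) == -1);  /* spins on b */
	assert(b.shutdown_calls > 1);   /* the same device, again and again */
	assert(a.shutdown_calls == 0);  /* never reached */
}
```

Without a victim, the same loop visits each device once and stops at the head. In this model the spin appears only when the cached node is removed out from under the iterator.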

Is there some way I can detect that devn no longer points to a valid
device upon return from dev->*->shutdown(dev)?  Or, where else can I
look to better understand your suggestion?
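One reading of the suggestion quoted earlier is to hold the spinlock while finding and unlinking the last device, but not while calling its shutdown routine. That replaces the cached-cursor walk with a loop that re-reads the tail under the lock on every pass.

Here is a userspace sketch under these assumptions:
- a C11 atomic_flag spinlock stands in for the kset's spinlock;
- racing_shutdown() plays the part of the second thread;
- every model_* and g_* name is hypothetical.

A kernel version would presumably also need to keep each device alive (for example by holding a reference) while the lock is dropped. The model leaves lifetime management out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Simplified userspace copy of a kernel-style circular list; not <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical kset model: a list guarded by a test-and-set spinlock. */
struct model_kset {
	atomic_flag lock;
	struct list_head list;
};

static void model_lock(struct model_kset *ks)
{
	while (atomic_flag_test_and_set(&ks->lock))
		;  /* spin */
}

static void model_unlock(struct model_kset *ks) { atomic_flag_clear(&ks->lock); }

struct model_device {
	int id;
	struct list_head entry;
};

/* Hold the lock only while finding and unlinking the last device; drop it
 * around the shutdown call.  No list pointer is carried across an unlock,
 * so a concurrent removal cannot leave the loop with a stale cursor. */
static int shutdown_all_locked(struct model_kset *ks,
			       void (*shutdown)(struct model_device *))
{
	int n = 0;

	model_lock(ks);
	while (!list_empty(&ks->list)) {
		struct model_device *dev =
			container_of(ks->list.prev, struct model_device, entry);

		list_del_init(&dev->entry);
		model_unlock(ks);
		shutdown(dev);  /* runs without the lock held */
		n++;
		model_lock(ks);
	}
	model_unlock(ks);
	return n;
}

/* Hypothetical racing remover, triggered from inside one shutdown call. */
static struct model_kset *g_ks;
static struct model_device *g_victim;
static int g_order[8], g_count;

static void racing_shutdown(struct model_device *dev)
{
	g_order[g_count++] = dev->id;
	if (g_victim) {
		model_lock(g_ks);  /* the "other thread" takes the same lock */
		list_del_init(&g_victim->entry);
		model_unlock(g_ks);
		g_victim = NULL;
	}
}

static void demo_locked_shutdown(void)
{
	struct model_kset ks;
	struct model_device a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };

	atomic_flag_clear(&ks.lock);
	list_init(&ks.list);
	list_add_tail(&a.entry, &ks.list);
	list_add_tail(&b.entry, &ks.list);
	list_add_tail(&c.entry, &ks.list);
	g_ks = &ks;
	g_victim = &b;  /* removed while c is being shut down */
	g_count = 0;

	assert(shutdown_all_locked(&ks, racing_shutdown) == 2);
	assert(g_order[0] == 3 && g_order[1] == 1);  /* b skipped, no spin */
	assert(list_empty(&ks.list));
}
```

The loop's only state across an unlock is the list head itself, so a device removed by the other thread simply never shows up at the tail. Removing a node the loop has already unlinked is also harmless in this model, because that node is self-linked.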

Thanks,
Hugh
