Date:   Wed, 29 Mar 2017 14:47:54 +0200
From:   Johannes Thumshirn <jthumshirn@...e.de>
To:     Jinpu Wang <jinpu.wang@...fitbricks.com>
Cc:     John Garry <john.garry@...wei.com>,
        "Martin K . Petersen" <martin.petersen@...cle.com>,
        Tejun Heo <tj@...nel.org>,
        James Bottomley <jejb@...ux.vnet.ibm.com>,
        Dan Williams <dan.j.williams@...el.com>,
        Hannes Reinecke <hare@...e.de>,
        Linux SCSI Mailinglist <linux-scsi@...r.kernel.org>,
        Linux Kernel Mailinglist <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/2] scsi: sas: flush destruct workqueue on device unregister

On Wed, Mar 29, 2017 at 02:36:11PM +0200, Jinpu Wang wrote:
> On Wed, Mar 29, 2017 at 2:26 PM, Johannes Thumshirn <jthumshirn@...e.de> wrote:
> > On Wed, Mar 29, 2017 at 12:53:28PM +0100, John Garry wrote:
> >> On 29/03/2017 12:29, Johannes Thumshirn wrote:
> >> >On Wed, Mar 29, 2017 at 12:15:44PM +0100, John Garry wrote:
> >> >>On 29/03/2017 10:41, Johannes Thumshirn wrote:
> >> >>>In the event of a SAS device unregister we have to wait for all destruct
> >> >>>works to be done to not accidentally delay deletion of a SAS rphy or its
> >> >>>children to the point when we're removing the SCSI or SAS hosts.
> >> >>>
> >> >>>Signed-off-by: Johannes Thumshirn <jthumshirn@...e.de>
> >> >>>---
> >> >>>drivers/scsi/libsas/sas_discover.c | 4 ++++
> >> >>>1 file changed, 4 insertions(+)
> >> >>>
> >> >>>diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
> >> >>>index 60de662..75b18f1 100644
> >> >>>--- a/drivers/scsi/libsas/sas_discover.c
> >> >>>+++ b/drivers/scsi/libsas/sas_discover.c
> >> >>>@@ -382,9 +382,13 @@ void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
> >> >>>   }
> >> >>>
> >> >>>   if (!test_and_set_bit(SAS_DEV_DESTROY, &dev->state)) {
> >> >>>+          struct sas_discovery *disc = &dev->port->disc;
> >> >>>+          struct sas_work *sw = &disc->disc_work[DISCE_DESTRUCT].work;
> >> >>>+
> >> >>>           sas_rphy_unlink(dev->rphy);
> >> >>>           list_move_tail(&dev->disco_list_node, &port->destroy_list);
> >> >>>           sas_discover_event(dev->port, DISCE_DESTRUCT);
> >> >>>+          flush_work(&sw->work);
> >> >>
> >> >>I quickly tested plugging out the expander and we never get past this call
> >> >>to flush - a hang results:
> >> >
> >> >Can you activate lockdep so we can see which lock it is that we're blocking on?
> >> >
> >>
> >> I have it on:
> >> CONFIG_LOCKDEP_SUPPORT=y
> >> CONFIG_LOCKD=y
> >> CONFIG_LOCKD_V4=y
> >>
> >> >It's most likely in sas_unregister_common_dev() but this function takes two spin
> >> >locks, port->dev_list_lock and ha->lock.
> >> >
> >>
> >> We can see from the callstack I provided that we're working in workqueue
> >> scsi_wq_0 and trying to flush that same queue.
> >
> > Aaahh, now I get what's happening (with some kicks^Whelp from Hannes I admit).
> >
> > The sas_unregister_dev() comes from the work queued by notify_phy_event(). So this patch must be
> > replaced by (untested):
> >
> > diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
> > index cdbb293..e1e6492 100644
> > --- a/drivers/scsi/scsi_transport_sas.c
> > +++ b/drivers/scsi/scsi_transport_sas.c
> > @@ -375,6 +375,7 @@ void sas_remove_children(struct device *dev)
> >   */
> >  void sas_remove_host(struct Scsi_Host *shost)
> >  {
> > +       scsi_flush_work(shost);
> >         sas_remove_children(&shost->shost_gendev);
> >  }
> >  EXPORT_SYMBOL(sas_remove_host);
> >
> > John, mind giving that one a shot in your test setup as well?

Well, never mind. It doesn't work in my test setup.

I'm back to the drawing board...
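
For reference, the hang John reported with the first patch (a work item
running on scsi_wq_0 that tries to flush scsi_wq_0) can be modeled in user
space. This is only a rough sketch under stated assumptions: every toy_* name
below is hypothetical and not a kernel or libsas API, the queue is assumed to
run one item at a time, and where the real flush_work() would simply block
forever, the model returns -EDEADLK so the condition is observable.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_WQ_MAX 8

struct toy_wq;
typedef void (*toy_work_fn)(struct toy_wq *wq);

/* Hypothetical stand-in for a queue that runs one work item at a time. */
struct toy_wq {
	toy_work_fn pending[TOY_WQ_MAX];
	size_t head, tail;	/* next item to run / next free slot */
	int in_worker;		/* nonzero while a work item is executing */
	int last_flush_rc;	/* result of the flush attempted from a work item */
	int destructed;		/* number of destruct work items that ran */
};

static int toy_queue_work(struct toy_wq *wq, toy_work_fn fn)
{
	assert(wq != NULL && fn != NULL);
	if (wq->tail == TOY_WQ_MAX)
		return -ENOSPC;
	wq->pending[wq->tail++] = fn;
	return 0;
}

/*
 * Run everything that is pending. Called from inside a work item, the
 * only worker is the caller itself, so the pending items can never make
 * progress: the model reports -EDEADLK instead of hanging.
 */
static int toy_flush(struct toy_wq *wq)
{
	assert(wq != NULL);
	if (wq->in_worker)
		return -EDEADLK;
	while (wq->head < wq->tail) {
		toy_work_fn fn = wq->pending[wq->head++];

		wq->in_worker = 1;
		fn(wq);
		wq->in_worker = 0;
	}
	return 0;
}

/* Models the DISCE_DESTRUCT work. */
static void toy_destruct_work(struct toy_wq *wq)
{
	wq->destructed++;
}

/* Models the first patch: queue the destruct work, then flush in place. */
static void toy_unregister_work(struct toy_wq *wq)
{
	toy_queue_work(wq, toy_destruct_work);
	wq->last_flush_rc = toy_flush(wq);
}
```

Queueing toy_unregister_work and calling toy_flush() from outside the queue
leaves last_flush_rc at -EDEADLK, while the destruct work still runs once the
unregister work has returned. In the model, the flush only succeeds when it
comes from a context that is not itself a work item on that queue.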

Anyway, thanks,
	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@...e.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
