[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <SJ0PR19MB4544C3854D2C68853A6E8EA8F21F9@SJ0PR19MB4544.namprd19.prod.outlook.com>
Date: Wed, 30 Mar 2022 14:12:18 +0000
From: "Belanger, Martin" <Martin.Belanger@...l.com>
To: Oliver O'Halloran <oohall@...il.com>,
Tanjore Suresh <tansuresh@...gle.com>
CC: Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Christoph Hellwig <hch@....de>,
Sagi Grimberg <sagi@...mberg.me>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
linux-pci <linux-pci@...r.kernel.org>
Subject: RE: [PATCH v1 1/3] driver core: Support asynchronous driver shutdown
> From: Linux-nvme <linux-nvme-bounces@...ts.infradead.org> On Behalf Of
> Oliver O'Halloran
> Sent: Monday, March 28, 2022 8:20 PM
> To: Tanjore Suresh
> Cc: Greg Kroah-Hartman; Rafael J . Wysocki; Christoph Hellwig; Sagi Grimberg;
> Bjorn Helgaas; Linux Kernel Mailing List; linux-nvme@...ts.infradead.org; linux-
> pci
> Subject: Re: [PATCH v1 1/3] driver core: Support asynchronous driver shutdown
>
>
> On Tue, Mar 29, 2022 at 10:35 AM Tanjore Suresh <tansuresh@...gle.com>
> wrote:
> >
> > This changes the bus driver interface with additional entry points to
> > enable devices to implement asynchronous shutdown. The existing
> > synchronous interface to shutdown is unmodified and retained for
> > backward compatibility.
> >
> > This changes the common device shutdown code to enable devices to
> > participate in asynchronous shutdown implementation.
>
> nice to see someone looking at improving the shutdown path
Agreed!
I know this patch is mainly for PCI devices, however, NVMe over Fabrics
devices can suffer even longer shutdowns. Last September, I reported
that shutting down an NVMe-oF TCP connection while the network is down
will result in a 1-minute deadlock. That's because the driver tries to perform
a proper shutdown by sending commands to the remote target and the
timeout for unanswered commands is 1-minute. If one needs to shut down
several NVMe-oF connections, each connection will be shut down sequentially
taking each 1 minute. Try running "nvme disconnect-all" while the network
is down and you'll see what I mean. Of course, the KATO is supposed to
detect when connectivity is lost, but if you have a long KATO (e.g. 2 minutes)
you will most likely hit this condition.
Here's the patch I proposed in September, which shortens the timeout to
5 sec on a disconnect.
http://lists.infradead.org/pipermail/linux-nvme/2021-September/027867.html
Regards,
Martin Belanger
>
> > Signed-off-by: Tanjore Suresh <tansuresh@...gle.com>
> > ---
> > drivers/base/core.c | 39 +++++++++++++++++++++++++++++++++++++-
> > include/linux/device/bus.h | 10 ++++++++++
> > 2 files changed, 48 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c index
> > 3d6430eb0c6a..359e7067e8b8 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -4479,6 +4479,7 @@ EXPORT_SYMBOL_GPL(device_change_owner);
> > *snip*
>
> This all seems a bit dangerous and I'm wondering what systems you've tested
> these changes with. I had a look at implementing something similar a few years
> ago and one case that always concerned me was embedded systems where the
> PCIe root complex also has a driver bound.
> Say you've got the following PCIe topology:
>
> 00:00.0 - root port
> 01:00.0 - nvme drive
>
> With the current implementation of device_shutdown() we can guarantee that
> the child device (the nvme) is shut down before we start trying to shut down the
> parent device (the root complex) so there's no possibility of deadlocks and
> other dependency headaches. With this implementation of async shutdown we
> lose that guarantee and I'm not sure what the consequences are. Personally I
> was never able to convince myself it was safe, but maybe you're braver than I
> am :)
>
> That all said, there's probably only a few kinds of device that will really want to
> implement async shutdown support so maybe you can restrict it to leaf devices
> and flip the ordering around to something
> like:
>
> for_each_device(dev) {
> if (can_async(dev) && has_no_children(dev))
> start_async_shutdown(dev)
> }
> wait_for_all_async_shutdowns_to_finish()
>
> // tear down the remaining system devices synchronously
> for_each_device(dev)
> do_sync_shutdown(dev)
>
> > /*
> > diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
> > index a039ab809753..e261819601e9 100644
> > --- a/include/linux/device/bus.h
> > +++ b/include/linux/device/bus.h
> > @@ -93,6 +101,8 @@ struct bus_type {
> > void (*sync_state)(struct device *dev);
> > void (*remove)(struct device *dev);
> > void (*shutdown)(struct device *dev);
> > + void (*shutdown_pre)(struct device *dev);
> > + void (*shutdown_post)(struct device *dev);
>
> Call them shutdown_async_start() / shutdown_async_end() or something IMO.
> These names are not at all helpful and they're easy to mix up their role with
> the class based shutdown_pre / _post
Powered by blists - more mailing lists