Message-ID: <2024101808-subscribe-unwrapped-ee3d@gregkh>
Date: Fri, 18 Oct 2024 11:37:28 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Lukas Wunner <lukas@...ner.de>
Cc: Michael Kelley <mhklinux@...look.com>,
Stuart Hayes <stuart.w.hayes@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Martin Belanger <Martin.Belanger@...l.com>,
Oliver O'Halloran <oohall@...il.com>,
Daniel Wagner <dwagner@...e.de>, Keith Busch <kbusch@...nel.org>,
David Jeffery <djeffery@...hat.com>,
Jeremy Allison <jallison@....com>, Jens Axboe <axboe@...com>,
Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
Nathan Chancellor <nathan@...nel.org>,
Jan Kiszka <jan.kiszka@...mens.com>,
Bert Karwatzki <spasswolf@....de>
Subject: Re: [PATCH v9 0/4] shut down devices asynchronously
On Fri, Oct 18, 2024 at 11:14:51AM +0200, Lukas Wunner wrote:
> On Fri, Oct 18, 2024 at 07:49:51AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Oct 18, 2024 at 03:26:05AM +0000, Michael Kelley wrote:
> > > In the process, the workqueue code spins up additional worker threads
> > > to handle the load. On the Hyper-V VM, 210 to 230 new kernel
> > > threads are created during device_shutdown(), depending on the
> > > timing. On the Pi 5, 253 are created. The max for this workqueue is
> > > WQ_DFL_ACTIVE (256).
> [...]
> > I don't think we can put this type of load on all systems just to handle
> > one specific type of "bad" hardware that takes long periods of time to
> > shutdown, sorry.
>
> Parallelizing shutdown means shorter reboot times, less downtime,
> less cost for CSPs.
For some systems, yes, but as has been seen here, it comes at the
cost of a huge CPU load at shutdown, with sometimes longer reboot
times.
> Modern servers (e.g. Sierra Forest with 288 cores) should handle
> this load easily and may see significant benefits from parallelization.
"may see", can you test this?
> Perhaps a solution is to cap async shutdown based on the number of cores,
> but always use async for certain device classes (e.g. nvme_subsys_class)?
Maybe, but as-is, we can't take the changes this way, sorry. That would
be a regression for the working hardware that many people have.
thanks,
greg k-h
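
For illustration only, the idea floated in the quoted text above (cap
async shutdown by core count, but always go async for selected device
classes) could be sketched roughly as below. This is a user-space sketch
and not code from the patch series: struct shutdown_candidate, the
always_async flag, async_shutdown_cap() and use_async_shutdown() are all
hypothetical names, and the "one in-flight async shutdown per online CPU"
policy is an assumption, not something proposed in the thread.

```c
#include <stdbool.h>

/* Hypothetical per-device description; not a kernel structure. */
struct shutdown_candidate {
	/* Assumed flag: the device class opted in unconditionally. */
	bool always_async;
};

/*
 * Assumed policy: allow at most one in-flight async shutdown per
 * online CPU, and never less than one.
 */
static unsigned int async_shutdown_cap(unsigned int online_cpus)
{
	return online_cpus ? online_cpus : 1;
}

/*
 * Decide whether a device should be shut down asynchronously.
 * in_flight is the number of async shutdowns currently running.
 */
static bool use_async_shutdown(const struct shutdown_candidate *dev,
			       unsigned int in_flight,
			       unsigned int online_cpus)
{
	/* Opted-in classes bypass the cap entirely. */
	if (dev->always_async)
		return true;

	/* Everything else goes async only while under the cap. */
	return in_flight < async_shutdown_cap(online_cpus);
}
```

With 4 online CPUs, a device without the flag would be shut down
asynchronously while fewer than 4 async shutdowns are in flight and
synchronously otherwise; a flagged device would always go async. Whether
such a cap actually avoids the thread explosion reported above would
still have to be measured.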