linux-kernel - Re: [PATCH v9 0/4] shut down devices asynchronously

Message-ID: <2024101808-subscribe-unwrapped-ee3d@gregkh>
Date: Fri, 18 Oct 2024 11:37:28 +0200
From: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
To: Lukas Wunner <lukas@...ner.de>
Cc: Michael Kelley <mhklinux@...look.com>,
	Stuart Hayes <stuart.w.hayes@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"Rafael J . Wysocki" <rafael@...nel.org>,
	Martin Belanger <Martin.Belanger@...l.com>,
	Oliver O'Halloran <oohall@...il.com>,
	Daniel Wagner <dwagner@...e.de>, Keith Busch <kbusch@...nel.org>,
	David Jeffery <djeffery@...hat.com>,
	Jeremy Allison <jallison@....com>, Jens Axboe <axboe@...com>,
	Christoph Hellwig <hch@....de>, Sagi Grimberg <sagi@...mberg.me>,
	"linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
	Nathan Chancellor <nathan@...nel.org>,
	Jan Kiszka <jan.kiszka@...mens.com>,
	Bert Karwatzki <spasswolf@....de>
Subject: Re: [PATCH v9 0/4] shut down devices asynchronously

On Fri, Oct 18, 2024 at 11:14:51AM +0200, Lukas Wunner wrote:
> On Fri, Oct 18, 2024 at 07:49:51AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Oct 18, 2024 at 03:26:05AM +0000, Michael Kelley wrote:
> > > In the process, the workqueue code spins up additional worker threads
> > > to handle the load.  On the Hyper-V VM, 210 to 230 new kernel
> > > threads are created during device_shutdown(), depending on the
> > > timing. On the Pi 5, 253 are created. The max for this workqueue is
> > > WQ_DFL_ACTIVE (256).
> [...]
> > I don't think we can put this type of load on all systems just to handle
> > one specific type of "bad" hardware that takes long periods of time to
> > shut down, sorry.
> 
> Parallelizing shutdown means shorter reboot times, less downtime,
> less cost for CSPs.

For some systems, yes, but as has been seen here, it comes at the
cost of a huge CPU load at shutdown, and sometimes longer reboot
times.

> Modern servers (e.g. Sierra Forest with 288 cores) should handle
> this load easily and may see significant benefits from parallelization.

"may see", can you test this?

> Perhaps a solution is to cap async shutdown based on the number of cores,
> but always use async for certain device classes (e.g. nvme_subsys_class)?

Maybe, but as-is, we can't take the changes this way, sorry.  That would
be a regression for the working hardware that many people have.
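The capping idea Lukas floats above can be sketched as a per-device policy check. This is a minimal, hypothetical illustration in plain C, not kernel code and not taken from the patch series. `struct dev_info`, `should_shutdown_async()`, the in-flight counter, and the class-name string "nvme-subsystem" are all assumptions made for the sketch. It shows an allow-list of classes that always shut down asynchronously, with every other device limited so that concurrent async shutdowns never exceed the CPU count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device description for this sketch; not a kernel struct. */
struct dev_info {
    const char *class_name; /* may be NULL for class-less devices */
};

/* Hypothetical allow-list: classes that always shut down asynchronously. */
static const char *const always_async_classes[] = {
    "nvme-subsystem", /* assumed name, standing in for nvme_subsys_class */
};

static bool class_always_async(const char *class_name)
{
    size_t n = sizeof(always_async_classes) / sizeof(always_async_classes[0]);

    if (class_name == NULL)
        return false;
    for (size_t i = 0; i < n; i++)
        if (strcmp(class_name, always_async_classes[i]) == 0)
            return true;
    return false;
}

/*
 * Decide whether one device should be shut down asynchronously.
 * in_flight: async shutdowns currently running (hypothetical counter).
 * ncpus:     online CPU count, used as the proposed concurrency cap.
 */
static bool should_shutdown_async(const struct dev_info *dev,
                                  unsigned int in_flight,
                                  unsigned int ncpus)
{
    if (class_always_async(dev->class_name))
        return true;          /* allow-listed classes bypass the cap */
    return in_flight < ncpus; /* everyone else is capped at ncpus */
}
```

With a cap tied to the CPU count, a small VM would queue only a handful of concurrent shutdowns instead of driving the workqueue toward WQ_DFL_ACTIVE worker threads. Whether that avoids the regression Greg describes would still need testing on the affected systems.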

thanks,

greg k-h