lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
 <BN7PR02MB41480DE777B9C224F3C2DF43D4792@BN7PR02MB4148.namprd02.prod.outlook.com>
Date: Fri, 11 Oct 2024 04:22:19 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Stuart Hayes <stuart.w.hayes@...il.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, Greg Kroah-Hartman
	<gregkh@...uxfoundation.org>, "Rafael J . Wysocki" <rafael@...nel.org>,
	Martin Belanger <Martin.Belanger@...l.com>, Oliver O'Halloran
	<oohall@...il.com>, Daniel Wagner <dwagner@...e.de>, Keith Busch
	<kbusch@...nel.org>, Lukas Wunner <lukas@...ner.de>, David Jeffery
	<djeffery@...hat.com>, Jeremy Allison <jallison@....com>, Jens Axboe
	<axboe@...com>, Christoph Hellwig <hch@....de>, Sagi Grimberg
	<sagi@...mberg.me>, "linux-nvme@...ts.infradead.org"
	<linux-nvme@...ts.infradead.org>, Nathan Chancellor <nathan@...nel.org>, Jan
 Kiszka <jan.kiszka@...mens.com>, Bert Karwatzki <spasswolf@....de>
Subject: RE: [PATCH v9 0/4] shut down devices asynchronously

From: Stuart Hayes <stuart.w.hayes@...il.com> Sent: Wednesday, October 9, 2024 10:58 AM
> 
> This adds the ability for the kernel to shutdown devices asynchronously.
> 
> Only devices with drivers that enable it are shut down asynchronously.
> 
> This can dramatically reduce system shutdown/reboot time on systems that
> have multiple devices that take many seconds to shut down (like certain
> NVMe drives). On one system tested, the shutdown time went from 11 minutes
> without this patch to 55 seconds with the patch.

I've been testing this series against a 6.11.0 kernel in an Azure VM, which
is running as a guest on the Microsoft Hyper-V hypervisor. The VM has 16 vCPUs,
128 Gbytes of memory, and two physical NVMe controllers that are mapped
into the VM so that it can access them directly.

But I wanted to confirm that the two NVMe controllers are being shutdown
in parallel.  So before doing a shutdown, I set
/sys/module/kernel/parameters/initcall_debug to "Y" so the shutdown
of each device is recorded in the console output.  Here's the full set of
device shutdown messages:

172.609825	platform intel_rapl_msr.0: shutdown
172.611940	mlx5_ib.rdma mlx5_core.rdma.0: shutdown
172.613931	mlx5_core.eth mlx5_core.eth.0: shutdown
172.618116	nvme c2b7:00:00.0: shutdown
172.618262	nvme 132e:00:00.0: shutdown
172.618349	mlx5_core 1610:00:02.0: shutdown
172.618359	mlx5_core 1610:00:02.0: Shutdown was called
172.782768	hv_pci ba152dae-1610-4c67-b925-81ac4902e4ce: shutdown
172.786405	sd 0:0:0:1: shutdown
172.788788	sd 0:0:0:0: shutdown
172.789949	sd 0:0:0:0: [sda] Synchronizing SCSI cache
172.794209	atkbd serio0: shutdown
172.795974	hv_utils 242ff919-07db-4180-9c2e-b86cb68c8c55: shutdown
172.800432	hv_pci 0cdfe983-132e-434b-8025-fc9ab43c0fc5: shutdown
172.802812	hv_pci 2394da4f-c2b7-43bd-b72f-d3482ef6850a: shutdown
172.805145	hv_netvsc 0022487e-1043-0022-487e-10430022487e: shutdown
172.807575	hyperv_fb 5620e0c7-8062-4dce-aeb7-520c7ef76171: shutdown
172.810026	hyperv_keyboard d34b2567-b9b6-42b9-8778-0a4ec0b955bf: shutdown
172.812522	hid_hyperv 58f75a6d-d949-4320-99e1-a2a2576d581c: shutdown
172.814982	hv_balloon 1eccfd72-4b41-45ef-b73a-4a6e44c12924: shutdown
172.817376	vmbus c4e5e7d1-d748-4afc-979d-683167910a55: shutdown
172.819789	hv_storvsc f8b3781b-1e82-4818-a1c3-63d806ec15bb: shutdown
172.822324	hv_storvsc f8b3781a-1e82-4818-a1c3-63d806ec15bb: shutdown
172.824813	hv_utils 2dd1ce17-079e-403c-b352-a1921ee207ee: shutdown
172.827199	hv_utils b6650ff7-33bc-4840-8048-e0676786f393: shutdown
172.829653	hv_utils fd149e91-82e0-4a7d-afa6-2a4166cbd7c0: shutdown
172.836408	platform eisa.0: shutdown
172.838558	alarmtimer alarmtimer.0.auto: shutdown
172.842461	platform Fixed MDIO bus.0: shutdown
172.864709	kgdboc kgdboc: shutdown
172.878009	serial8250 serial8250: shutdown
172.889725	platform pcspkr: shutdown
172.904386	rtc_cmos 00:02: shutdown
172.906217	serial 00:01: shutdown
172.907799	serial 00:00: shutdown
172.910427	platform efivars.0: shutdown
172.913341	platform rtc-efi.0: shutdown
172.915470	vmgenid HYPER_V_GEN_COUNTER_V1:00: shutdown
172.917479	vmbus VMBUS:00: shutdown
172.919012	platform PNP0003:00: shutdown
172.926707	reg-dummy reg-dummy: shutdown
172.961360	ACPI: PM: Preparing to enter system sleep state S5

You see the Mellanox CX-5 NIC, the two NVMe devices, various Hyper-V
virtual devices, and platform devices being shutdown. Everything seems to
work properly, so that's good. The two NVMe devices are shutdown very
close in time, so they are probably being done in parallel.

As a comparison, I did the same thing with an unmodified 6.11.0 kernel.
Indeed, the NVMe device shutdowns are significantly more apart in time
(110 milliseconds). That's not noticeably slow like the NVMe devices you
were dealing with, but doing them in parallel helps a little bit.

But here's the kicker:  The overall process of shutting down the devices
took *longer* with the patch set than without.  Here's the same output
from a 6.11.0 kernel without the patch set:

745.455493	platform intel_rapl_msr.0: shutdown		
745.456999	mlx5_ib.rdma mlx5_core.rdma.0: shutdown		
745.458557	mlx5_core.eth mlx5_core.eth.0: shutdown		
745.460166	mlx5_core 1610:00:02.0: shutdown		
745.461570	mlx5_core 1610:00:02.0: Shutdown was called		
745.466053	nvme 132e:00:00.0: shutdown		
745.579284	nvme c2b7:00:00.0: shutdown		
745.718739	hv_pci ba152dae-1610-4c67-b925-81ac4902e4ce: shutdown		
745.721114	sd 0:0:0:1: shutdown		
745.722254	sd 0:0:0:0: shutdown		
745.723357	sd 0:0:0:0: [sda] Synchronizing SCSI cache		
745.725259	atkbd serio0: shutdown		
745.726405	hv_utils 242ff919-07db-4180-9c2e-b86cb68c8c55: shutdown		
745.728375	hv_pci 0cdfe983-132e-434b-8025-fc9ab43c0fc5: shutdown		
745.730347	hv_pci 2394da4f-c2b7-43bd-b72f-d3482ef6850a: shutdown		
745.732281	hv_netvsc 0022487e-1043-0022-487e-10430022487e: shutdown		
745.734318	hyperv_fb 5620e0c7-8062-4dce-aeb7-520c7ef76171: shutdown		
745.736488	hyperv_keyboard d34b2567-b9b6-42b9-8778-0a4ec0b955bf: shutdown		
745.738628	hid_hyperv 58f75a6d-d949-4320-99e1-a2a2576d581c: shutdown		
745.740770	hv_balloon 1eccfd72-4b41-45ef-b73a-4a6e44c12924: shutdown		
745.742835	vmbus c4e5e7d1-d748-4afc-979d-683167910a55: shutdown		
745.744765	hv_storvsc f8b3781b-1e82-4818-a1c3-63d806ec15bb: shutdown		
745.746861	hv_storvsc f8b3781a-1e82-4818-a1c3-63d806ec15bb: shutdown		
745.748907	hv_utils 2dd1ce17-079e-403c-b352-a1921ee207ee: shutdown		
745.750948	hv_utils b6650ff7-33bc-4840-8048-e0676786f393: shutdown		
745.753012	hv_utils fd149e91-82e0-4a7d-afa6-2a4166cbd7c0: shutdown		
745.755000	platform eisa.0: shutdown		
745.756266	alarmtimer alarmtimer.0.auto: shutdown		
745.757868	platform Fixed MDIO bus.0: shutdown		
745.759447	kgdboc kgdboc: shutdown		
745.760679	serial8250 serial8250: shutdown		
745.762110	platform pcspkr: shutdown		
745.763387	rtc_cmos 00:02: shutdown		
745.764726	serial 00:01: shutdown		
745.765898	serial 00:00: shutdown		
745.767036	platform efivars.0: shutdown		
745.768783	platform rtc-efi.0: shutdown		
745.770240	vmgenid HYPER_V_GEN_COUNTER_V1:00: shutdown		
745.771949	vmbus VMBUS:00: shutdown		
745.773197	platform PNP0003:00: shutdown		
745.774540	reg-dummy reg-dummy: shutdown		
745.775964	ACPI: PM: Preparing to enter system sleep state S5
		
There's some modest variability in the individual steps, but the 110 ms
saved on the NVMe device seems to be given back on some other
devices. I did the comparison twice with similar results. (I have the
full data set with comparisons in an Excel spreadsheet.)

Any thoughts on what might be causing this?  I haven't gone into the
details of your algorithms for parallelizing, but is there any extra
overhead that could be adding to the time? Or maybe this is
something unique to Hyper-V guests. The overall difference is only
a few 10's of milliseconds, so not that big of a deal. But maybe it's
an indicator that something unexpected is happening that we should
understand.

I'll keep thinking about the issue and see if I can get any more insight.

Michael Kelley

> 
> Changes from V8:
> 
> Deal with shutdown hangs resulting when a parent/supplier device is
>   later in the devices_kset list than its children/consumers:
>   * Ignore sync_state_only devlinks for shutdown dependencies
>   * Ignore shutdown_after for devices that don't want async shutdown
>   * Add a sanity check to revert to sync shutdown for any device that
>     would otherwise wait for a child/consumer shutdown that hasn't
>     already been scheduled
> 
> Changes from V7:
> 
> Do not expose driver async_shutdown_enable in sysfs.
> Wrapped a long line.
> 
> Changes from V6:
> 
> Removed a sysfs attribute that allowed the async device shutdown to be
> "on" (with driver opt-out), "safe" (driver opt-in), or "off"... what was
> previously "safe" is now the only behavior, so drivers now only need to
> have the option to enable or disable async shutdown.
> 
> Changes from V5:
> 
> Separated into multiple patches to make review easier.
> Reworked some code to make it more readable
> Made devices wait for consumers to shut down, not just children
>   (suggested by David Jeffery)
> 
> Changes from V4:
> 
> Change code to use cookies for synchronization rather than async domains
> Allow async shutdown to be disabled via sysfs, and allow driver opt-in or
>   opt-out of async shutdown (when not disabled), with ability to control
>   driver opt-in/opt-out via sysfs
> 
> Changes from V3:
> 
> Bug fix (used "parent" not "dev->parent" in device_shutdown)
> 
> Changes from V2:
> 
> Removed recursive functions to schedule children to be shutdown before
>   parents, since existing device_shutdown loop will already do this
> 
> Changes from V1:
> 
> Rewritten using kernel async code (suggested by Lukas Wunner)
> 
> 
> Stuart Hayes (4):
>   driver core: don't always lock parent in shutdown
>   driver core: separate function to shutdown one device
>   driver core: shut down devices asynchronously
>   nvme-pci: Make driver prefer asynchronous shutdown
> 
>  drivers/base/base.h           |   4 +
>  drivers/base/core.c           | 137 +++++++++++++++++++++++++++-------
>  drivers/nvme/host/pci.c       |   1 +
>  include/linux/device/driver.h |   2 +
>  4 files changed, 118 insertions(+), 26 deletions(-)
> 
> --
> 2.39.3
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ