Date: Wed, 19 Jun 2024 10:00:00 +0000
From: Khasnis Soumya <soumya.khasnis@...y.com>
To: Daniel Lezcano <daniel.lezcano@...aro.org>, gregkh@...uxfoundation.org
Cc: rafael@...nel.org, linux-kernel@...r.kernel.org,
	daniel.lezcano@...aro.org, festevam@...x.de, lee@...nel.org,
	benjamin.bara@...data.com, dmitry.osipenko@...labora.com,
	ldmldm05@...il.com, soumya.khasnis@...y.com,
	srinavasa.nagaraju@...y.com, Madhusudan.Bobbili@...y.com,
	shingo.takeuchi@...y.com, keita.aihara@...y.com,
	masaya.takahashi@...y.com
Subject: Re: [PATCH v5] driver core: Add timeout for device shutdown

On Thu, Jun 13, 2024 at 01:51:57PM +0200, Daniel Lezcano wrote:
> On 13/06/2024 10:43, Greg KH wrote:
> > On Thu, Jun 13, 2024 at 08:32:26AM +0000, Soumya Khasnis wrote:
> >> The device shutdown callbacks invoked during shutdown/reboot
> >> are prone to errors depending on the device state or mishandling
> >> by one or more driver. In order to prevent a device hang in such
> >> scenarios, we bail out after a timeout while dumping a meaningful
> >> call trace of the shutdown callback to kernel logs, which blocks
> >> the shutdown or reboot process.
> > 
> > Again, this is not a "device shutdown" timeout, it is a "the whole
> > system has not shutdown this fast" timeout.
> > 
> > And in looking at my system, it doesn't shutdown in 10 seconds as it is
> > madly flushing a ton of stuff out to the disks, and they are slow
> > beasts.  So your 10 second default would cause me data loss on my
> > workstation, not good!
> 
> Thanks for pointing this out. It is exactly what I was worried about ...
Thank you for the comments, Daniel and Greg. Let me explain.

Typically, the reboot/shutdown sequence involves the following steps in userland before the kernel restart/shutdown sequence is entered:

1.	Terminate all services (except shutdown critical tasks)
2.	Sync file systems
3.	Unmount file systems
4.	Invoke the reboot() system call (LINUX_REBOOT_CMD_RESTART/LINUX_REBOOT_CMD_POWER_OFF)
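
To make steps 2 to 4 concrete, here is a minimal userland sketch. It is only an illustration, not what any particular init system does: userland_shutdown(), its mnt argument and the dry_run flag are hypothetical names, and step 1 is left out because it belongs to the init system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/reboot.h>   /* LINUX_REBOOT_MAGIC*, LINUX_REBOOT_CMD_* */
#include <stdio.h>
#include <sys/mount.h>      /* umount2() */
#include <sys/syscall.h>    /* SYS_reboot */
#include <unistd.h>         /* sync(), syscall() */

/*
 * Hypothetical helper covering steps 2-4 of the userland sequence.
 * With dry_run != 0 only sync() is performed; nothing is unmounted
 * and the reboot system call is not made.
 */
static int userland_shutdown(const char *mnt, unsigned int cmd, int dry_run)
{
	assert(mnt != NULL);

	sync();                               /* step 2: flush dirty data */

	if (dry_run) {
		printf("would unmount %s and call reboot(0x%x)\n", mnt, cmd);
		return 0;
	}

	if (umount2(mnt, 0) != 0) {           /* step 3: unmount */
		perror("umount2");
		return -1;
	}

	/* step 4: from here on the kernel restart/power-off path runs */
	return (int)syscall(SYS_reboot, LINUX_REBOOT_MAGIC1,
			    LINUX_REBOOT_MAGIC2, cmd, NULL);
}
```

Calling userland_shutdown("/mnt/example", LINUX_REBOOT_CMD_RESTART, 1) only syncs and prints what it would do; the mount point is a placeholder.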

A userspace watchdog can be set up to cover the steps above, as is done on Android systems.
It needs a large timeout value because these steps involve syncing data to disk.
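
For reference, a userspace watchdog of this kind could be armed roughly as below. This is a sketch using the standard Linux watchdog character device interface, not the Android implementation; the helper names and the device path passed in are assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/watchdog.h>  /* WDIOC_SETTIMEOUT, WDIOC_KEEPALIVE */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a watchdog device and program a timeout
 * large enough to cover service teardown and disk sync.
 * Returns the open fd, or -1 on failure.
 */
static int arm_shutdown_watchdog(const char *dev, int timeout_sec)
{
	assert(dev != NULL && timeout_sec > 0);

	int fd = open(dev, O_WRONLY);
	if (fd < 0)
		return -1;

	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout_sec) != 0) {
		/* "V" is the magic close character: try to disarm before closing */
		if (write(fd, "V", 1) != 1) {
			/* nothing more we can do here */
		}
		close(fd);
		return -1;
	}
	return fd;
}

/* Pet the watchdog; userland stops calling this once shutdown begins. */
static int pet_watchdog(int fd)
{
	return ioctl(fd, WDIOC_KEEPALIVE, 0);
}
```

If the userland steps do not finish within timeout_sec after the last pet_watchdog() call, the hardware resets the machine.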

Below is the kernel restart sequence once control moves to the kernel in step 4.
The issue we intend to address is that device driver shutdown callbacks may hang
because of an unresponsive device or a broken driver.

|-kernel_restart()
              |- kernel_restart_prepare()
                     |- device_shutdown() // Iterates over the device hierarchy and invokes the shutdown callbacks (class/bus/driver->shutdown)
              |- syscore_shutdown()
              |- machine_restart()
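
To illustrate the failure mode, here is a small userspace model of the loop in device_shutdown(). It is not the patch and not kernel code: struct fake_device, the demo callbacks and the use of alarm()/siglongjmp() are all stand-ins chosen so the sketch can run as an ordinary program. As an assumption of the sketch, a single timer covers the whole loop, and on expiry we report the callback that was running and bail out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-in for a device with a shutdown callback (hypothetical type). */
struct fake_device {
	const char *name;
	void (*shutdown)(void);
};

static sigjmp_buf bail_out;

static void on_timeout(int sig)
{
	(void)sig;
	siglongjmp(bail_out, 1);        /* abandon the stuck callback */
}

static void quick_shutdown(void) { }
static void hung_shutdown(void) { for (;;) pause(); }  /* never returns */

static const struct fake_device demo_devs[] = {
	{ "good0", quick_shutdown },
	{ "stuck1", hung_shutdown },
	{ "good2", quick_shutdown },
};

/*
 * Walk the first n devices and call each shutdown callback.
 * Returns -1 if all callbacks finished, otherwise the index of the
 * device whose callback was running when the timer expired.
 */
static int fake_device_shutdown(const struct fake_device *devs, int n,
				unsigned int timeout_sec)
{
	volatile int i = 0;             /* must survive siglongjmp() */
	struct sigaction sa = {0};

	assert(devs != NULL && n >= 0);
	sa.sa_handler = on_timeout;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGALRM, &sa, NULL);

	if (sigsetjmp(bail_out, 1)) {
		fprintf(stderr, "shutdown timed out in %s\n", devs[i].name);
		return i;
	}

	alarm(timeout_sec);
	for (i = 0; i < n; i++)
		if (devs[i].shutdown)
			devs[i].shutdown();
	alarm(0);                       /* everything finished in time */
	return -1;
}
```

With the demo table, fake_device_shutdown(demo_devs, 3, 1) reports "stuck1" after one second, which is the kind of information we want in the kernel log instead of a silent hang.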

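I still believe a default timeout of 10 seconds is reasonable for device_shutdown().
In the sequence above, it is reached only after userland has synced and unmounted the file systems (steps 2 and 3).
Not all drivers implement a shutdown callback, and the timeout can be configured for large systems as needed.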


> 
> [ ... ]
> 
> > Isn't this just a bug in your drivers?  Why not fix them?  Or if you
> > really have to have 10 seconds to shut down, use a watchdog timer that
> > you trigger from userspace and stop petting once you want to shut down.
> > Then, if it expires it will reset the machine, all of your policy
> > decisions would have been done in userspace, no need to get the kernel
> > involved at all.
> 
> +1
> 
> 
> -- 
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
> 
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
> 
