Message-ID: <2024052927-traffic-lazy-e3ad@gregkh>
Date: Wed, 29 May 2024 14:51:48 +0200
From: Greg KH <gregkh@...uxfoundation.org>
To: Soumya Khasnis <soumya.khasnis@...y.com>
Cc: rafael@...nel.org, linux-kernel@...r.kernel.org,
	daniel.lezcano@...aro.org, festevam@...x.de, lee@...nel.org,
	benjamin.bara@...data.com, dmitry.osipenko@...labora.com,
	ldmldm05@...il.com, srinavasa.nagaraju@...y.com,
	Madhusudan.Bobbili@...y.com, shingo.takeuchi@...y.com,
	keita.aihara@...y.com, masaya.takahashi@...y.com
Subject: Re: [PATCH v2] reboot: Add timeout for device shutdown

On Wed, May 29, 2024 at 11:00:49AM +0000, Soumya Khasnis wrote:
> The device shutdown callbacks invoked during shutdown/reboot
> are prone to errors depending on the device state or mishandling
> by one or more drivers.

Why not fix those drivers?  A shutdown callback should not stall, and if
it does, that's a bug that should be fixed in that driver.

Or use a watchdog and just reboot if that triggers at shutdown time.
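
The watchdog suggestion can be sketched from userspace roughly as follows.
This is only a minimal sketch, assuming the standard Linux watchdog
character-device interface from <linux/watchdog.h>; the helper name
arm_watchdog() and the device path passed to it are hypothetical, and a
suitable timeout depends on the platform.

```c
/*
 * Userspace sketch: arm a hardware watchdog before starting shutdown.
 * arm_watchdog() is a hypothetical helper, not an existing API.
 */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/*
 * Open the watchdog device at `path` and request `timeout_sec`.
 * Returns the open file descriptor, or -1 on failure.  While the
 * descriptor stays open and nothing pings the device, the hardware
 * is expected to reset the machine once the timeout expires.
 */
int arm_watchdog(const char *path, int timeout_sec)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;

	/* The driver may round the value and writes back what it uses. */
	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout_sec) < 0) {
		/* Best effort "magic close" so the watchdog is not left armed. */
		if (write(fd, "V", 1) != 1) {
			/* Nothing more can be done here. */
		}
		close(fd);
		return -1;
	}
	return fd;
}
```

A shutdown path would call something like arm_watchdog("/dev/watchdog", 30)
(both values are placeholders) and then proceed without pinging the device,
so that a stalled callback ends in a hardware reset.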

> In order to prevent a device hang in such
> scenarios, we bail out after a timeout while dumping a meaningful
> call trace of the shutdown callback which blocks the shutdown or
> reboot process.

Dump it where?


> 
> Signed-off-by: Soumya Khasnis <soumya.khasnis@...y.com>
> Signed-off-by: Srinavasa Nagaraju <Srinavasa.Nagaraju@...y.com>
> ---
>  drivers/base/Kconfig | 15 +++++++++++++++
>  kernel/reboot.c      | 46 +++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> index 2b8fd6bb7da0..d06e379b6281 100644
> --- a/drivers/base/Kconfig
> +++ b/drivers/base/Kconfig
> @@ -243,3 +243,18 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
>  	  work on.
>  
>  endmenu
> +
> +config DEVICE_SHUTDOWN_TIMEOUT
> +	bool "device shutdown timeout"
> +	default n

That is the default, so no need for this.


> +	help
> +	   Enable timeout for device shutdown. Helps in case device shutdown
> +	   is hung during shutdown and reboot.
> +
> +
> +config DEVICE_SHUTDOWN_TIMEOUT_SEC
> +	int "device shutdown timeout in seconds"
> +	default 5
> +	depends on DEVICE_SHUTDOWN_TIMEOUT
> +	help
> +	  sets time for device shutdown timeout in seconds

You need much more help text for all of these.

And why are these in the drivers/base/Kconfig file?  It has nothing to
do with "devices", or the driver core, it's all core kernel reboot
logic.


> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index 22c16e2564cc..8460bd24563b 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -18,7 +18,7 @@
>  #include <linux/syscalls.h>
>  #include <linux/syscore_ops.h>
>  #include <linux/uaccess.h>
> -
> +#include <linux/sched/debug.h>

Why remove the blank line?

>  /*
>   * this indicates whether you can reboot with ctrl-alt-del: the default is yes
>   */
> @@ -48,6 +48,14 @@ int reboot_cpu;
>  enum reboot_type reboot_type = BOOT_ACPI;
>  int reboot_force;
>  
> +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> +struct device_shutdown_timeout {
> +	struct timer_list timer;
> +	struct task_struct *task;
> +} devs_shutdown;
> +#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC
> +#endif

#ifdefs should not be in .c files, please put this in a .h file where it
belongs.  Same for the other #ifdefs.
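
A minimal sketch of that convention, written as a single userspace file so
it can be compiled on its own.  All names (CONFIG_SHUTDOWN_TIMEOUT,
shutdown_timer_set(), restart_prepare(), and so on) are hypothetical and
are not taken from the patch or the kernel tree.

```c
#include <stdbool.h>

/* ---- would live in a header, e.g. a hypothetical shutdown_timeout.h ---- */
#ifdef CONFIG_SHUTDOWN_TIMEOUT
/* Feature enabled: real implementations live in their own .c file. */
void shutdown_timer_set(void);
void shutdown_timer_clr(void);
bool shutdown_timer_armed(void);
#else
/* Feature disabled: empty inline stubs that the compiler drops. */
static inline void shutdown_timer_set(void) { }
static inline void shutdown_timer_clr(void) { }
static inline bool shutdown_timer_armed(void) { return false; }
#endif

/* ---- would live in the .c file: the call site has no #ifdef ---- */
static int shutdown_calls;

/* Stand-in for the real shutdown work. */
static void do_shutdown(void)
{
	shutdown_calls++;
}

/* Arms the timer, runs the shutdown work, disarms the timer. */
int restart_prepare(void)
{
	shutdown_timer_set();
	do_shutdown();
	shutdown_timer_clr();
	return shutdown_calls;
}
```

With the option unset, the stubs compile to nothing and the caller reads
the same either way.  In a kernel build, the file holding the real
implementations would be selected by the build system instead of being
wrapped in #ifdef.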



> +
>  struct sys_off_handler {
>  	struct notifier_block nb;
>  	int (*sys_off_cb)(struct sys_off_data *data);
> @@ -88,12 +96,46 @@ void emergency_restart(void)
>  }
>  EXPORT_SYMBOL_GPL(emergency_restart);
>  
> +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> +static void device_shutdown_timeout_handler(struct timer_list *t)
> +{
> +	pr_emerg("**** device shutdown timeout ****\n");

What does this have to do with "devices"?  This is a whole-system issue,
or really a "broken driver" issue.

> +	show_stack(devs_shutdown.task, NULL, KERN_EMERG);

How do you know this is the 'device shutdown' stack?  What is a "device
shutdown"?

> +	if (system_state == SYSTEM_RESTART)
> +		emergency_restart();
> +	else
> +		machine_power_off();
> +}
> +
> +static void device_shutdown_timer_set(void)
> +{
> +	devs_shutdown.task = current;

It's just the normal shutdown stack/process, why call it a device?

> +	timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0);
> +	devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ;
> +	add_timer(&devs_shutdown.timer);
> +}
> +
> +static void device_shutdown_timer_clr(void)
> +{
> +	del_timer(&devs_shutdown.timer);
> +}
> +#else
> +static inline void device_shutdown_timer_set(void)
> +{
> +}
> +static inline void device_shutdown_timer_clr(void)
> +{
> +}
> +#endif
> +
>  void kernel_restart_prepare(char *cmd)
>  {
>  	blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
>  	system_state = SYSTEM_RESTART;
>  	usermodehelper_disable();
> +	device_shutdown_timer_set();
>  	device_shutdown();
> +	device_shutdown_timer_clr();

Why isn't all of this done in device_shutdown() if you think it is a
device issue?  Why put it in reboot.c?

thanks,

greg k-h
