Message-ID: <20240607113750.GA30558@sony.com>
Date: Fri, 7 Jun 2024 11:37:50 +0000
From: Khasnis Soumya <soumya.khasnis@...y.com>
To: Daniel Lezcano <daniel.lezcano@...aro.org>
Cc: gregkh@...uxfoundation.org, rafael@...nel.org,
	linux-kernel@...r.kernel.org, festevam@...x.de, lee@...nel.org,
	benjamin.bara@...data.com, dmitry.osipenko@...labora.com,
	ldmldm05@...il.com, soumya.khasnis@...y.com,
	srinavasa.nagaraju@...y.com, Madhusudan.Bobbili@...y.com,
	shingo.takeuchi@...y.com, keita.aihara@...y.com,
	masaya.takahashi@...y.com
Subject: Re: [PATCH v3] driver core: Add timeout for device shutdown

On Thu, Jun 06, 2024 at 05:23:19PM +0200, Daniel Lezcano wrote:
> On 06/06/2024 10:50, Soumya Khasnis wrote:
> > The device shutdown callbacks invoked during shutdown/reboot
> > are prone to errors depending on the device state or mishandling
> > by one or more driver. In order to prevent a device hang in such
> > scenarios, we bail out after a timeout while dumping a meaningful
> > call trace of the shutdown callback to kernel logs, which blocks
> > the shutdown or reboot process.
> 
> Is that not somehow already achieved by the watchdog mechanism ?
The hardware or software lockup watchdog enabled by CONFIG_LOCKUP_DETECTOR
cannot detect cases where the task is stalled in an I/O wait
(wait_for_completion/io).

> 
> > Signed-off-by: Soumya Khasnis <soumya.khasnis@...y.com>
> > Signed-off-by: Srinavasa Nagaraju <Srinavasa.Nagaraju@...y.com>
> > ---
> > Changes in v3:
> >    -fix review comments
> >    -updated commit message
> > 
> >   drivers/base/Kconfig | 18 ++++++++++++++++++
> >   drivers/base/base.h  |  8 ++++++++
> >   drivers/base/core.c  | 40 ++++++++++++++++++++++++++++++++++++++++
> >   3 files changed, 66 insertions(+)
> > 
> > diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
> > index 2b8fd6bb7da0..342d3f87a404 100644
> > --- a/drivers/base/Kconfig
> > +++ b/drivers/base/Kconfig
> > @@ -243,3 +243,21 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
> >   	  work on.
> >   
> >   endmenu
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT
> > +	bool "device shutdown timeout"
> > +	default y
> > +	help
> > +	   Enable timeout for device shutdown. In case of device shutdown is
> > +	   broken or device is not responding, system shutdown or restart may hang.
> > +	   This timeout handles such situation and triggers emergency_restart or
> > +	   machine_power_off. Also dumps call trace of shutdown process.
> > +
> > +
> > +config DEVICE_SHUTDOWN_TIMEOUT_SEC
> > +	int "device shutdown timeout in seconds"
> > +	range 10 60
> > +	default 10
> 
> How do you know the shutdown time is between this range?
> 
> What about large systems ?
Agreed, it is difficult to set a single timeout for all devices.
I based this range on consumer devices, where the response time cannot be longer.
Still, as you mentioned, we cannot enable this option by default ("y")
with a fixed range. I will change the patch to default this option to
"n" as before, and will also remove the range.

> 
> > +	depends on DEVICE_SHUTDOWN_TIMEOUT
> > +	help
> > +	  sets time for device shutdown timeout in seconds
> > diff --git a/drivers/base/base.h b/drivers/base/base.h
> > index 0738ccad08b2..97eea57a8868 100644
> > --- a/drivers/base/base.h
> > +++ b/drivers/base/base.h
> > @@ -243,3 +243,11 @@ static inline int devtmpfs_delete_node(struct device *dev) { return 0; }
> >   
> >   void software_node_notify(struct device *dev);
> >   void software_node_notify_remove(struct device *dev);
> > +
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +struct device_shutdown_timeout {
> > +	struct timer_list timer;
> > +	struct task_struct *task;
> > +};
> > +#define SHUTDOWN_TIMEOUT CONFIG_DEVICE_SHUTDOWN_TIMEOUT_SEC
> > +#endif
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index b93f3c5716ae..dab455054a80 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -35,6 +35,12 @@
> >   #include "base.h"
> >   #include "physical_location.h"
> >   #include "power/power.h"
> > +#include <linux/sched/debug.h>
> > +#include <linux/reboot.h>
> > +
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +struct device_shutdown_timeout devs_shutdown;
> > +#endif
> >   
> >   /* Device links support. */
> >   static LIST_HEAD(deferred_sync);
> > @@ -4799,6 +4805,38 @@ int device_change_owner(struct device *dev, kuid_t kuid, kgid_t kgid)
> >   }
> >   EXPORT_SYMBOL_GPL(device_change_owner);
> >   
> > +#ifdef CONFIG_DEVICE_SHUTDOWN_TIMEOUT
> > +static void device_shutdown_timeout_handler(struct timer_list *t)
> > +{
> > +	pr_emerg("**** device shutdown timeout ****\n");
> > +	show_stack(devs_shutdown.task, NULL, KERN_EMERG);
> > +	if (system_state == SYSTEM_RESTART)
> > +		emergency_restart();
> > +	else
> > +		machine_power_off();
> > +}
> 
> So if one device is misbehaving, all the others shutdown callbacks are 
> skipped with emergency halt/reboot ? That is prone to break the system, no?
Skipping the other callbacks may not break the system, but an emergency shutdown
or reboot is better than leaving the system in a hung state. That is the main
purpose of this patch.
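To illustrate the trade-off, here is a minimal userspace sketch of the same
pattern. It is not kernel code and not part of the patch: alarm() stands in for
the kernel timer, _exit() stands in for emergency_restart()/machine_power_off(),
and all names (run_shutdown_with_timeout, cb_ok, cb_hang, EXIT_SHUTDOWN_TIMEOUT)
are hypothetical. One deadline covers the whole callback loop, so a single
hanging callback ends the sequence and the remaining callbacks never run.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical exit code marking "deadline expired" in this sketch. */
#define EXIT_SHUTDOWN_TIMEOUT 42

typedef void (*shutdown_cb)(void);

/* Plays the role of the timer handler: report, then bail out hard. */
static void on_timeout(int sig)
{
	static const char msg[] = "**** shutdown timeout ****\n";
	ssize_t r;

	(void)sig;
	/* write() and _exit() are async-signal-safe. */
	r = write(STDERR_FILENO, msg, sizeof(msg) - 1);
	(void)r;
	_exit(EXIT_SHUTDOWN_TIMEOUT);
}

/*
 * Run all callbacks in a child process under one shared deadline.
 * Returns 0 if every callback finished, 1 if the deadline expired,
 * and -1 on any other failure.
 */
static int run_shutdown_with_timeout(const shutdown_cb *cbs, int n,
				     unsigned int timeout_sec)
{
	int status;
	pid_t pid;

	assert(n >= 0);

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		signal(SIGALRM, on_timeout);
		alarm(timeout_sec);	/* arm: like device_shutdown_timer_set() */
		for (int i = 0; i < n; i++)
			cbs[i]();
		alarm(0);		/* disarm: like device_shutdown_timer_clr() */
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	if (!WIFEXITED(status))
		return -1;
	if (WEXITSTATUS(status) == EXIT_SHUTDOWN_TIMEOUT)
		return 1;
	return WEXITSTATUS(status) == 0 ? 0 : -1;
}

/* A well-behaved callback and one that never returns. */
static void cb_ok(void) { }
static void cb_hang(void) { for (;;) pause(); }
```

Calling run_shutdown_with_timeout() with the list { cb_ok, cb_hang, cb_ok } and
a one-second deadline reports a timeout, and the third callback is skipped.
That is the behaviour questioned above, and also the behaviour the patch
prefers over an indefinite hang.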
> 
> > +static void device_shutdown_timer_set(void)
> > +{
> > +	devs_shutdown.task = current;
> > +	timer_setup(&devs_shutdown.timer, device_shutdown_timeout_handler, 0);
> > +	devs_shutdown.timer.expires = jiffies + SHUTDOWN_TIMEOUT * HZ;
> > +	add_timer(&devs_shutdown.timer);
> > +}
> > +
> > +static void device_shutdown_timer_clr(void)
> > +{
> > +	del_timer(&devs_shutdown.timer);
> > +}
> > +#else
> > +static inline void device_shutdown_timer_set(void)
> > +{
> > +}
> > +static inline void device_shutdown_timer_clr(void)
> > +{
> > +}
> > +#endif
> > +
> >   /**
> >    * device_shutdown - call ->shutdown() on each device to shutdown.
> >    */
> > @@ -4810,6 +4848,7 @@ void device_shutdown(void)
> >   	device_block_probing();
> >   
> >   	cpufreq_suspend();
> > +	device_shutdown_timer_set();
> >   
> >   	spin_lock(&devices_kset->list_lock);
> >   	/*
> > @@ -4869,6 +4908,7 @@ void device_shutdown(void)
> >   		spin_lock(&devices_kset->list_lock);
> >   	}
> >   	spin_unlock(&devices_kset->list_lock);
> > +	device_shutdown_timer_clr();
> >   }
> >   
> >   /*
> 
