Message-ID: <2024061955-zigzagged-uncoiled-da96@gregkh>
Date: Wed, 19 Jun 2024 12:48:04 +0200
From: Greg KH <gregkh@...uxfoundation.org>
To: Khasnis Soumya <soumya.khasnis@...y.com>
Cc: Daniel Lezcano <daniel.lezcano@...aro.org>, rafael@...nel.org,
linux-kernel@...r.kernel.org, festevam@...x.de, lee@...nel.org,
benjamin.bara@...data.com, dmitry.osipenko@...labora.com,
ldmldm05@...il.com, srinavasa.nagaraju@...y.com,
Madhusudan.Bobbili@...y.com, shingo.takeuchi@...y.com,
keita.aihara@...y.com, masaya.takahashi@...y.com
Subject: Re: [PATCH v5] driver core: Add timeout for device shutdown

On Wed, Jun 19, 2024 at 10:00:00AM +0000, Khasnis Soumya wrote:
> On Thu, Jun 13, 2024 at 01:51:57PM +0200, Daniel Lezcano wrote:
> > On 13/06/2024 10:43, Greg KH wrote:
> > > On Thu, Jun 13, 2024 at 08:32:26AM +0000, Soumya Khasnis wrote:
> > >> The device shutdown callbacks invoked during shutdown/reboot
> > >> are prone to errors depending on the device state or mishandling
> > >> by one or more drivers. In order to prevent a device hang in such
> > >> scenarios, we bail out after a timeout while dumping a meaningful
> > >> call trace of the shutdown callback to kernel logs, which blocks
> > >> the shutdown or reboot process.
> > >
> > > Again, this is not a "device shutdown" timeout, it is a "the whole
> > > system has not shut down this fast" timeout.
> > >
> > > And in looking at my system, it doesn't shut down in 10 seconds as it is
> > > madly flushing a ton of stuff out to the disks, and they are slow
> > > beasts. So your 10 second default would cause me data loss on my
> > > workstation, not good!
> >
> > Thanks for pointing this out. It is exactly what I was worried about ...
>
> Thank you for the comments, Daniel and Greg. Let me explain.
>
> Typically, the reboot/shutdown sequence involves the following steps in userland before the kernel restart/shutdown sequence is entered.
>
> 1. Terminate all services (except shutdown critical tasks)
> 2. Sync File systems
> 3. Unmount File systems
> 4. Trigger the kernel reboot() system call (LINUX_REBOOT_CMD_RESTART/LINUX_REBOOT_CMD_POWER_OFF)
>
> A userspace watchdog can be set up for the steps above, as exists on Android systems.
> This needs a large timeout value because it involves syncing data to disks.

True.

> Below is the kernel restart sequence after control moves to the kernel in step 4.
> The issue we intend to address here is that the device driver shutdown callbacks may hang
> due to an unresponsive device or a broken driver.
>
> |- kernel_restart()
>    |- kernel_restart_prepare()
>       |- device_shutdown() // Iterates over the device hierarchy and invokes the shutdown callbacks (class/bus/driver->shutdown)
>    |- syscore_shutdown()
>    |- machine_restart()
>
> I still believe a 10-second default timeout is reasonable for device_shutdown().
> Not all drivers necessarily implement a shutdown callback, and the timeout can be configured for large systems as needed.

No, you cannot break existing systems with this, sorry.

Just enable the watchdog before you do step 4 and then if reboot doesn't
happen in time, the watchdog will reboot the kernel for you.
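
For illustration only, a minimal userspace sketch of that suggestion. It
assumes a standard Linux watchdog character device (for example
/dev/watchdog) and a caller-chosen timeout; neither value comes from this
thread, and arm_watchdog() and reboot_with_watchdog() are hypothetical
helper names. Whether a given watchdog driver supports the requested
timeout, or keeps running across the reboot path, depends on the hardware
and driver.

```c
#include <fcntl.h>
#include <linux/watchdog.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/reboot.h>
#include <unistd.h>

/*
 * Open the watchdog device at `path` (which starts it on typical drivers)
 * and request a timeout of `timeout_sec` seconds.
 * Returns the open file descriptor on success, -1 on failure.
 * The descriptor is intentionally never written to afterwards, so the
 * watchdog is expected to fire if the system does not restart in time.
 */
int arm_watchdog(const char *path, int timeout_sec)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open watchdog");
		return -1;
	}

	/* The driver may round the value; it writes back what it accepted. */
	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout_sec) < 0) {
		perror("WDIOC_SETTIMEOUT");
		close(fd);
		return -1;
	}

	return fd;
}

/*
 * Flush dirty data first (this can be slow), then arm the watchdog so its
 * short timeout only covers the kernel side of the restart, then ask the
 * kernel to restart. Returns -1 if any step fails; on success reboot()
 * does not return. If reboot() fails, the watchdog stays armed.
 */
int reboot_with_watchdog(const char *path, int timeout_sec)
{
	sync();

	if (arm_watchdog(path, timeout_sec) < 0)
		return -1;

	return reboot(RB_AUTOBOOT);
}
```

A shutdown helper running as root would call something like
reboot_with_watchdog("/dev/watchdog", 30) as its last step, after services
are stopped and file systems are unmounted.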

Also, again, fix your broken drivers to not do this please. You
obviously have experience with this already, what's preventing that from
being fixed on your end? Same goes for an "unresponsive device", that
too can be fixed in your broken driver, and some might argue, needs to
be fixed no matter what.

Don't paper over broken out-of-tree kernel code with stuff like this,
fix it please.

thanks,

greg k-h