Message-ID: <k2t3dkh3acoenhxtsd3ekvpnwl5yir6qaun52h5prdfwcx5lsb@h3ieoj7jfu6t>
Date: Thu, 3 Jul 2025 12:12:58 +0200
From: Thierry Reding <thierry.reding@...il.com>
To: Jon Hunter <jonathanh@...dia.com>
Cc: Kartik Rajput <kkartik@...dia.com>, daniel.lezcano@...aro.org,
tglx@...utronix.de, linux-kernel@...r.kernel.org, linux-tegra@...r.kernel.org
Subject: Re: [PATCH] clocksource: timer-tegra186: Enable WDT at probe
On Thu, Jul 03, 2025 at 08:55:04AM +0100, Jon Hunter wrote:
>
>
> On 03/07/2025 07:55, Thierry Reding wrote:
> > On Mon, Jun 30, 2025 at 04:31:35PM +0530, Kartik Rajput wrote:
> > > Currently, if the system crashes or hangs during kernel boot before
> > > userspace initializes and configures the watchdog timer, then the
> > > watchdog won’t be able to recover the system as it’s not running. This
> > > becomes crucial during an over-the-air update, where if the newly
> > > updated kernel crashes on boot, the watchdog is needed to reset the
> > > device and boot into an alternative system partition. If the watchdog
> > > is disabled in such scenarios, it can lead to the system getting
> > > bricked.
> > >
> > > Enable the WDT during driver probe to allow recovery from any crash/hang
> > > seen during early kernel boot. Also, disable interrupts once userspace
> > > starts pinging the watchdog.
> > >
> > > Signed-off-by: Kartik Rajput <kkartik@...dia.com>
> > > ---
> > > drivers/clocksource/timer-tegra186.c | 42 ++++++++++++++++++++++++++++
> > > 1 file changed, 42 insertions(+)
> >
> > This seems dangerous to me. It means that if the operating system
> > doesn't start some sort of watchdog service in userspace that pings the
> > watchdog, the system will reboot 120 seconds after the watchdog probe.
>
>
> I don't believe that will happen with this change. The kernel will continue
> to pet the watchdog until userspace takes over with this change. At least
> that is my understanding.

Ah yes... I skipped over that IRQ handling bit. However, I think this
still violates the assumptions because the driver will keep petting the
watchdog no matter what, which means that we now have no way of forcing
a reset of the system when userspace hangs. As long as just a tiny part
of the kernel keeps running, the watchdog would keep getting petted and
prevent it from resetting the system.

Using a second watchdog still seems like a more robust alternative. Or
maybe we can find a way to remove the kernel petting once userspace
starts the watchdog.

Thierry
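
A toy model may make the two behaviours in this thread concrete. The sketch below is not the timer-tegra186 driver and does not use the kernel watchdog API. Every name in it (`struct wdt_model`, `wdt_tick`, `hung_userspace_resets`, the `kernel_pets_after_handoff` flag) is hypothetical. One tick stands in for one watchdog interrupt period. The 120-tick timeout only echoes the 120 seconds mentioned above. The model compares two cases after userspace pings once and then hangs. In the first, the kernel keeps petting after the handoff, which is the concern with the patch as posted. In the second, kernel petting stops once userspace takes over, which is the alternative suggested above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a watchdog counter; not real driver code. */
struct wdt_model {
    unsigned int timeout;           /* ticks allowed between pets */
    unsigned int remaining;         /* ticks left before reset */
    bool userspace_owns;            /* set once userspace pings */
    bool kernel_pets_after_handoff; /* true: kernel IRQ keeps petting */
    bool expired;                   /* true once a reset would fire */
};

static void wdt_init(struct wdt_model *w, unsigned int timeout,
                     bool kernel_pets_after_handoff)
{
    assert(timeout > 0);
    w->timeout = timeout;
    w->remaining = timeout;
    w->userspace_owns = false;
    w->kernel_pets_after_handoff = kernel_pets_after_handoff;
    w->expired = false;
}

/* Reload the counter, as any pet (kernel or userspace) would. */
static void wdt_pet(struct wdt_model *w)
{
    w->remaining = w->timeout;
}

/* Userspace opens/pings the watchdog and takes ownership. */
static void wdt_userspace_ping(struct wdt_model *w)
{
    w->userspace_owns = true;
    wdt_pet(w);
}

/* One interrupt period: the kernel pets if it is still responsible,
 * otherwise the counter runs down toward a reset. */
static void wdt_tick(struct wdt_model *w)
{
    if (w->expired)
        return;
    if (!w->userspace_owns || w->kernel_pets_after_handoff) {
        wdt_pet(w);
        return;
    }
    if (w->remaining > 0)
        w->remaining--;
    if (w->remaining == 0)
        w->expired = true;
}

/* Boot under kernel coverage, let userspace ping once, then hang. */
static bool hung_userspace_resets(bool kernel_pets_after_handoff,
                                  unsigned int timeout,
                                  unsigned int hang_ticks)
{
    struct wdt_model w;

    wdt_init(&w, timeout, kernel_pets_after_handoff);
    for (unsigned int i = 0; i < 10; i++)
        wdt_tick(&w);           /* early boot: kernel keeps it alive */
    wdt_userspace_ping(&w);     /* userspace takes over */
    for (unsigned int i = 0; i < hang_ticks; i++)
        wdt_tick(&w);           /* userspace is hung: no more pings */
    return w.expired;
}
```

In this model, `hung_userspace_resets(true, 120, 1000)` returns false because the kernel keeps resetting the counter even though userspace is stuck. `hung_userspace_resets(false, 120, 1000)` returns true once the counter runs out after 120 ticks. In both configurations, a system whose userspace never starts the watchdog is kept alive by the kernel during boot.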