[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <q7fi2hpswm2tsowrtbanlnidxnyq3fyb2xxr6gcowxv6sglhop@nsvwlol4dac3>
Date: Thu, 3 Jul 2025 15:36:00 +0200
From: Thierry Reding <thierry.reding@...il.com>
To: Jon Hunter <jonathanh@...dia.com>
Cc: Kartik Rajput <kkartik@...dia.com>, daniel.lezcano@...aro.org,
tglx@...utronix.de, linux-kernel@...r.kernel.org, linux-tegra@...r.kernel.org
Subject: Re: [PATCH] clocksource: timer-tegra186: Enable WDT at probe
On Thu, Jul 03, 2025 at 11:26:28AM +0100, Jon Hunter wrote:
>
>
> On 03/07/2025 11:12, Thierry Reding wrote:
> > On Thu, Jul 03, 2025 at 08:55:04AM +0100, Jon Hunter wrote:
> > >
> > >
> > > On 03/07/2025 07:55, Thierry Reding wrote:
> > > > On Mon, Jun 30, 2025 at 04:31:35PM +0530, Kartik Rajput wrote:
> > > > > Currently, if the system crashes or hangs during kernel boot before
> > > > > userspace initializes and configures the watchdog timer, then the
> > > > > watchdog won’t be able to recover the system as it’s not running. This
> > > > > becomes crucial during an over-the-air update, where if the newly
> > > > > updated kernel crashes on boot, the watchdog is needed to reset the
> > > > > device and boot into an alternative system partition. If the watchdog
> > > > > is disabled in such scenarios, it can lead to the system getting
> > > > > bricked.
> > > > >
> > > > > Enable the WDT during driver probe to allow recovery from any crash/hang
> > > > > seen during early kernel boot. Also, disable interrupts once userspace
> > > > > starts pinging the watchdog.
> > > > >
> > > > > Signed-off-by: Kartik Rajput <kkartik@...dia.com>
> > > > > ---
> > > > > drivers/clocksource/timer-tegra186.c | 42 ++++++++++++++++++++++++++++
> > > > > 1 file changed, 42 insertions(+)
> > > >
> > > > This seems dangerous to me. It means that if the operating system
> > > > doesn't start some sort of watchdog service in userspace that pings the
> > > > watchdog, the system will reboot 120 seconds after the watchdog probe.
> > >
> > >
> > > I don't believe that will happen with this change. The kernel will continue
> > > to pet the watchdog until userspace takes over with this change. At least
> > > that is my understanding.
> >
> > Ah yes... I skipped over that IRQ handling bit. However, I think this
> > still violates the assumptions because the driver will keep petting the
> > watchdog no matter what, which means that we now have no way of forcing
> > a reset of the system when userspace hangs. As long as just a tiny part
> > of the kernel keeps running, the watchdog would keep getting petted and
> > prevent it from resetting the system.
> >
> > Using a second watchdog still seems like a more robust alternative. Or
> > maybe we can find a way to remove the kernel petting once userspace
> > starts the watchdog.
>
> Once userspace calls the "->ping" callback then, 'enable_irq' is set to
> false and when 'tegra186_wdt_enable()' is called this will disable the IRQ
> so that the kernel no longer pets the watchdog. So this should disable
> kernel petting once userspace is up and running.
I clearly can't read code today. Seems generally fine, then, but I'm
actually really enthused now about using a second watchdog for kernel
petting. Since we don't use any of the other two watchdogs, is there
any reason why we can't cleanly separate both use-cases? It would let
us avoid some of these special cases that are not intuitive to
understand.
Thierry
Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)
Powered by blists - more mailing lists