Message-ID:
<PH8PR18MB5381B857859C6413392DD007C5CB2@PH8PR18MB5381.namprd18.prod.outlook.com>
Date: Wed, 5 Mar 2025 12:15:28 +0000
From: George Cherian <gcherian@...vell.com>
To: Ahmad Fatoum <a.fatoum@...gutronix.de>,
"linux@...ck-us.net"
<linux@...ck-us.net>,
"wim@...ux-watchdog.org" <wim@...ux-watchdog.org>,
"jwerner@...omium.org" <jwerner@...omium.org>,
"evanbenn@...omium.org"
<evanbenn@...omium.org>,
"kabel@...nel.org" <kabel@...nel.org>,
"krzk@...nel.org" <krzk@...nel.org>,
"mazziesaccount@...il.com"
<mazziesaccount@...il.com>,
"thomas.richard@...tlin.com"
<thomas.richard@...tlin.com>,
"lma@...omium.org" <lma@...omium.org>,
"bleung@...omium.org" <bleung@...omium.org>,
"support.opensource@...semi.com"
<support.opensource@...semi.com>,
"shawnguo@...nel.org"
<shawnguo@...nel.org>,
"s.hauer@...gutronix.de" <s.hauer@...gutronix.de>,
"kernel@...gutronix.de" <kernel@...gutronix.de>,
"festevam@...il.com"
<festevam@...il.com>,
"andy@...nel.org" <andy@...nel.org>,
"paul@...pouillou.net" <paul@...pouillou.net>,
"alexander.usyskin@...el.com"
<alexander.usyskin@...el.com>,
"andreas.werner@....de"
<andreas.werner@....de>,
"daniel@...ngy.jp" <daniel@...ngy.jp>,
"romain.perier@...il.com" <romain.perier@...il.com>,
"avifishman70@...il.com"
<avifishman70@...il.com>,
"tmaimon77@...il.com" <tmaimon77@...il.com>,
"tali.perry1@...il.com" <tali.perry1@...il.com>,
"venture@...gle.com"
<venture@...gle.com>,
"yuenn@...gle.com" <yuenn@...gle.com>,
"benjaminfair@...gle.com" <benjaminfair@...gle.com>,
"maddy@...ux.ibm.com"
<maddy@...ux.ibm.com>,
"mpe@...erman.id.au" <mpe@...erman.id.au>,
"npiggin@...il.com" <npiggin@...il.com>,
"christophe.leroy@...roup.eu"
<christophe.leroy@...roup.eu>,
"naveen@...nel.org" <naveen@...nel.org>,
"mwalle@...nel.org" <mwalle@...nel.org>,
"xingyu.wu@...rfivetech.com"
<xingyu.wu@...rfivetech.com>,
"ziv.xu@...rfivetech.com"
<ziv.xu@...rfivetech.com>,
"hayashi.kunihiko@...ionext.com"
<hayashi.kunihiko@...ionext.com>,
"mhiramat@...nel.org" <mhiramat@...nel.org>
CC: "chrome-platform@...ts.linux.dev" <chrome-platform@...ts.linux.dev>,
"linux-watchdog@...r.kernel.org" <linux-watchdog@...r.kernel.org>,
"imx@...ts.linux.dev" <imx@...ts.linux.dev>,
"patches@...nsource.cirrus.com"
<patches@...nsource.cirrus.com>,
"linux-mips@...r.kernel.org"
<linux-mips@...r.kernel.org>,
"linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>,
"linuxppc-dev@...ts.ozlabs.org"
<linuxppc-dev@...ts.ozlabs.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH v4 0/2] Add stop_on_panic support for watchdog
Hi Ahmad,
>Hi George,
>
>On 05.03.25 12:28, George Cherian wrote:
>> Hi Ahmad,
>>> Hi George,
>>> On 05.03.25 11:10, George Cherian wrote:
>>>> This series adds a new kernel command line option to watchdog core to
>>>> stop the watchdog on panic. This is useful on certain systems where a watchdog
>>>> reset prevents the kdump kernel from loading successfully.
>>>>
>>>> The stop function of some watchdog drivers can sleep. For such
>>>> drivers the stop_on_panic is not valid as the notifier callback happens
>>>> in atomic context. Introduce WDIOF_STOP_MAYSLEEP flag to watchdog_info
>>>> options to indicate whether the stop function would sleep.
>>>
>>> Did you consider having a reset_on_panic instead, which sets a user-specified
>>> timeout on panic? This would make the mechanism useful also for watchdogs
>>
>> /proc/sys/kernel/panic already provides that support. You may echo a non-zero value
>> and the system attempts a soft reboot after that many seconds. But this doesn't happen
>> when a kdump kernel is loaded after a panic.
>
>The timeout specified to the Watchdog reset_on_panic option would be programmed into
>the active watchdogs and not be used to trigger a software-induced reboot.
Yes.
>
>>> that can't be disabled and would protect against system lock up:
>>> Consider a memory-corruption bug (perhaps externally via DMA), which partially
>>> overwrites both main and kdump kernel. With a disabled watchdog, the system
>>> may not be able to recover on its own.
>>
>> Yes, that is why the kernel command-line option is optional and defaults to zero,
>> so that if the kdump kernel is corrupted, the watchdog still kicks in.
>
>The existing option isn't enough for the kdump kernel use case.
>If we (i.e. you) are going to do something about it, wouldn't it be
>better to have a solution that's applicable to a wider number of
>watchdog devices?
I need a slight clarification here.
1. reset_on_panic takes the number of seconds to be programmed into the watchdog HW, so that it
triggers a watchdog reset after the specified timeout if the kdump kernel fails to boot or hangs while booting.
2. If reset_on_panic=0, it behaves like stop_on_panic=1.
Is this what you meant?
I would let Guenter comment on this approach.
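
To make the two points above concrete, here is a minimal sketch of the decision the panic path would have to make. It is not taken from the posted patches: the names wdt_panic_action, wdt_panic_decision and wdt_panic_decide() are hypothetical, and clamping the requested value to the hardware maximum is an assumption based on the remark that only the largest HW-supported timeout can be guaranteed.

```c
/* Hypothetical action taken from the panic notifier (not an existing kernel API). */
enum wdt_panic_action {
	WDT_PANIC_STOP,        /* stop the watchdog, i.e. what stop_on_panic=1 does */
	WDT_PANIC_SET_TIMEOUT, /* reprogram the watchdog HW with a new timeout */
};

struct wdt_panic_decision {
	enum wdt_panic_action action;
	unsigned int timeout_s; /* only meaningful for WDT_PANIC_SET_TIMEOUT */
};

/*
 * reset_on_panic:   value from the (proposed, hypothetical) command-line option,
 *                   in seconds; 0 means "stop the watchdog".
 * hw_max_timeout_s: largest timeout the watchdog HW supports, in seconds.
 */
static struct wdt_panic_decision
wdt_panic_decide(unsigned int reset_on_panic, unsigned int hw_max_timeout_s)
{
	struct wdt_panic_decision d = { WDT_PANIC_STOP, 0 };

	/* Point 2: a value of zero behaves like stop_on_panic=1. */
	if (reset_on_panic == 0)
		return d;

	/*
	 * Point 1: program the requested timeout. Nobody pings the watchdog
	 * after the panic, so the value is clamped to what the HW can hold
	 * (assumption made for this sketch).
	 */
	d.action = WDT_PANIC_SET_TIMEOUT;
	d.timeout_s = reset_on_panic < hw_max_timeout_s ?
		      reset_on_panic : hw_max_timeout_s;
	return d;
}
```

With this sketch, wdt_panic_decide(0, 4) asks for a stop, while wdt_panic_decide(30, 4) can only arm the HW for 4 seconds, which is the limitation discussed further down.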
>>> If you did consider it, what made you decide against it?
>> watchdog.stop_on_panic=1 is specifically for systems which can't boot a kdump kernel because
>> the kdump kernel gets a watchdog reset while booting, possibly due to a short watchdog timeout.
>> For example: a 32-bit watchdog down counter running at 1GHz.
>> reset_on_panic can guarantee only the largest watchdog timeout supported by HW,
>> since there is no one to ping the watchdog.
>If you are serious with the watchdog use, you'll want to use the watchdog to
>monitor kernel startup as well. If the bootloader can set a watchdog timeout
>just before starting the kernel and it doesn't expire before the kernel watchdog
>driver takes over, why can't we do the same just before starting the dumpkernel?
Yes, in an ideal world with ideal HW. But there is HW with issues that cannot support a large
enough watchdog timeout. Such HW would boot from FW to the kernel without the watchdog enabled,
and stop_on_panic does the same for the kdump kernel.
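
For the 32-bit down counter at 1GHz mentioned earlier in the thread, the arithmetic can be sketched as below. The helper wdt_max_timeout_ms() is hypothetical and only for illustration; it assumes the counter expires after at most 2^bits - 1 ticks, which may differ by one tick on real HW.

```c
#include <stdint.h>

/*
 * Longest timeout, in milliseconds, of a watchdog down counter that is
 * `bits` wide (1..32) and clocked at `clk_hz`. Returns 0 for arguments
 * outside that range. Hypothetical helper, not an existing kernel function.
 */
static uint64_t wdt_max_timeout_ms(unsigned int bits, uint64_t clk_hz)
{
	uint64_t max_ticks;

	if (bits == 0 || bits > 32 || clk_hz == 0)
		return 0;

	/* Largest value the counter can be loaded with. */
	max_ticks = (UINT64_C(1) << bits) - 1;

	/* max_ticks < 2^32, so the multiplication cannot overflow 64 bits. */
	return max_ticks * 1000 / clk_hz;
}
```

wdt_max_timeout_ms(32, 1000000000) gives 4294, i.e. roughly 4.3 seconds, which leaves very little time for a kdump kernel to boot far enough for a watchdog driver to take over.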
-George
>
>Thanks,
>Ahmad
>
>
>>
>> Thanks,
>> Ahmad
>>
>>>
>>>
>> Changelog:
>> v1 -> v2
>> - Remove the per driver flag setting option
>> - Take the parameter via kernel command-line parameter to watchdog_core.
>>
>> v2 -> v3
>> - Remove the helper function watchdog_stop_on_panic() from watchdog.h.
>> - There are no users for this.
>>
>> v3 -> v4
>> - Since the panic notifier is in atomic context, watchdog functions
>> which sleep can't be called.
>> - Add an options flag WDIOF_STOP_MAYSLEEP to indicate whether stop
>> function sleeps.
>> - Simplify the stop_on_panic kernel command line parsing.
>> - Enable the panic notifier only if the watchdog stop function doesn't
>> sleep
>>
>> George Cherian (2):
>> watchdog: Add a new flag WDIOF_STOP_MAYSLEEP
>> drivers: watchdog: Add support for panic notifier callback
>
> - George
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |