linux-kernel - Re: [PATCH v9 1/3] watchdog: introduce watchdog.open

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <de4bd94c-d2f7-66c2-6386-8b1cec57af03@prevas.dk>
Date:   Tue, 29 Jan 2019 20:35:13 +0000
From:   Rasmus Villemoes <rasmus.villemoes@...vas.dk>
To:     Guenter Roeck <linux@...ck-us.net>
CC:     "linux-watchdog@...r.kernel.org" <linux-watchdog@...r.kernel.org>,
        Wim Van Sebroeck <wim@...ux-watchdog.org>,
        Jonathan Corbet <corbet@....net>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        Esben Haabendal <esben@...bendal.dk>,
        "martin@...deboll.net" <martin@...deboll.net>,
        Rasmus Villemoes <Rasmus.Villemoes@...vas.se>
Subject: Re: [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout
 commandline parameter

On 22/01/2019 18.29, Guenter Roeck wrote:
> On Mon, Jan 21, 2019 at 08:45:39PM +0000, Rasmus Villemoes wrote:
>> The watchdog framework takes care of feeding a hardware watchdog until
>> userspace opens /dev/watchdogN. If that never happens for some reason
>> (buggy init script, corrupt root filesystem or whatnot) but the kernel
>> itself is fine, the machine stays up indefinitely. This patch allows
>> setting an upper limit for how long the kernel will take care of the
>> watchdog, thus ensuring that the watchdog will eventually reset the
>> machine.
>>
>> A value of 0 (the default) means infinite timeout, preserving the
>> current behaviour.
>>
>> This is particularly useful for embedded devices where some fallback
>> logic is implemented in the bootloader (e.g., use a different root
>> partition, boot from network, ...).
>>
>> There is already handle_boot_enabled serving a similar purpose. However,
>> such a binary choice is unsuitable if the hardware watchdog cannot be
>> programmed by the bootloader to provide a timeout long enough for
>> userspace to get up and running. Many of the embedded devices we see use
>> external (gpio-triggered) watchdogs with a fixed timeout of the order of
>> 1-2 seconds.
>>
>> The open timeout is also used as a maximum time for an application to
>> re-open /dev/watchdogN after closing it. Again, while the kernel already
>> has a nowayout mechanism, using that means userspace is at the mercy of
>> whatever timeout the hardware has.
>>
>> Being a module parameter, one can revert to the ordinary behaviour of
>> having the kernel maintain the watchdog indefinitely by simply writing 0
>> to /sys/... after initially opening /dev/watchdog; conversely, one can
>> of course also have the current behaviour of allowing indefinite time
>> until the first open, and then set that module parameter.
>>
>> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@...vas.dk>
>> ---
>>  .../watchdog/watchdog-parameters.txt          |  8 +++++
>>  drivers/watchdog/watchdog_dev.c               | 30 +++++++++++++++++--
>>  2 files changed, 36 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
>> index 0b88e333f9e1..907c4bb13810 100644
>> --- a/Documentation/watchdog/watchdog-parameters.txt
>> +++ b/Documentation/watchdog/watchdog-parameters.txt
>> @@ -8,6 +8,14 @@ See Documentation/admin-guide/kernel-parameters.rst for information on
>>  providing kernel parameters for builtin drivers versus loadable
>>  modules.
>>  
>> +The watchdog core parameter watchdog.open_timeout is the maximum time,
>> +in seconds, for which the watchdog framework will take care of pinging
>> +a hardware watchdog until userspace opens the corresponding
>> +/dev/watchdogN device. A value of 0 (the default) means an infinite
>> +timeout. Setting this to a non-zero value can be useful to ensure that
>> +either userspace comes up properly, or the board gets reset and allows
>> +fallback logic in the bootloader to try something else.
>> +
> 
> This is misleading. Unless I am missing something, the above only applies
> if the watchdog is already runnning at boot,

well, yes, if it's not running at boot, there's nothing for the kernel
to take care of. I can do s/a hardware watchdog/a running hardware
watchdog/ if that makes it clearer.

 and after it has been opened
> and closed once.
> 
> FWIW, I find this operation quite confusing. What is the rationale for not
> starting the watchdog at boot time if it is not running and open_timeout
> is set, but then refusing to stop it after it has been started once ?

I guess you have a point that there's a certain asymmetry there. In
practice, the cases where one wants this kind of robustness guarantee do
not rely on a watchdog that can/must be started and stopped by software.

>>  
>>  static void watchdog_ping_work(struct kthread_work *work)
>> @@ -297,7 +317,7 @@ static int watchdog_stop(struct watchdog_device *wdd)
>>  		return -EBUSY;
>>  	}
>>  
>> -	if (wdd->ops->stop) {
>> +	if (wdd->ops->stop && !open_timeout) {
> 
> This changes the semantics of WDIOC_SETOPTIONS / WDIOS_DISABLECARD.
> "Turn off the watchdog timer" is well defined and doesn't leave
> the option of setting a timeout on it.

I can drop this hunk, since it's mostly irrelevant to the actual use
cases I have in mind. It makes testing the feature on reference boards a
little more awkward, but I can live with that.

>> +
>> +module_param(open_timeout, uint, 0644);
>> +MODULE_PARM_DESC(open_timeout,
>> +	"Maximum time (in seconds, 0 means infinity) for userspace to take over a running watchdog (default=0)");
> 
> The description is misleading. After the initial open, a subsequent close no
> longer really stops the watchdog.

If I drop the "&& !open_timeout" above, this would be accurate, no?

Rasmus