linux-kernel - Re: [PATCH v9 1/3] watchdog: introduce watchdog.open

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190122172935.GA4964@roeck-us.net>
Date:   Tue, 22 Jan 2019 09:29:35 -0800
From:   Guenter Roeck <linux@...ck-us.net>
To:     Rasmus Villemoes <rasmus.villemoes@...vas.dk>
Cc:     "linux-watchdog@...r.kernel.org" <linux-watchdog@...r.kernel.org>,
        Wim Van Sebroeck <wim@...ux-watchdog.org>,
        Jonathan Corbet <corbet@....net>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
        Esben Haabendal <esben@...bendal.dk>,
        "martin@...deboll.net" <martin@...deboll.net>,
        Rasmus Villemoes <Rasmus.Villemoes@...vas.se>
Subject: Re: [PATCH v9 1/3] watchdog: introduce watchdog.open_timeout
 commandline parameter

On Mon, Jan 21, 2019 at 08:45:39PM +0000, Rasmus Villemoes wrote:
> The watchdog framework takes care of feeding a hardware watchdog until
> userspace opens /dev/watchdogN. If that never happens for some reason
> (buggy init script, corrupt root filesystem or whatnot) but the kernel
> itself is fine, the machine stays up indefinitely. This patch allows
> setting an upper limit for how long the kernel will take care of the
> watchdog, thus ensuring that the watchdog will eventually reset the
> machine.
> 
> A value of 0 (the default) means infinite timeout, preserving the
> current behaviour.
> 
> This is particularly useful for embedded devices where some fallback
> logic is implemented in the bootloader (e.g., use a different root
> partition, boot from network, ...).
> 
> There is already handle_boot_enabled serving a similar purpose. However,
> such a binary choice is unsuitable if the hardware watchdog cannot be
> programmed by the bootloader to provide a timeout long enough for
> userspace to get up and running. Many of the embedded devices we see use
> external (gpio-triggered) watchdogs with a fixed timeout of the order of
> 1-2 seconds.
> 
> The open timeout is also used as a maximum time for an application to
> re-open /dev/watchdogN after closing it. Again, while the kernel already
> has a nowayout mechanism, using that means userspace is at the mercy of
> whatever timeout the hardware has.
> 
> Being a module parameter, one can revert to the ordinary behaviour of
> having the kernel maintain the watchdog indefinitely by simply writing 0
> to /sys/... after initially opening /dev/watchdog; conversely, one can
> of course also have the current behaviour of allowing indefinite time
> until the first open, and then set that module parameter.
> 
> Signed-off-by: Rasmus Villemoes <rasmus.villemoes@...vas.dk>
> ---
>  .../watchdog/watchdog-parameters.txt          |  8 +++++
>  drivers/watchdog/watchdog_dev.c               | 30 +++++++++++++++++--
>  2 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/watchdog/watchdog-parameters.txt b/Documentation/watchdog/watchdog-parameters.txt
> index 0b88e333f9e1..907c4bb13810 100644
> --- a/Documentation/watchdog/watchdog-parameters.txt
> +++ b/Documentation/watchdog/watchdog-parameters.txt
> @@ -8,6 +8,14 @@ See Documentation/admin-guide/kernel-parameters.rst for information on
>  providing kernel parameters for builtin drivers versus loadable
>  modules.
>  
> +The watchdog core parameter watchdog.open_timeout is the maximum time,
> +in seconds, for which the watchdog framework will take care of pinging
> +a hardware watchdog until userspace opens the corresponding
> +/dev/watchdogN device. A value of 0 (the default) means an infinite
> +timeout. Setting this to a non-zero value can be useful to ensure that
> +either userspace comes up properly, or the board gets reset and allows
> +fallback logic in the bootloader to try something else.
> +

This is misleading. Unless I am missing something, the above only applies
if the watchdog is already runnning at boot, and after it has been opened
and closed once.

FWIW, I find this operation quite confusing. What is the rationale for not
starting the watchdog at boot time if it is not running and open_timeout
is set, but then refusing to stop it after it has been started once ?

>  
>  -------------------------------------------------
>  acquirewdt:
> diff --git a/drivers/watchdog/watchdog_dev.c b/drivers/watchdog/watchdog_dev.c
> index f6c24b22b37c..ab2ad20f13eb 100644
> --- a/drivers/watchdog/watchdog_dev.c
> +++ b/drivers/watchdog/watchdog_dev.c
> @@ -69,6 +69,7 @@ struct watchdog_core_data {
>  	struct mutex lock;
>  	ktime_t last_keepalive;
>  	ktime_t last_hw_keepalive;
> +	ktime_t open_deadline;
>  	struct hrtimer timer;
>  	struct kthread_work work;
>  	unsigned long status;		/* Internal status bits */
> @@ -87,6 +88,19 @@ static struct kthread_worker *watchdog_kworker;
>  static bool handle_boot_enabled =
>  	IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED);
>  
> +static unsigned open_timeout;
> +
> +static bool watchdog_past_open_deadline(struct watchdog_core_data *data)
> +{
> +	return ktime_after(ktime_get(), data->open_deadline);
> +}
> +
> +static void watchdog_set_open_deadline(struct watchdog_core_data *data)
> +{
> +	data->open_deadline = open_timeout ?
> +		ktime_get() + ktime_set(open_timeout, 0) : KTIME_MAX;
> +}
> +
>  static inline bool watchdog_need_worker(struct watchdog_device *wdd)
>  {
>  	/* All variables in milli-seconds */
> @@ -211,7 +225,13 @@ static bool watchdog_worker_should_ping(struct watchdog_core_data *wd_data)
>  {
>  	struct watchdog_device *wdd = wd_data->wdd;
>  
> -	return wdd && (watchdog_active(wdd) || watchdog_hw_running(wdd));
> +	if (!wdd)
> +		return false;
> +
> +	if (watchdog_active(wdd))
> +		return true;
> +
> +	return watchdog_hw_running(wdd) && !watchdog_past_open_deadline(wd_data);
>  }
>  
>  static void watchdog_ping_work(struct kthread_work *work)
> @@ -297,7 +317,7 @@ static int watchdog_stop(struct watchdog_device *wdd)
>  		return -EBUSY;
>  	}
>  
> -	if (wdd->ops->stop) {
> +	if (wdd->ops->stop && !open_timeout) {

This changes the semantics of WDIOC_SETOPTIONS / WDIOS_DISABLECARD.
"Turn off the watchdog timer" is well defined and doesn't leave
the option of setting a timeout on it.

>  		clear_bit(WDOG_HW_RUNNING, &wdd->status);
>  		err = wdd->ops->stop(wdd);
>  	} else {
> @@ -883,6 +903,7 @@ static int watchdog_release(struct inode *inode, struct file *file)
>  		watchdog_ping(wdd);
>  	}
>  
> +	watchdog_set_open_deadline(wd_data);
>  	watchdog_update_worker(wdd);
>  
>  	/* make sure that /dev/watchdog can be re-opened */
> @@ -983,6 +1004,7 @@ static int watchdog_cdev_register(struct watchdog_device *wdd, dev_t devno)
>  
>  	/* Record time of most recent heartbeat as 'just before now'. */
>  	wd_data->last_hw_keepalive = ktime_sub(ktime_get(), 1);
> +	watchdog_set_open_deadline(wd_data);
>  
>  	/*
>  	 * If the watchdog is running, prevent its driver from being unloaded,
> @@ -1181,3 +1203,7 @@ module_param(handle_boot_enabled, bool, 0444);
>  MODULE_PARM_DESC(handle_boot_enabled,
>  	"Watchdog core auto-updates boot enabled watchdogs before userspace takes over (default="
>  	__MODULE_STRING(IS_ENABLED(CONFIG_WATCHDOG_HANDLE_BOOT_ENABLED)) ")");
> +
> +module_param(open_timeout, uint, 0644);
> +MODULE_PARM_DESC(open_timeout,
> +	"Maximum time (in seconds, 0 means infinity) for userspace to take over a running watchdog (default=0)");

The description is misleading. After the initial open, a subsequent close no
longer really stops the watchdog.

> -- 
> 2.20.1
>