linux-kernel - Re: [PATCHv2 3/4] kernel/watchdog: adapt the watchdog

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YVv4tT3WXrKvPe0g@alley>
Date:   Tue, 5 Oct 2021 09:03:17 +0200
From:   Petr Mladek <pmladek@...e.com>
To:     Pingfan Liu <kernelfans@...il.com>
Cc:     linux-kernel@...r.kernel.org, Sumit Garg <sumit.garg@...aro.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Will Deacon <will@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Marc Zyngier <maz@...nel.org>,
        Kees Cook <keescook@...omium.org>,
        Masahiro Yamada <masahiroy@...nel.org>,
        Sami Tolvanen <samitolvanen@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Wang Qing <wangqing@...o.com>,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        Santosh Sivaraj <santosh@...six.org>,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCHv2 3/4] kernel/watchdog: adapt the watchdog_hld interface
 for async model

On Thu 2021-09-23 22:09:50, Pingfan Liu wrote:
> When lockup_detector_init()->watchdog_nmi_probe(), PMU may be not ready
> yet. E.g. on arm64, PMU is not ready until
> device_initcall(armv8_pmu_driver_init).  And it is deeply integrated
> with the driver model and cpuhp. Hence it is hard to push this
> initialization before smp_init().
> 
> But it is easy to take an opposite approach by enabling watchdog_hld to
> get the capability of PMU async.
> 
> The async model is achieved by expanding watchdog_nmi_probe() with
> -EBUSY, and a re-initializing work_struct which waits on a wait_queue_head.
> 
> Signed-off-by: Pingfan Liu <kernelfans@...il.com>
> Cc: Sumit Garg <sumit.garg@...aro.org>
> Cc: Catalin Marinas <catalin.marinas@....com>
> Cc: Will Deacon <will@...nel.org>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Arnaldo Carvalho de Melo <acme@...nel.org>
> Cc: Mark Rutland <mark.rutland@....com>
> Cc: Alexander Shishkin <alexander.shishkin@...ux.intel.com>
> Cc: Jiri Olsa <jolsa@...hat.com>
> Cc: Namhyung Kim <namhyung@...nel.org>
> Cc: Marc Zyngier <maz@...nel.org>
> Cc: Kees Cook <keescook@...omium.org>
> Cc: Masahiro Yamada <masahiroy@...nel.org>
> Cc: Sami Tolvanen <samitolvanen@...gle.com>
> Cc: Petr Mladek <pmladek@...e.com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Wang Qing <wangqing@...o.com>
> Cc: "Peter Zijlstra (Intel)" <peterz@...radead.org>
> Cc: Santosh Sivaraj <santosh@...six.org>
> Cc: linux-arm-kernel@...ts.infradead.org
> To: linux-kernel@...r.kernel.org
> ---
>  include/linux/nmi.h |  3 +++
>  kernel/watchdog.c   | 37 +++++++++++++++++++++++++++++++++++--
>  2 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index b7bcd63c36b4..270d440fe4b7 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -118,6 +118,9 @@ static inline int hardlockup_detector_perf_init(void) { return 0; }
>  
>  void watchdog_nmi_stop(void);
>  void watchdog_nmi_start(void);
> +
> +extern bool hld_detector_delay_initialized;
> +extern struct wait_queue_head hld_detector_wait;
>  int watchdog_nmi_probe(void);
>  void watchdog_nmi_enable(unsigned int cpu);
>  void watchdog_nmi_disable(unsigned int cpu);
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 6e6dd5f0bc3e..bd4ae1839b72 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -103,7 +103,10 @@ void __weak watchdog_nmi_disable(unsigned int cpu)
>  	hardlockup_detector_perf_disable();
>  }
>  
> -/* Return 0, if a NMI watchdog is available. Error code otherwise */
> +/*
> + * Return 0, if a NMI watchdog is available. -EBUSY if not ready.
> + * Other negative value if not support.
> + */
>  int __weak __init watchdog_nmi_probe(void)
>  {
>  	return hardlockup_detector_perf_init();
> @@ -739,15 +742,45 @@ int proc_watchdog_cpumask(struct ctl_table *table, int write,
>  }
>  #endif /* CONFIG_SYSCTL */
>  
> +static void lockup_detector_delay_init(struct work_struct *work);
> +bool hld_detector_delay_initialized __initdata;
> +
> +struct wait_queue_head hld_detector_wait __initdata =
> +		__WAIT_QUEUE_HEAD_INITIALIZER(hld_detector_wait);
> +
> +static struct work_struct detector_work __initdata =
> +		__WORK_INITIALIZER(detector_work, lockup_detector_delay_init);
> +
> +static void __init lockup_detector_delay_init(struct work_struct *work)
> +{
> +	int ret;
> +
> +	wait_event(hld_detector_wait, hld_detector_delay_initialized);
> +	ret = watchdog_nmi_probe();
> +	if (!ret) {
> +		nmi_watchdog_available = true;
> +		lockup_detector_setup();

Is it really safe to call the entire lockup_detector_setup()
later?

It manipulates also softlockup detector. And more importantly,
the original call is before smp_init(). It means that it was
running when only single CPU was on.

It seems that x86 has some problem with hardlockup detector as
well. It later manipulates only the hardlockup detector. Also it uses
cpus_read_lock() to prevent races with CPU hotplug, see
fixup_ht_bug().


> +	} else {
> +		WARN_ON(ret == -EBUSY);
> +		pr_info("Perf NMI watchdog permanently disabled\n");
> +	}
> +}
> +
>  void __init lockup_detector_init(void)
>  {
> +	int ret;
> +
>  	if (tick_nohz_full_enabled())
>  		pr_info("Disabling watchdog on nohz_full cores by default\n");
>  
>  	cpumask_copy(&watchdog_cpumask,
>  		     housekeeping_cpumask(HK_FLAG_TIMER));
>  
> -	if (!watchdog_nmi_probe())
> +	ret = watchdog_nmi_probe();
> +	if (!ret)
>  		nmi_watchdog_available = true;
> +	else if (ret == -EBUSY)
> +		queue_work_on(smp_processor_id(), system_wq, &detector_work);

IMHO, this is not acceptable. It will block one worker until someone
wakes it. Only arm64 will have a code to wake up the work and only
when pmu is successfully initialized. In all other cases, the worker
will stay blocked forever.

The right solution is to do it the other way. Queue the work
from arm64-specific code when armv8_pmu_driver_init() succeeded.

Also I suggest to flush the work to make sure that it is finished
before __init code gets removed.


The open question is what code the work will call. As mentioned
above, I am not sure that lockup_detector_delay_init() is safe.
IMHO, we need to manipulate only hardlockup detector and
we have to serialize it against CPU hotplug.

Best Regards,
Petr