linux-kernel - Re: [PATCH] watchdog: Prefer use "ref-cycles" for NMI watchdog

Message-Id: <20230512164056.8f1e4e23032f7f7f5cb69df0@linux-foundation.org>
Date:   Fri, 12 May 2023 16:40:56 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Song Liu <song@...nel.org>
Cc:     <linux-kernel@...r.kernel.org>, <kernel-team@...a.com>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH] watchdog: Prefer use "ref-cycles" for NMI watchdog

On Tue, 9 May 2023 15:17:00 -0700 Song Liu <song@...nel.org> wrote:

> NMI watchdog permanently consumes one hardware counters per CPU on the
> system. For systems that use many hardware counters, this causes more
> aggressive time multiplexing of perf events.
> 
> OTOH, some CPUs (mostly Intel) support "ref-cycles" event, which is rarely
> used. Try use "ref-cycles" for the watchdog. If the CPU supports it, so
> that one more hardware counter is available to the user. If the CPU doesn't
> support "ref-cycles", fall back to "cycles".
> 
> The downside of this change is that users of "ref-cycles" need to disable
> nmi_watchdog.
> 
> ...
>
> @@ -286,6 +286,12 @@ int __init hardlockup_detector_perf_init(void)
>  {
>  	int ret = hardlockup_detector_event_create();
>  
> +	if (ret) {

If we get here, hardlockup_detector_event_create() has sent a scary
pr_debug message.

> +		/* Failed to create "ref-cycles", try "cycles" instead */
> +		wd_hw_attr.config = PERF_COUNT_HW_CPU_CYCLES;
> +		ret = hardlockup_detector_event_create();

So it would be good to emit a follow-up message here, either telling
users that things are OK or saying that we're retrying with a
different counter.
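
Something along these lines, perhaps. This is only a userspace sketch of
the retry-with-a-message idea, not the actual kernel change: pr_info() is
mapped to printf(), hardlockup_detector_event_create() is a mock whose
success is controlled by the hypothetical mock_has_* knobs, and the
message wording and the _sketch function name are assumptions for
illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-ins for the perf hardware event IDs used by the patch. */
enum {
	PERF_COUNT_HW_CPU_CYCLES = 0,
	PERF_COUNT_HW_REF_CPU_CYCLES = 9,
};

/* Minimal stand-in for the perf attribute the watchdog configures. */
struct wd_attr {
	int config;
};

struct wd_attr wd_hw_attr = { .config = PERF_COUNT_HW_REF_CPU_CYCLES };

/* Hypothetical knobs: which events the mocked "hardware" supports. */
int mock_has_ref_cycles = 0;
int mock_has_cycles = 1;

/* Userspace substitute for the kernel's pr_info(). */
#define pr_info(...) printf(__VA_ARGS__)

/* Mock: returns 0 if the currently selected event can be created. */
int hardlockup_detector_event_create(void)
{
	assert(wd_hw_attr.config == PERF_COUNT_HW_REF_CPU_CYCLES ||
	       wd_hw_attr.config == PERF_COUNT_HW_CPU_CYCLES);

	if (wd_hw_attr.config == PERF_COUNT_HW_REF_CPU_CYCLES)
		return mock_has_ref_cycles ? 0 : -1;
	return mock_has_cycles ? 0 : -1;
}

/* Try "ref-cycles" first; on failure, say so and retry with "cycles". */
int hardlockup_detector_perf_init_sketch(void)
{
	int ret;

	wd_hw_attr.config = PERF_COUNT_HW_REF_CPU_CYCLES;
	ret = hardlockup_detector_event_create();

	if (ret) {
		/* Follow-up message so the earlier failure doesn't alarm users. */
		pr_info("\"ref-cycles\" unavailable, retrying NMI watchdog with \"cycles\"\n");
		wd_hw_attr.config = PERF_COUNT_HW_CPU_CYCLES;
		ret = hardlockup_detector_event_create();
	}

	if (ret)
		pr_info("Perf NMI watchdog permanently disabled\n");

	return ret;
}
```

With the mock set up so that only "cycles" is available, the function
prints the retry message and returns 0, so anyone who saw the earlier
failure in the log also sees that the watchdog came up on the fallback
counter.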

> +	}
> +
>  	if (ret) {
>  		pr_info("Perf NMI watchdog permanently disabled\n");
>  	} else {
> -- 
> 2.34.1