lists.openwall.net
Open Source and information security mailing list archives
Message-ID: <DCF269A8-4C64-4FDE-AFAC-92B6029EA3BA@fb.com>
Date:   Fri, 2 Jun 2023 23:15:14 +0000
From:   Song Liu <songliubraving@...a.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     Song Liu <song@...nel.org>, LKML <linux-kernel@...r.kernel.org>,
        Kernel Team <kernel-team@...a.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v4] watchdog: Allow nmi watchdog to use "ref-cycles" event



> On Jun 2, 2023, at 3:47 PM, Peter Zijlstra <peterz@...radead.org> wrote:
> 
> On Wed, May 17, 2023 at 05:25:55PM -0700, Song Liu wrote:
>> The NMI watchdog permanently consumes one hardware counter per CPU on the
>> system. For systems that use many hardware counters, this causes more
>> aggressive time multiplexing of perf events.
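Here is a rough worked illustration of the multiplexing cost. It uses a simplified model that is an assumption, not something taken from the patch: N perf events of equal priority rotate round-robin over C hardware counters on a CPU, and the NMI watchdog pins one of those counters. Each event is then live for roughly the fraction f of the time, and perf estimates the full count by scaling the raw count by time_enabled / time_running.

```latex
f_{\text{wd}} = \min\!\left(1, \frac{C-1}{N}\right), \qquad
f_{\text{free}} = \min\!\left(1, \frac{C}{N}\right), \qquad
\hat{c} = c_{\text{raw}} \cdot \frac{t_{\text{enabled}}}{t_{\text{running}}}
```

With C = 4 and N = 6, for example, f_wd = 3/6 = 0.5 while f_free = 4/6 ≈ 0.67, so every event is sampled less often and the scaled estimate relies on a larger extrapolation.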
>> 
>> OTOH, some CPUs (mostly Intel) support the "ref-cycles" event, which is rarely
>> used. Add the kernel cmdline arg nmi_watchdog=ref-cycles to configure the
>> watchdog to use the "ref-cycles" event instead of "cycles".
>> 
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: Peter Zijlstra <peterz@...radead.org>
>> Signed-off-by: Song Liu <song@...nel.org>
>> 
>> ---
>> Changes in v4:
>> Fix compile error for !CONFIG_HARDLOCKUP_DETECTOR_PERF. (kernel test bot)
>> 
>> Changes in v3:
>> 
>> Pivot the design to use kernel arg nmi_watchdog=ref-cycles (Peter)
>> ---
>> Documentation/admin-guide/kernel-parameters.txt | 5 +++--
>> include/linux/nmi.h                             | 2 ++
>> kernel/watchdog.c                               | 2 ++
>> kernel/watchdog_hld.c                           | 9 +++++++++
>> 4 files changed, 16 insertions(+), 2 deletions(-)
>> 
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 9e5bab29685f..d378e23dad7c 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3593,10 +3593,12 @@
>> Format: [state][,regs][,debounce][,die]
>> 
>> nmi_watchdog= [KNL,BUGS=X86] Debugging features for SMP kernels
>> - Format: [panic,][nopanic,][num]
>> + Format: [panic,][nopanic,][ref-cycles][num]
>> Valid num: 0 or 1
>> 0 - turn hardlockup detector in nmi_watchdog off
>> 1 - turn hardlockup detector in nmi_watchdog on
>> + ref-cycles - configure the watchdog with perf event
>> +             "ref-cycles" instead of "cycles"
>> When panic is specified, panic when an NMI watchdog
>> timeout occurs (or 'nopanic' to not panic on an NMI
>> watchdog, if CONFIG_BOOTPARAM_HARDLOCKUP_PANIC is set)
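The next block is a minimal userspace sketch of how the documented option string might be tokenised. It assumes comma-separated tokens, which the Format line does not show after [ref-cycles], so treat that as an assumption. The names struct nmi_wd_opts and parse_nmi_watchdog() are hypothetical. This is not the kernel's own parser in kernel/watchdog.c.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the settings nmi_watchdog= can express. */
struct nmi_wd_opts {
	bool panic;          /* panic on hard lockup */
	bool enabled;        /* hardlockup detector on (1) or off (0) */
	bool use_ref_cycles; /* use "ref-cycles" instead of "cycles" */
};

/* True if the n-byte token starting at s is exactly the string word. */
static bool tok_eq(const char *s, size_t n, const char *word)
{
	return strlen(word) == n && strncmp(s, word, n) == 0;
}

/*
 * Parse a comma-separated string such as "panic,ref-cycles,1" into *o.
 * Fields whose tokens are absent are left untouched.
 * Returns 0 on success, -1 on an empty or unrecognised token.
 */
static int parse_nmi_watchdog(const char *arg, struct nmi_wd_opts *o)
{
	const char *p = arg;

	for (;;) {
		const char *comma = strchr(p, ',');
		size_t n = comma ? (size_t)(comma - p) : strlen(p);

		if (tok_eq(p, n, "panic"))
			o->panic = true;
		else if (tok_eq(p, n, "nopanic"))
			o->panic = false;
		else if (tok_eq(p, n, "ref-cycles"))
			o->use_ref_cycles = true;
		else if (tok_eq(p, n, "0"))
			o->enabled = false;
		else if (tok_eq(p, n, "1"))
			o->enabled = true;
		else
			return -1;

		if (!comma)
			return 0;
		p = comma + 1;
	}
}
```

In this model, a string like "nopanic,ref-cycles" switches the event and leaves the on/off state at whatever default the caller set.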
> 
> I still hate the whole ref-cycles thing. At the very least, powerpc also
> has HAVE_HARDLOCKUP_DETECTOR_PERF and doesn't have ref-cycles, but
> perhaps they want to use a different event when the moon is just so...
> 
> What again was wrong with the option of specifying a raw event value and
> falling back to cpu-cycles if that fails?
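The alternative raised here is to try a user-supplied raw event and fall back to cpu-cycles if that fails. The block below is a sketch of that control flow in userspace C. The open callback is a hypothetical stand-in for something like a perf_event_open(2) wrapper, and the two fake PMUs exist only for illustration. The enum values mirror PERF_TYPE_HARDWARE, PERF_TYPE_RAW and PERF_COUNT_HW_CPU_CYCLES from <linux/perf_event.h>.

```c
#include <stdint.h>

/* Values mirror PERF_TYPE_HARDWARE, PERF_TYPE_RAW and
 * PERF_COUNT_HW_CPU_CYCLES from <linux/perf_event.h>. */
enum { EV_TYPE_HARDWARE = 0, EV_TYPE_RAW = 4 };
enum { EV_HW_CPU_CYCLES = 0 };

struct ev_spec {
	uint32_t type;
	uint64_t config;
};

/* Hypothetical hook: returns a handle >= 0, or < 0 if the event
 * cannot be opened (e.g. a wrapper around perf_event_open(2)). */
typedef int (*open_counter_fn)(const struct ev_spec *spec);

/* Try the raw event first; if it cannot be opened, fall back to
 * cpu-cycles. *used records which event the returned handle is for. */
static int open_watchdog_event(uint64_t raw_config, open_counter_fn open_fn,
			       struct ev_spec *used)
{
	const struct ev_spec raw = { EV_TYPE_RAW, raw_config };
	const struct ev_spec cycles = { EV_TYPE_HARDWARE, EV_HW_CPU_CYCLES };
	int fd = open_fn(&raw);

	if (fd >= 0) {
		*used = raw;
		return fd;
	}
	*used = cycles;
	return open_fn(&cycles);
}

/* Stand-in PMUs for illustration only. */
static int fake_pmu_accept_all(const struct ev_spec *spec)
{
	(void)spec;
	return 3;
}

static int fake_pmu_no_raw(const struct ev_spec *spec)
{
	return spec->type == EV_TYPE_RAW ? -1 : 7;
}
```

The fallback fires only when the open call fails. A raw code that is valid on a given CPU but encodes a different event there opens successfully and is never replaced, which is the concern raised in the reply.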

The same raw event number may mean different events on different hardware,
so misconfiguration is more likely. For example, r300 means ref-cycles on
Intel CPUs, but it means something else on AMD CPUs. I would need to be
very careful about which hosts run with nmi_watchdog=r300, as it may cause
surprises. OTOH, nmi_watchdog=ref-cycles doesn't have this issue. Of course,
this won't work for powerpc.
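The block below sketches the distinction between a symbolic name and a raw code. A symbolic name maps to a generic hardware event that each PMU driver translates or rejects. A raw code, written here in the perf tool's hex "rNNN" style, reaches the PMU untranslated. parse_event_name() is a hypothetical helper, and the enum values mirror PERF_TYPE_HARDWARE, PERF_TYPE_RAW, PERF_COUNT_HW_CPU_CYCLES and PERF_COUNT_HW_REF_CPU_CYCLES from <linux/perf_event.h>.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Values mirror PERF_TYPE_HARDWARE, PERF_TYPE_RAW,
 * PERF_COUNT_HW_CPU_CYCLES and PERF_COUNT_HW_REF_CPU_CYCLES
 * from <linux/perf_event.h>. */
enum { EV_TYPE_HARDWARE = 0, EV_TYPE_RAW = 4 };
enum { EV_HW_CPU_CYCLES = 0, EV_HW_REF_CPU_CYCLES = 9 };

struct ev_spec {
	uint32_t type;
	uint64_t config;
};

/*
 * Symbolic names become generic hardware events, which each PMU driver
 * maps to its own encoding (or refuses). "rNNN" is parsed as hex and
 * passed on untranslated, so its meaning depends on the CPU vendor.
 * Returns 0 on success, -1 otherwise.
 */
static int parse_event_name(const char *name, struct ev_spec *out)
{
	if (strcmp(name, "cycles") == 0) {
		out->type = EV_TYPE_HARDWARE;
		out->config = EV_HW_CPU_CYCLES;
		return 0;
	}
	if (strcmp(name, "ref-cycles") == 0) {
		out->type = EV_TYPE_HARDWARE;
		out->config = EV_HW_REF_CPU_CYCLES;
		return 0;
	}
	if (name[0] == 'r' && isxdigit((unsigned char)name[1])) {
		char *end;
		unsigned long long v = strtoull(name + 1, &end, 16);

		if (*end != '\0')
			return -1;
		out->type = EV_TYPE_RAW;
		out->config = v;
		return 0;
	}
	return -1;
}
```

In this model, "r300" always yields raw config 0x300 whatever the CPU, while "ref-cycles" stays a generic request. A driver with no mapping for that request can refuse it instead of silently counting some other event.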

Does this make sense?

Thanks,
Song
