linux-kernel - Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <5b00e7f5-385d-bbc8-886a-d09cc844d07d@yandex-team.ru>
Date:   Thu, 13 Feb 2020 16:05:17 +0300
From:   Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Kirill Tkhai <ktkhai@...tuozzo.com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dmitry Monakhov <dmtrmonakhov@...dex-team.ru>
Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when
 hardlockup detected



On 12/02/2020 17.54, Petr Mladek wrote:
> On Tue 2020-02-11 15:36:02, Konstantin Khlebnikov wrote:
>> On 11/02/2020 11.14, Kirill Tkhai wrote:
>>> Hi, Konstantin,
>>>
>>> On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
>>>> In NMI context printk() could save messages into per-cpu buffers and
>>>> schedule flush by irq_work when IRQ are unblocked. This means message
>>>> about hardlockup appears in kernel log only when/if lockup is gone.
>>>>
>>>> Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
>>>> thus printk() cannot schedule flush work to another cpu.
>>>>
>>>> This patch adds simple atomic counter of detected hardlockups and
>>>> flushes all per-cpu printk buffers in context softlockup watchdog
>>>> at any other cpu when it sees changes of this counter.
>>>>
>>>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
>>>> ---
>>>>    include/linux/nmi.h   |    1 +
>>>>    kernel/watchdog.c     |   22 ++++++++++++++++++++++
>>>>    kernel/watchdog_hld.c |    1 +
>>>>    3 files changed, 24 insertions(+)
>>>>
>>>> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
>>>> index 9003e29cde46..8406df72ae5a 100644
>>>> --- a/include/linux/nmi.h
>>>> +++ b/include/linux/nmi.h
>>>> @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
>>>>    #if defined(CONFIG_HARDLOCKUP_DETECTOR)
>>>>    extern void hardlockup_detector_disable(void);
>>>>    extern unsigned int hardlockup_panic;
>>>> +extern atomic_t hardlockup_detected;
>>>>    #else
>>>>    static inline void hardlockup_detector_disable(void) {}
>>>>    #endif
>>>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>>>> index b6b1f54a7837..9f5c68fababe 100644
>>>> --- a/kernel/watchdog.c
>>>> +++ b/kernel/watchdog.c
>>>> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>>>>    }
>>>>    __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>>>>    # endif /* CONFIG_SMP */
>>>> +
>>>> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
>>>> +
>>>> +static inline void flush_hardlockup_messages(void)
>>>> +{
>>>> +	static atomic_t flushed = ATOMIC_INIT(0);
>>>> +
>>>> +	/* flush messages from hard lockup detector */
>>>> +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
>>>> +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
>>>> +		printk_safe_flush();
>>>> +	}
>>>> +}
>>>
>>> Do we really need two variables here? They may come into two different
>>> cache lines, and there will be double cache pollution just because of
>>> this simple check. Why not the below?
>>
>> I don't think anybody could notice read-only access to second variable.
>> This executes once in several seconds.
>>
>> Watchdogs already use same pattern (monotonic counter + snapshot) in
>> couple places. So code looks more clean in this way.
> 
> It is not only about speed. It is also about code complexity
> and correctness. Using two variables is more complex.
> Calling atomic_read(&hardlockup_detected) twice does > not look like a correct pattern.

It guarantees "at least once" which is enough for this case.

> 
> The single variable patter is used for similar things there
> as well, for example, see hard_watchdog_warn,

> hardlockup_allcpu_dumped.

Ouch. This works only once and there is no way to rearm it back.
Now I see why this thing never worked for me recent years =)

Maybe it's better reset sysctl_hardlockup_all_cpu_backtrace
to let user set sysctl back to 1.
Or rearm it back in softlockup watchdog after timeout.

> 
> I would call the variable "hardlockup_dump_flush" and
> use the same logic as for hardlockup_allcpu_dumped.
> 
> Note that simple READ_ONCE(), WRITE_ONCE() are not enough
> because they do not provide smp barriers.

It's hard to imagine arch which actually needs barrires here.
Softlockup timers will eventually see the change.

> 
> Best Regards,
> Petr
>