lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e80b3a66-35a3-1386-b35a-94837937956b@virtuozzo.com>
Date:   Tue, 11 Feb 2020 11:14:49 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
        Petr Mladek <pmladek@...e.com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Dmitry Monakhov <dmtrmonakhov@...dex-team.ru>
Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when
 hardlockup detected

Hi, Konstantin,

On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
> In NMI context printk() could save messages into per-cpu buffers and
> schedule flush by irq_work when IRQ are unblocked. This means message
> about hardlockup appears in kernel log only when/if lockup is gone.
> 
> Comment in irq_work_queue_on() states that remote IPI aren't NMI safe
> thus printk() cannot schedule flush work to another cpu.
> 
> This patch adds simple atomic counter of detected hardlockups and
> flushes all per-cpu printk buffers in context softlockup watchdog
> at any other cpu when it sees changes of this counter.
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@...dex-team.ru>
> ---
>  include/linux/nmi.h   |    1 +
>  kernel/watchdog.c     |   22 ++++++++++++++++++++++
>  kernel/watchdog_hld.c |    1 +
>  3 files changed, 24 insertions(+)
> 
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 9003e29cde46..8406df72ae5a 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
>  #if defined(CONFIG_HARDLOCKUP_DETECTOR)
>  extern void hardlockup_detector_disable(void);
>  extern unsigned int hardlockup_panic;
> +extern atomic_t hardlockup_detected;
>  #else
>  static inline void hardlockup_detector_disable(void) {}
>  #endif
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index b6b1f54a7837..9f5c68fababe 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>  }
>  __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>  # endif /* CONFIG_SMP */
> +
> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
> +
> +static inline void flush_hardlockup_messages(void)
> +{
> +	static atomic_t flushed = ATOMIC_INIT(0);
> +
> +	/* flush messages from hard lockup detector */
> +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
> +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
> +		printk_safe_flush();
> +	}
> +}

Do we really need two variables here? They may come into two different
cache lines, and there will be double cache pollution just because of
this simple check. Why not the below?

	if (atomic_read(&hardlockup_detected)) {
		atomic_set(&hardlockup_detected, 0);
		printk_safe_flush();
	}

Or even, since atomic is not needed here, as it does not give any ordering guarantees.
static inline void flush_hardlockup_messages(void)
{
	if (READ_ONCE(&hardlockup_detected)) {
		WRITE_ONCE(&hardlockup_detected, 0);
		printk_safe_flush();
	}
}

watchdog_timer_fn()
{
	...
	WRITE_ONCE(&hardlockup_detected, 1);
	...
}

Kirill

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ