linux-kernel - Re: [PATCH v3 2/2] Handle flushing of CPU backtraces during panic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZrtjXChY_0wnFXsS@pathway.suse.cz>
Date: Tue, 13 Aug 2024 15:45:22 +0200
From: Petr Mladek <pmladek@...e.com>
To: takakura@...inux.co.jp
Cc: rostedt@...dmis.org, john.ogness@...utronix.de,
	senozhatsky@...omium.org, akpm@...ux-foundation.org, bhe@...hat.com,
	lukas@...ner.de, wangkefeng.wang@...wei.com, ubizjak@...il.com,
	feng.tang@...el.com, j.granados@...sung.com,
	stephen.s.brennan@...cle.com, linux-kernel@...r.kernel.org,
	nishimura@...inux.co.jp, taka@...inux.co.jp
Subject: Re: [PATCH v3 2/2] Handle flushing of CPU backtraces during panic

On Mon 2024-08-12 16:29:31, takakura@...inux.co.jp wrote:
> From: Ryo Takakura <takakura@...inux.co.jp>
> 
> After panic, non-panicked CPU's has been unable to flush ringbuffer
> while they can still write into it. This can affect CPU backtrace
> triggered in panic only able to write into ringbuffer incapable of
> flushing them.

I still think about this. The motivation is basically the same
as in the commit 5d5e4522a7f404d1a96fd ("printk: restore flushing
of NMI buffers on remote CPUs after NMI backtraces").

It would make sense to replace printk_trigger_flush() with
console_try_flush(). And the function should queue the irq
work when it can't be flushed directly, see below.

> Fix the issue by letting the panicked CPU handle the flushing of
> ringbuffer right after non-panicked CPUs finished writing their
> backtraces.
> 
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -260,6 +260,7 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>  		panic_triggering_all_cpu_backtrace = true;
>  		trigger_all_cpu_backtrace();
>  		panic_triggering_all_cpu_backtrace = false;
> +		console_try_flush();
>  	}
>  
>  	/*
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3284,6 +3284,20 @@ void console_flush_on_panic(enum con_flush_mode mode)
>  	console_flush_all(false, &next_seq, &handover);
>  }
>  
> +/**
> + * console_try_flush - try to flush consoles when safe
> + *
> + * Context: Any, except for NMI.

It is safe even in NMI. is_printk_legacy_deferred() would return true
in this case.

> + */
> +void console_try_flush(void)
> +{
> +	if (is_printk_legacy_deferred())
> +		return;
> +
> +	if (console_trylock())
> +		console_unlock();
> +}

I would do something like:

/**
 * console_try_or_trigger_flush - try to flush consoles directly when
 *	safe or the trigger deferred flush.
 *
 * Context: Any
 */
void console_try_or_trigger_flush(void)
{
	if (!is_printk_legacy_deferred() && console_trylock())
		console_unlock();
	else
		defer_console_output();
}

and use it instead of printk_trigger_flush() in
nmi_trigger_cpumask_backtrace().


Well, I would postpone this patch after we finalize the patchset
adding con->write_atomic() callback. This patch depends on it anyway
via is_printk_legacy_deferred(). The patchset might also add
other wrappers for flushing consoles and we have to choose some
reasonable names. Or John could integrate this patch into the patchset.

Best Regards,
Petr