linux-kernel - Re: [PATCH v2 1/2] Handle flushing of CPU backtraces during panic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20240807085647.92164-1-takakura@valinux.co.jp>
Date: Wed,  7 Aug 2024 17:56:47 +0900
From: takakura@...inux.co.jp
To: pmladek@...e.com,
	john.ogness@...utronix.de
Cc: akpm@...ux-foundation.org,
	bhe@...hat.com,
	feng.tang@...el.com,
	j.granados@...sung.com,
	linux-kernel@...r.kernel.org,
	lukas@...ner.de,
	nishimura@...inux.co.jp,
	rostedt@...dmis.org,
	senozhatsky@...omium.org,
	stephen.s.brennan@...cle.com,
	taka@...inux.co.jp,
	takakura@...inux.co.jp,
	ubizjak@...il.com,
	wangkefeng.wang@...wei.com
Subject: Re: [PATCH v2 1/2] Handle flushing of CPU backtraces during panic

Hi Petr and John,

On 2024-08-05, Petr Mladek wrote:
>On Sat 2024-08-03 17:12:30, takakura@...inux.co.jp wrote:
>> From: Ryo Takakura <takakura@...inux.co.jp>
>> 
>> After panic, non-panicked CPU's has been unable to flush ringbuffer 
>> while they can still write into it. This can affect CPU backtrace 
>> triggered in panic only able to write into ringbuffer incapable of 
>> flushing them.
>> 
>> Fix the issue by letting the panicked CPU handle the flushing of 
>> ringbuffer right after non-panicked CPUs finished writing their
>> backtraces.
>> 
>> Signed-off-by: Ryo Takakura <takakura@...inux.co.jp>
>> ---
>>  kernel/panic.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>> 
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index 7e2070925..f94923a63 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -252,8 +252,10 @@ void check_panic_on_warn(const char *origin)
>>   */
>>  static void panic_other_cpus_shutdown(bool crash_kexec)
>>  {
>> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>>  		trigger_all_cpu_backtrace();
>> +		console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
>Hmm, this is too dangerous.
>
>console_flush_on_panic() is supposed to be called at the end on
>panic() as the final desperate attempt to flush consoles.
>It does not take console_lock(). It must not be called before
>stopping non-panic() CPUs.

I see, yes, it is dangerous...  Thanks for pointing this out!

Just out of curiosity, if we only had nbcon consoles which disables 
acquiring console lock after panic as pointed out by John on [0], 
I guess in that case we will be able to call
console_flush_on_panic(CONSOLE_FLUSH_PENDING) in this context.

>We would need to implement something like:
>
>/**
> * console_try_flush - try to flush consoles when safe
> *
> * Context: Any, except for NMI.
> */
>void console_try_flush(void)
>{
>	if (is_printk_legacy_deferred())
>		return;
>
>	if (console_trylock())
>		console_unlock();
>}
>
>, where is_printk_legacy_deferred() is not yet in the mainline. It is a new
>API proposed by the last version of a patchset adding adding write_atomic() callback,
>see https://lore.kernel.org/all/20240804005138.3722656-24-john.ogness@linutronix.de/

Also thanks for potinting the direction.
I'll send the next version with console_try_flush() for flushing backtraces 
as suggested!

>Best Regards,
>Petr

Sincerely,
Ryo Takakura

[0] https://lore.kernel.org/lkml/875xsofl7i.fsf@jogness.linutronix.de/