linux-kernel - Re: [PATCH v3 2/2] Handle flushing of CPU backtraces during panic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20240815104356.49273-1-takakura@valinux.co.jp>
Date: Thu, 15 Aug 2024 19:43:56 +0900
From: takakura@...inux.co.jp
To: pmladek@...e.com,
	john.ogness@...utronix.de
Cc: akpm@...ux-foundation.org,
	bhe@...hat.com,
	feng.tang@...el.com,
	j.granados@...sung.com,
	linux-kernel@...r.kernel.org,
	lukas@...ner.de,
	nishimura@...inux.co.jp,
	rostedt@...dmis.org,
	senozhatsky@...omium.org,
	stephen.s.brennan@...cle.com,
	taka@...inux.co.jp,
	takakura@...inux.co.jp,
	ubizjak@...il.com,
	wangkefeng.wang@...wei.com
Subject: Re: [PATCH v3 2/2] Handle flushing of CPU backtraces during panic

Hi Petr and John,

On Tue 2024-08-13 13:45, Petr Mladek wrote:
>On Mon 2024-08-12 16:29:31, takakura@...inux.co.jp wrote:
>> From: Ryo Takakura <takakura@...inux.co.jp>
>> 
>> After panic, non-panicked CPU's has been unable to flush ringbuffer
>> while they can still write into it. This can affect CPU backtrace
>> triggered in panic only able to write into ringbuffer incapable of
>> flushing them.
>
>I still think about this. The motivation is basically the same
>as in the commit 5d5e4522a7f404d1a96fd ("printk: restore flushing
>of NMI buffers on remote CPUs after NMI backtraces").
>
>It would make sense to replace printk_trigger_flush() with
>console_try_flush(). And the function should queue the irq
>work when it can't be flushed directly, see below.

Yes, I agree.

>> Fix the issue by letting the panicked CPU handle the flushing of
>> ringbuffer right after non-panicked CPUs finished writing their
>> backtraces.
>> 
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -260,6 +260,7 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>>  		panic_triggering_all_cpu_backtrace = true;
>>  		trigger_all_cpu_backtrace();
>>  		panic_triggering_all_cpu_backtrace = false;
>> +		console_try_flush();
>>  	}
>> 
>>  	/*
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -3284,6 +3284,20 @@ void console_flush_on_panic(enum con_flush_mode mode)
>>  	console_flush_all(false, &next_seq, &handover);
>>  }
>> 
>> +/**
>> + * console_try_flush - try to flush consoles when safe
>> + *
>> + * Context: Any, except for NMI.
>
>It is safe even in NMI. is_printk_legacy_deferred() would return true
>in this case.

I'm sorry... I didn't notice this. Thanks.

>> + */
>> +void console_try_flush(void)
>> +{
>> +	if (is_printk_legacy_deferred())
>> +		return;
>> +
>> +	if (console_trylock())
>> +		console_unlock();
>> +}
>
>I would do something like:
>
>/**
> * console_try_or_trigger_flush - try to flush consoles directly when
> *	safe or the trigger deferred flush.
> *
> * Context: Any
> */
>void console_try_or_trigger_flush(void)
>{
>	if (!is_printk_legacy_deferred() && console_trylock())
>		console_unlock();
>	else
>		defer_console_output();
>}
>
>and use it instead of printk_trigger_flush() in
>nmi_trigger_cpumask_backtrace().

Yes, that sounds better to me as well!

>Well, I would postpone this patch after we finalize the patchset
>adding con->write_atomic() callback. This patch depends on it anyway
>via is_printk_legacy_deferred(). The patchset might also add
>other wrappers for flushing consoles and we have to choose some
>reasonable names. Or John could integrate this patch into the patchset.

Sure, I can resend one once the patchset is finalized!
Or would it be better if I just leave it to John so that it can be part of 
the patchset?

>Best Regards,
>Petr

Sincerely,
Ryo Takakura