linux-kernel - Re: [PATCH v4] panic: Fixes the panic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b04fc583-c3d1-8c3d-3831-9c765a74a705@igalia.com>
Date:   Wed, 15 Feb 2023 12:39:22 -0300
From:   "Guilherme G. Piccoli" <gpiccoli@...lia.com>
To:     Petr Mladek <pmladek@...e.com>
Cc:     akpm@...ux-foundation.org, bhe@...hat.com,
        linux-kernel@...r.kernel.org, kexec@...ts.infradead.org,
        dyoung@...hat.com, d.hatayama@...fujitsu.com, feng.tang@...el.com,
        hidehiro.kawai.ez@...achi.com, keescook@...omium.org,
        mikelley@...rosoft.com, vgoyal@...hat.com, kernel-dev@...lia.com,
        kernel@...ccoli.net, stable@...r.kernel.org
Subject: Re: [PATCH v4] panic: Fixes the panic_print NMI backtrace setting

On 14/02/2023 11:46, Petr Mladek wrote:
> [...]
>> My understanding is that it's a mechanism to prevent some concurrency,
>> in case some other CPU modify this variable while panic() is running.
>> I find it very unlikely, hence I removed it - but if people consider
>> this copy needed, I can respin this patch and keep it, even providing a
>> comment about that, in order to be explict about its need.
> 
> Yes, I think that it makes the behavior consistent even when the
> global variable manipulated in parallel.
> 
> I would personally prefer to keep the local copy. Better safe
> than sorry.
> 

Hi Petr, thanks for your review!
OK, we could keep this local copy, makes sense...even adding a comment,
to make its purpose really clear.

>> [...]
>> @@ -211,9 +211,6 @@ static void panic_print_sys_info(bool console_flush)
>>  		return;
>>  	}
>>  
>> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> -		trigger_all_cpu_backtrace();
>> -
> 
> Sigh, this is yet another PANIC_PRINT_ action that need special
> timing. We should handle both the same way.
> 
> What about the following? The parameter @mask says what
> actions are allowed at the given time.
> < ..code..> 

I think your approach is interesting, it's very "organized".

But I think it's a bit conflicting with that purpose we had on notifiers
refactor, to deprecate "bogus" usages of panic_print, as in
https://lore.kernel.org/lkml/20220427224924.592546-26-gpiccoli@igalia.com/ .

So, the idea of my approach is to allow:

(a) Easy removal of panic_print_sys_info() of panic(), once we move it
to a panic notifier;

(b) Better separate and identify the "bogus" cases. The CPU backtrace
one is less a bogus case in my opinion, more a "complicated" one, since
it's related with the CPUs stop routines. But the console flush, as we
discussed, it's clearly something that calls for a new parameter (and
such param was added in the refactor patch).

In the end, I think your approach is interesting but it's kinda like
we're adding the fix to later, on refactor, entirely remove/rework it.
With my approach we wouldn't be calling panic_print_sys_info() again
(3rd time!) on panic(), and also would be more natural to move it later
to a new panic notifier.

What you / others think? If your approach is in the end preferred, it's
fine by me - I'd just ask you to submit as a full patch so we can get it
merged as a fix in 6.3, if possible (and backport it to the 6.1/6.2
stable). Now, if my approach is fine, I can resubmit as a V5 keeping the
local variable - lemme know.

Cheers,

Guilherme