Message-ID: <xhsmha6ab6zm3.mognet@vschneid.remote.csb>
Date:   Fri, 17 Jun 2022 17:09:24 +0100
From:   Valentin Schneider <vschneid@...hat.com>
To:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc:     linux-kernel@...r.kernel.org, kexec@...ts.infradead.org,
        linux-rt-users@...r.kernel.org,
        Eric Biederman <ebiederm@...ssion.com>,
        Arnd Bergmann <arnd@...db.de>, Petr Mladek <pmladek@...e.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Juri Lelli <jlelli@...hat.com>,
        "Luis Claudio R. Goncalves" <lgoncalv@...hat.com>
Subject: Re: [PATCH] panic, kexec: Don't mutex_trylock() in __crash_kexec()

On 17/06/22 17:13, Sebastian Andrzej Siewior wrote:
> On 2022-06-16 13:37:09 [+0100], Valentin Schneider wrote:
>> Regarding the original explanation for the WARN & return:
>> 
>> I don't get why 2) is a problem - if the lock is acquired by the trylock
>> then the critical section will be run without interruption since it
>> cannot sleep, the interrupted task may get boosted but that will not
>> have any actual impact AFAICT.
>
> boosting an unrelated task is considered wrong. I don't know how bad
> it gets in terms of lock chains since a task is set as owner which did
> not actually ask for the lock.
>
>> Regardless, even if this doesn't sleep, the ->wait_lock in the slowpath
>> isn't NMI safe so this needs changing.
>
> This includes the unlock path which may wake a waiter and deboost.
>

Both are good points, thank you for lighting my lantern :)
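
To make the two objections concrete, here is a minimal userspace sketch of
an ownerless try-lock built on a single atomic word. It is only an
illustration of the idea, not the patch itself: the names
crash_lock_model, crash_trylock_model() and crash_unlock_model() are
hypothetical, and C11 atomics stand in for the kernel's atomic API. Such a
lock records no owner, so nothing can be boosted or deboosted, and it has
no ->wait_lock or waiter list to touch on either the lock or unlock path.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: 0 = free, 1 = held. No owner is recorded. */
static atomic_int crash_lock_model = 0;

/*
 * Try to take the lock with one compare-and-exchange.
 * Never spins, never sleeps, never touches a wait list.
 */
static bool crash_trylock_model(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong_explicit(&crash_lock_model,
						       &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Release is a plain store: there is no waiter to wake or deboost. */
static void crash_unlock_model(void)
{
	atomic_store_explicit(&crash_lock_model, 0, memory_order_release);
}
```

A caller that fails the trylock simply backs off, which matches the
existing "return if the lock is contended" behaviour. The trade-off is
that nobody can block on such a lock, so every user has to be a trylock
user.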

>> I've thought about trying to defer the kexec out of an NMI (or IRQ)
>> context, but that pretty much means deferring the panic() which I'm
>> not sure is such a great idea.
>
> If we could defer it out of NMI on RT then it would work on non-RT, too. If
> the system is "stuck" and the NMI is the only one to respond then I guess
> that it is not a great idea.
>

Those were pretty much my thoughts. I *think* panic() can be re-entrant on
the same CPU if the first entry was from NMI, but that still requires being
able to schedule a thread that panics, which isn't a given after getting
that panic NMI. So for now, actually doing the kexec in NMI (or IRQ) context
seems to be the less hazardous route.
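
For reference, the kind of same-CPU re-entrancy I have in mind can be
modelled as below. This is a userspace sketch under my own assumptions,
not a copy of kernel/panic.c: panic_cpu_model, PANIC_CPU_NONE_MODEL and
panic_may_proceed_model() are hypothetical names. The first CPU to arrive
claims the panic path, and a later entry is let through only if it comes
from that same CPU.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sentinel: no CPU has claimed the panic path yet. */
#define PANIC_CPU_NONE_MODEL (-1)

static atomic_int panic_cpu_model = PANIC_CPU_NONE_MODEL;

/*
 * Returns true if `cpu` may run the panic path: either it claims
 * ownership right now, or it already owns it (same-CPU re-entry,
 * e.g. a first entry from NMI followed by a later one).
 */
static bool panic_may_proceed_model(int cpu)
{
	int expected = PANIC_CPU_NONE_MODEL;

	if (atomic_compare_exchange_strong(&panic_cpu_model, &expected, cpu))
		return true;

	/* On failure, `expected` holds the current owner's id. */
	return expected == cpu;
}
```

Even with such a gate, the deferred entry still has to get to run, which
is exactly the part that isn't guaranteed once the NMI has fired.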

> Sebastian