linux-kernel - Re: [PATCH v2] panic, kexec: Make __crash

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <xhsmhpmiu5lch.mognet@vschneid.remote.csb>
Date:   Mon, 27 Jun 2022 13:42:22 +0100
From:   Valentin Schneider <vschneid@...hat.com>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     linux-kernel@...r.kernel.org, kexec@...ts.infradead.org,
        linux-rt-users@...r.kernel.org, Arnd Bergmann <arnd@...db.de>,
        Petr Mladek <pmladek@...e.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Juri Lelli <jlelli@...hat.com>,
        "Luis Claudio R. Goncalves" <lgoncalv@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vivek Goyal <vgoyal@...hat.com>
Subject: Re: [PATCH v2] panic, kexec: Make __crash_kexec() NMI safe

On 25/06/22 12:04, Eric W. Biederman wrote:
> I am not particularly fond of this patch as it adds more complexity than
> is necessary to solve the problem.
>
> Calling a spade a spade PREEMPT_RT's mutex_trylock implementation is
> broken as it can not support the use cases of an ordinary mutex_trylock.
> I have not seen (possibly I skimmed too quickly) anywhere in the
> discussion why PREEMPT_RT is not being fixed.  Looking at the code
> there is enough going on in try_to_take_rt_mutex that I can imagine
> that some part of that code is not nmi safe.  So I can believe
> PREEMPT_RT may be unfix-ably broken.
>

AFAICT same goes for !PREEMPT_RT given the mutex_unlock(); it's a bit
convoluted but you can craft scenarios where the NMI ends up spinning on
mutex->wait_lock that is owned by the interrupted task, e.g.

  CPU0                    CPU1

                          crash_shrink_memory()
                            mutex_lock();
  crash_get_memory_size()
    mutex_lock()
      raw_spin_lock(&lock->wait_lock);
      // Lock acquired
  <NMI>
                            mutex_unlock()
                              <Release lock->owner>;

  // Owner is free at this point so this succeeds
  mutex_trylock();
  // No kexec_crash_image
  mutex_unlock()
    raw_spin_lock(&lock->wait_lock);

>
> At this point I recommend going back to being ``unconventional'' with
> the kexec locking and effectively reverting commit 8c5a1cf0ad3a ("kexec:
> use a mutex for locking rather than xchg()").
>
> That would also mean that we don't have to worry about the lockdep code
> doing something weird in the future and breaking kexec.
>
> Your change starting to is atomic_cmpxchng is most halfway to a revert
> of commit 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than
> xchg()").  So we might as well go the whole way and just document that
> the kexec on panic code can not use conventional kernel locking
> primitives and has to dig deep and build it's own.  At which point it
> makes no sense for the rest of the kexec code to use anything different.
>

Hm, I'm a bit torn about that one, ideally I'd prefer to keep "homegrown"
locking primitives to just where they are needed (loading & kexec'ing), but
I'm also not immensely fond of the "hybrid" mutex+cmpxchg approach.

> Eric