Date:   Sat, 23 Sep 2023 08:04:49 +0800
From:   Baoquan He <bhe@...hat.com>
To:     Eric DeVolder <eric.devolder@...cle.com>
Cc:     Valentin Schneider <vschneid@...hat.com>,
        linux-kernel@...r.kernel.org, vgoyal@...hat.com, dyoung@...hat.com,
        ebiederm@...ssion.com, kexec@...ts.infradead.org,
        sourabhjain@...ux.ibm.com, konrad.wilk@...cle.com,
        boris.ostrovsky@...cle.com,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] kexec: change locking mechanism to a mutex

On 09/22/23 at 12:35pm, Eric DeVolder wrote:
> 
> 
> On 9/22/23 11:28, Valentin Schneider wrote:
> > On 21/09/23 17:59, Eric DeVolder wrote:
> > > The design decision to use the atomic lock is described in the comment
> > > from kexec_internal.h, cited above. However, examining the code of
> > > __crash_kexec():
> > > 
> > >          if (kexec_trylock()) {
> > >                  if (kexec_crash_image) {
> > >                          ...
> > >                  }
> > >                  kexec_unlock();
> > >          }
> > > 
> > > reveals that the use of kexec_trylock() here is actually a "best effort"
> > > due to the atomic lock.  This atomic lock, prior to crash hotplug,
> > > would almost always be assured (another kexec syscall could hold the lock
> > > and prevent this, but that is about it).
> > > 
> > > So at the point where the capture kernel would be invoked, if the lock
> > > is not obtained, then kdump doesn't occur.
> > > 
> > > It is possible to instead use a mutex with proper waiting, and utilize
> > > mutex_trylock() as the "best effort" in __crash_kexec(). The use of a
> > > mutex then avoids all the lock acquisition problems that were revealed
> > > by the crash hotplug activity.
> > > 
> > 
> > @Dave thanks for the Cc, I'd have missed this otherwise.
> > 
> > 
> > Prior to the atomic thingie, we actually had a mutex and did
> > mutex_trylock() in __crash_kexec(). I'm a bit confused as this looks like a
> > revert of
> >    05c6257433b7 ("panic, kexec: make __crash_kexec() NMI safe")
> > with just the helpers kept in - this doesn't seem to address any of the
> > original issues regarding NMIs?
> > 
> > Sebastian raised some good points in [1] regarding these issues.
> > The main hurdle pointed out there is, if we end up in the slowpath during
> > the unlock, then we can end up acquiring the ->wait_lock which isn't NMI
> > safe.
> > 
> > This is even worse on PREEMPT_RT, as both trylock and the unlock can end up
> > acquiring the ->wait_lock.
> > 
> > [1]: https://lore.kernel.org/all/YqyZ%2FUf14qkYtMDX@linutronix.de/
> > 
> Having reviewed the references, it would seem that Baoquan's approach of a new
> lock to handle the hotplug activity is the way to go?

If so, I have posted a formal patch for that approach. It's simple and
should fix the issue.
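
For anyone following the thread, here is a minimal user-space sketch of the
"best effort" atomic trylock pattern discussed above. It uses C11 atomics
rather than the kernel's own helpers, and every name in it (sketch_lock,
sketch_trylock, sketch_unlock, sketch_crash_path) is hypothetical, so treat
it as an illustration under those assumptions and not as the kernel code.
The point it shows is that neither taking nor releasing such a lock goes
through a slowpath with an internal ->wait_lock, which is the property the
NMI discussion is about.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kexec lock: 0 = free, 1 = held.
 * Static storage duration, so it starts out zero (free). */
static atomic_int sketch_lock;

/* Try to take the lock once; never waits, never sleeps. */
static bool sketch_trylock(void)
{
	int expected = 0;

	/* Succeeds only if the lock was free (0) and we set it to held (1). */
	return atomic_compare_exchange_strong_explicit(&sketch_lock,
						       &expected, 1,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/* Release is a plain store; there is no waiter list to walk. */
static void sketch_unlock(void)
{
	atomic_store_explicit(&sketch_lock, 0, memory_order_release);
}

/* Best-effort caller shaped like the quoted __crash_kexec() snippet:
 * if the lock is already held, the work is simply skipped. */
static bool sketch_crash_path(void)
{
	if (!sketch_trylock())
		return false;	/* lock busy: nothing happens */

	/* ... work that needs the lock would go here ... */

	sketch_unlock();
	return true;
}
```

The trade-off is the one described in the quoted text: a caller that loses
the race does not wait for the lock, it just skips the work.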
