lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20241127233455.76904-1-longman@redhat.com>
Date: Wed, 27 Nov 2024 18:34:55 -0500
From: Waiman Long <longman@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>
Cc: x86@...nel.org,
	linux-kernel@...r.kernel.org,
	"H. Peter Anvin" <hpa@...or.com>,
	Waiman Long <longman@...hat.com>
Subject: [PATCH] x86/nmi: Use trylock in __register_nmi_handler() when in_nmi()

The __register_nmi_handler() function can be called in NMI context from
nmi_shootdown_cpus() leading to a lockdep splat like the following.

[ 1123.133573] ================================
[ 1123.137845] WARNING: inconsistent lock state
[ 1123.142118] 6.12.0-31.el10.x86_64+debug #1 Not tainted
[ 1123.147257] --------------------------------
[ 1123.151529] inconsistent {INITIAL USE} -> {IN-NMI} usage.
  :
[ 1123.261544]  Possible unsafe locking scenario:
[ 1123.261544]
[ 1123.267463]        CPU0
[ 1123.269915]        ----
[ 1123.272368]   lock(&nmi_desc[0].lock);
[ 1123.276122]   <Interrupt>
[ 1123.278746]     lock(&nmi_desc[0].lock);
[ 1123.282671]
[ 1123.282671]  *** DEADLOCK ***
  :
[ 1123.314088] Call Trace:
[ 1123.316542]  <NMI>
[ 1123.318562]  dump_stack_lvl+0x6f/0xb0
[ 1123.322230]  print_usage_bug.part.0+0x3d3/0x610
[ 1123.330618]  lock_acquire.part.0+0x2e6/0x360
[ 1123.357217]  _raw_spin_lock_irqsave+0x46/0x90
[ 1123.366193]  __register_nmi_handler+0x8f/0x3a0
[ 1123.374401]  nmi_shootdown_cpus+0x95/0x120
[ 1123.378509]  kdump_nmi_shootdown_cpus+0x15/0x20
[ 1123.383040]  native_machine_crash_shutdown+0x54/0x160
[ 1123.388095]  __crash_kexec+0x10f/0x1f0

In this particular case, a panic had just happened.

[ 1122.808188] Kernel panic - not syncing: Fatal hardware error!

It can be argued that a lockdep splat after a panic is not a big deal,
but it can still confuse users. Fix that by using trylock in NMI context
to avoid the lockdep splat. If trylock fails, we still need to acquire
the lock on the best effort basis to avoid potential NMI descriptor
list corruption.

Signed-off-by: Waiman Long <longman@...hat.com>
---
 arch/x86/kernel/nmi.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index ed163c8c8604..0b8ad64be117 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -171,8 +171,23 @@ int __register_nmi_handler(unsigned int type, struct nmiaction *action)
 	if (WARN_ON_ONCE(!action->handler || !list_empty(&action->list)))
 		return -EINVAL;
 
+	if (in_nmi()) {
+		/*
+		 * register_nmi_handler() can be called in NMI context from
+		 * nmi_shootdown_cpus(). In this case, we use trylock to
+		 * acquire the NMI descriptor lock to avoid potential lockdep
+		 * splat. If that fails, we still acquire the lock on the best
+		 * effort basis, but a real deadlock may happen if the CPU is
+		 * acquiring that lock when NMI happens.
+		 */
+		if (raw_spin_trylock_irqsave(&desc->lock, flags))
+			goto locked;
+		pr_err("%s: trylock of NMI desc lock failed in NMI context, may deadlock!\n",
+			__func__);
+	}
 	raw_spin_lock_irqsave(&desc->lock, flags);
 
+locked:
 	/*
 	 * Indicate if there are multiple registrations on the
 	 * internal NMI handler call chains (SERR and IO_CHECK).
-- 
2.47.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ