lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20250620125112.33978-2-gmonaco@redhat.com>
Date: Fri, 20 Jun 2025 14:51:13 +0200
From: Gabriele Monaco <gmonaco@...hat.com>
To: linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Andy Lutomirski <luto@...nel.org>,
	Ingo Molnar <mingo@...nel.org>
Cc: Gabriele Monaco <gmonaco@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	linux-trace-kernel@...r.kernel.org
Subject: [PATCH] lockdep: Fix inconsistency in irq tracking on NMIs

The irq_enable/irq_disable tracepoints fire only when there's an actual
transition (enabled->disabled and vice versa), this needs special care
in NMIs, as they could potentially start with IRQs already disabled.
The current implementation takes care of this by tracking the lockdep
state before the NMI (on nmi_entry) and not tracing on nmi_exit in case
IRQs were already disabled, we don't trace on nmi_entry following the
tracing_irq_cpu variable, which can be racy:

 local_irq_enable()
   void trace_hardirqs_on(void)
   {
   	if (tracing_irq_cpu) {
   		trace(irq_enable);
   		tracing_irq_cpu = 0;
   	}

                /*
                 * NMI here
                 * tracing_irq_cpu == 0 (done tracing)
                 * lockdep_hardirqs_enabled == 0 (IRQs still disabled)
                 */

                                   irqentry_nmi_enter()
                                       irq_state.lockdep = 0
                                       trace(irq_disable);

                                   irqentry_nmi_exit()
                                       // irq_state.lockdep == 0
                                       // do not trace(irq_enable)
   	lockdep_hardirqs_on();
   }

The error is visible with the sncid RV monitor and particularly likely
on machines with the following setup:
- x86 bare-metal with 40+ CPUs
- tuned throughput-performance (activating regular perf NMIs)
- workload: stress-ng --cpu-sched 21 --timer 11 --signal 11

The presence of the RV monitor is useful to see the error but it is not
necessary to trigger it.

Prevent this scenario by checking lockdep_hardirqs_enabled to trace also
on nmi_entry.

Fixes: ba1f2b2eaa2a ("x86/entry: Fix NMI vs IRQ state tracking")
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Masami Hiramatsu <mhiramat@...nel.org>
Cc: linux-trace-kernel@...r.kernel.org
Signed-off-by: Gabriele Monaco <gmonaco@...hat.com>
---
 kernel/entry/common.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index a8dd1f27417cf..7369132c00ba4 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -326,13 +326,15 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 	irq_state.lockdep = lockdep_hardirqs_enabled();
 
 	__nmi_enter();
-	lockdep_hardirqs_off(CALLER_ADDR0);
+	if (irq_state.lockdep)
+		lockdep_hardirqs_off(CALLER_ADDR0);
 	lockdep_hardirq_enter();
 	ct_nmi_enter();
 
 	instrumentation_begin();
 	kmsan_unpoison_entry_regs(regs);
-	trace_hardirqs_off_finish();
+	if (irq_state.lockdep)
+		trace_hardirqs_off_finish();
 	ftrace_nmi_enter();
 	instrumentation_end();
 

base-commit: 75f5f23f8787c5e184fcb2fbcd02d8e9317dc5e7
-- 
2.49.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ