[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <87cyajkk2u.ffs@tglx>
Date: Wed, 02 Jul 2025 11:39:37 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Gabriele Monaco <gmonaco@...hat.com>, Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>, Andy
Lutomirski <luto@...nel.org>, Steven Rostedt <rostedt@...dmis.org>, Masami
Hiramatsu <mhiramat@...nel.org>, Ingo Molnar <mingo@...nel.org>, Mark
Rutland <mark.rutland@....com>, linux-arm-kernel@...ts.infradead.org,
linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH v2] tracing: Fix inconsistency in irq tracking on NMIs
On Wed, Jul 02 2025 at 09:18, Gabriele Monaco wrote:
> On Tue, 2025-07-01 at 14:54 +0200, Peter Zijlstra wrote:
> I could probably only fix this without even considering NMIs when
> interrupts are disabled, but I believe if that happens, the tracepoints
> would report something wrong, since using tracing_irq_cpu alone does:
>
> local_irq_disable -> trace_irq_off
> nmi_enter -> no trace
> nmi_exit -> trace_irq_on
> // here interrupts are still off, aren't they?
> local_irq_enable -> no trace
>
> The idea that I described poorly, was to use tracing_irq_cpu in a way
> that the first context disabling interrupts fires the tracepoint
> (current behaviour), but when it's time to enable interrupts, an NMI
> which didn't disable interrupts shouldn't trace but let the next
> context trace.
...
> [1] - https://lore.kernel.org/lkml/87sejup1fe.ffs@tglx
As I told you before:
"As you correctly described, the two states are asynchronous in the
context of NMIs, so the obvious thing is to treat them as seperate
entities so that the trace really contains the off/on events
independent of lockdep enablement and lockdep's internal state. No?"
So I would have expected that you look at the lockdep implementation and
figure out why that one works correctly. Let's have a look at it:
nmi_entry()
irq_state.lockdep = lockdep_hardirqs_enabled();
lockdep_hardirqs_off(CALLER_ADDR0);
...
return irq_state;
nmi_exit(irq_state)
if (irq_state.lockdep)
lockdep_hardirqs_on(CALLER_ADDR0);
On NMI entry the internal state of lockdep tracking is recorded and
lockdep_hardirqs_off() handles a redundant off gracefully. As I said it
might be arguable to invoke it only when irq_state.lockdep == true, but
that's a cosmetic or performance issue not a correctness problem.
On NMI exit lockdep_hardirqs_on() is only invoked when the lockdep internal
state at entry was 'enabled' so it can restore the state correctly.
If you model the tracer exactly the same way:
irq_state.lockdep = lockdep_hardirqs_enabled();
irq_state.tracer = tracer_hardirqs_enabled();
....
You get the idea...
That will be "correct" in terms of sequencing depending on your trace side
implementation:
trace_hardirqs_off()
if (trace_hardirqs_enabled()) {
trace_irqs_enabled = false; // A
emit_trace(OFF); // B
If the NMI hits before A, everything is fine because it will record a
ON->OFF transition on NMI entry and a OFF->ON transition on NMI
exit. Then the interrupted context records a ON->OFF transition again.
If it hits between A and B, then the NMI tracing wont do anything
because enabled state is already false, but the resulting trace
sequencing is messed up:
irqs on
...
nmi_entry
nmi_exit
irqs off
IOW, it misses the ON->OFF OFF->ON transitions in the NMI.
So you want to do it the other way round:
trace_hardirqs_off()
if (trace_hardirqs_enabled()) {
emit_trace(OFF); // A
trace_irqs_enabled = false; // B
If the NMI hits before A, everything is fine because it will record a
ON->OFF transition on NMI entry and a OFF->ON transition on NMI
exit. Then the interrupted context records a ON->OFF transition again.
If it hits between A and B then you get a duplicate
irqs on
...
irqs off
nmi_entry
irqs off
irqs on
nmi_exit
There is nothing you can do about it, but that's easy enough to filter
out, no?
I'm pretty sure you can figure out how that should be modeled on the
on() side of affairs.
Thanks,
tglx
Powered by blists - more mailing lists