lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250701125459.GL1613200@noisy.programming.kicks-ass.net>
Date: Tue, 1 Jul 2025 14:54:59 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Gabriele Monaco <gmonaco@...hat.com>
Cc: linux-kernel@...r.kernel.org, Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will@...nel.org>, Thomas Gleixner <tglx@...utronix.de>,
	Andy Lutomirski <luto@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Ingo Molnar <mingo@...nel.org>, Mark Rutland <mark.rutland@....com>,
	linux-arm-kernel@...ts.infradead.org,
	linux-trace-kernel@...r.kernel.org
Subject: Re: [PATCH v2] tracing: Fix inconsistency in irq tracking on NMIs

On Wed, Jun 25, 2025 at 02:08:22PM +0200, Gabriele Monaco wrote:
> The irq_enable/irq_disable tracepoints fire only when there's an actual
> transition (enabled->disabled and vice versa), this needs special care
> in NMIs, as they can potentially start with interrupts already disabled.
> The current implementation takes care of this by tracking the lockdep
> state on nmi_entry as well as using the variable tracing_irq_cpu to
> synchronise with other calls (e.g. local_irq_disable/enable).
> 
> This can be racy in case of NMIs when lockdep is enabled, and can lead
> to missing events when lockdep is disabled.
> 
> Remove dependency on the lockdep status in the NMI common entry/exit
> code and adapt the tracing code to make sure that:
> 
> - The first call disabling interrupts fires the tracepoint
> - The first non-NMI call enabling interrupts fires the tracepoint
> - The last NMI call enabling interrupts fires the tracepoint unless
>   interrupts were disabled before the NMI
> - All other calls don't fire

I'm not at all convinced this is correct. Nor can I understand anything
much about what you wrote above.


>  arch/arm64/kernel/entry-common.c |  5 ++---
>  kernel/entry/common.c            |  5 ++---
>  kernel/trace/trace_preemptirq.c  | 12 +++++++-----
>  3 files changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/arm64/kernel/entry-common.c b/arch/arm64/kernel/entry-common.c
> index 7c1970b341b8c..7f1844123642e 100644
> --- a/arch/arm64/kernel/entry-common.c
> +++ b/arch/arm64/kernel/entry-common.c
> @@ -213,10 +213,9 @@ static void noinstr arm64_exit_nmi(struct pt_regs *regs)
>  	bool restore = regs->lockdep_hardirqs;
>  
>  	ftrace_nmi_exit();
> -	if (restore) {
> -		trace_hardirqs_on_prepare();
> +	trace_hardirqs_on_prepare();
> +	if (restore)
>  		lockdep_hardirqs_on_prepare();
> -	}
>  
>  	ct_nmi_exit();
>  	lockdep_hardirq_exit();
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index a8dd1f27417cf..e234f264fb495 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -343,10 +343,9 @@ void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
>  {
>  	instrumentation_begin();
>  	ftrace_nmi_exit();
> -	if (irq_state.lockdep) {
> -		trace_hardirqs_on_prepare();
> +	trace_hardirqs_on_prepare();
> +	if (irq_state.lockdep)
>  		lockdep_hardirqs_on_prepare();
> -	}
>  	instrumentation_end();
>  
>  	ct_nmi_exit();
> diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
> index 0c42b15c38004..fa45474fc54f1 100644
> --- a/kernel/trace/trace_preemptirq.c
> +++ b/kernel/trace/trace_preemptirq.c
> @@ -58,7 +58,11 @@ static DEFINE_PER_CPU(int, tracing_irq_cpu);
>   */
>  void trace_hardirqs_on_prepare(void)
>  {
> -	if (this_cpu_read(tracing_irq_cpu)) {
> +	int tracing_count = this_cpu_read(tracing_irq_cpu);
> +
> +	if (in_nmi() && tracing_count > 1)
> +		this_cpu_dec(tracing_irq_cpu);
> +	else if (tracing_count) {
>  		trace(irq_enable, TP_ARGS(CALLER_ADDR0, CALLER_ADDR1));
>  		tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
>  		this_cpu_write(tracing_irq_cpu, 0);
> @@ -89,8 +93,7 @@ NOKPROBE_SYMBOL(trace_hardirqs_on);
>   */
>  void trace_hardirqs_off_finish(void)
>  {
> -	if (!this_cpu_read(tracing_irq_cpu)) {
> -		this_cpu_write(tracing_irq_cpu, 1);
> +	if (this_cpu_inc_return(tracing_irq_cpu) == 1) {
>  		tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1);
>  		trace(irq_disable, TP_ARGS(CALLER_ADDR0, CALLER_ADDR1));
>  	}
> @@ -103,8 +106,7 @@ void trace_hardirqs_off(void)
>  {
>  	lockdep_hardirqs_off(CALLER_ADDR0);
>  
> -	if (!this_cpu_read(tracing_irq_cpu)) {
> -		this_cpu_write(tracing_irq_cpu, 1);
> +	if (this_cpu_inc_return(tracing_irq_cpu) == 1) {
>  		tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1);
>  		trace(irq_disable, TP_ARGS(CALLER_ADDR0, CALLER_ADDR1));
>  	}

So what about lovely things like:

trace_hardirqs_on_prepare()
  if (tracing_irq_cpu) {
    tracer_hardirqs_on();
    <NMI>
      trace_hardirqs_off_finish();
        if (this_cpu_inc_return() == 1) // will be > 1


So now we've traced IRQs are on, start an NMI, and loose the IRQs off
event. Well done.


This was all safe in that it would occasionally emit a duplicate state,
but no state was wrong. Both your attempts have broken things.

How about you fix you tool to accept duplicate state in the face of NMI
instead?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ