Message-ID: <20260108114615.02cd0c7c@mordecai>
Date: Thu, 8 Jan 2026 11:46:15 +0100
From: Petr Tesarik <ptesarik@...e.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Masami Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers
 <mathieu.desnoyers@...icios.com>, Sebastian Andrzej Siewior
 <bigeasy@...utronix.de>, Clark Williams <clrkwllms@...nel.org>,
 linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
 linux-rt-devel@...ts.linux.dev, Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [PATCH] ring-buffer: Use a housekeeping CPU to wake up waiters

On Thu, 8 Jan 2026 09:39:32 +0100
Petr Tesarik <ptesarik@...e.com> wrote:

> On Wed, 7 Jan 2026 11:19:35 -0500
> Steven Rostedt <rostedt@...dmis.org> wrote:
> 
> > On Wed, 7 Jan 2026 11:17:09 -0500
> > Steven Rostedt <rostedt@...dmis.org> wrote:
> >   
> > > Or we simply change it to:
> > > 
> > > static inline void    
> > 
> > Actually, the above should be noinline, as it's in a slower path, and
> > should not be adding logic into the cache of the fast path.  
> 
> However, to be honest, I'm surprised this is considered a slow path. My
> use case is to record a few selected trace events with "trace-cmd
> record", which spends most of its time polling trace_pipe_raw.
> Consequently, there is almost always a pending waiter that requires a
> wakeup.
> 
> In short, irq_work_queue() is the hot path for me.
> 
> OTOH I don't mind making it noinline, because on recent Intel and AMD
> systems, a function call (noinline) is often cheaper than an increase
> in L1 cache footprint (caused by inlining). But I'm confused. I have
> always thought most people use tracing the same way I do.
> 
> > > rb_irq_work_queue(struct rb_irq_work *irq_work)
> > > {
> > > 	int cpu;
> > > 
> > > 	/* irq_work_queue_on() is not allowed in NMI context */
> > > 	if (in_nmi()) {
> > > 		irq_work_queue(&irq_work->work);
> > > 		return;
> > > 	}  
> 
> Thanks for the idea. There are some downsides. IIUC there is no
> fundamental reason IPIs to other CPUs cannot be sent from NMI context.
> It's just a limitation of the current Linux kernel code. As such, it
> may be lifted in the future, and at that point nobody will remember to
> remove this condition.
> 
> My current plan is to keep the patch on hold and have a look at why IPI
> backends are not NMI-safe. In fact, I'm not even 100% sure the comment
> is correct. The issue may have fixed itself, e.g. by the removal of the
> last affected architecture. ;-)

This turned out to be an interesting digression. Since we still support
old xAPIC (not x2APIC) systems, there is a hardware reason. The xAPIC
ICR is programmed by writing two 32-bit registers. If an NMI occurs
between those two writes, we'd have to restore the upper 32 bits of the
ICR. Alternatively, we could queue the request if an ICR write is in
progress and flush the queue after finishing the write to the ICR (out
of NMI context). The code could even be arch-independent...
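
To make the window concrete, here is a rough sketch of the classic xAPIC
IPI send sequence (loosely modeled on the kernel's
__default_send_IPI_dest_field(); the function name and the
simplifications, such as skipping the wait for the Delivery Status bit,
are mine):

static void xapic_send_ipi_sketch(u32 dest_apicid, u32 vector)
{
	/* first write: destination APIC ID in bits 24-31 of ICR2 */
	apic_write(APIC_ICR2, dest_apicid << 24);

	/*
	 * An NMI that sends its own IPI at this point overwrites
	 * APIC_ICR2 before the interrupted sequence has written
	 * APIC_ICR, so the original IPI would go to the wrong CPU
	 * unless the NMI path saved and restored the upper half.
	 */

	/* second write: vector + delivery mode; this write fires the IPI */
	apic_write(APIC_ICR, APIC_DM_FIXED | vector);
}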

However, it's not worth the effort just for this one corner case.
Besides, it seems that other people have always been aware that
irq_work_queue_on() is NMI-unsafe, so if the future brings a better
reason to make it NMI-safe, there's a fair chance that all the extra
code in rb_irq_work_queue() will get reviewed as well.

Petr T
