Message-ID: <20260108114615.02cd0c7c@mordecai>
Date: Thu, 8 Jan 2026 11:46:15 +0100
From: Petr Tesarik <ptesarik@...e.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Masami Hiramatsu <mhiramat@...nel.org>, Mathieu Desnoyers
 <mathieu.desnoyers@...icios.com>, Sebastian Andrzej Siewior
 <bigeasy@...utronix.de>, Clark Williams <clrkwllms@...nel.org>,
 linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
 linux-rt-devel@...ts.linux.dev, Frederic Weisbecker <frederic@...nel.org>
Subject: Re: [PATCH] ring-buffer: Use a housekeeping CPU to wake up waiters

On Thu, 8 Jan 2026 09:39:32 +0100
Petr Tesarik <ptesarik@...e.com> wrote:

> On Wed, 7 Jan 2026 11:19:35 -0500
> Steven Rostedt <rostedt@...dmis.org> wrote:
> 
> > On Wed, 7 Jan 2026 11:17:09 -0500
> > Steven Rostedt <rostedt@...dmis.org> wrote:
> >   
> > > Or we simply change it to:
> > > 
> > > static inline void    
> > 
> > Actually, the above should be noinline, as it's in a slower path, and
> > should not be adding logic into the cache of the fast path.  
> 
> However, to be honest, I'm surprised this is considered a slow path. My
> use case is to record a few selected trace events with "trace-cmd
> record", which spends most of its time polling trace_pipe_raw.
> Consequently, there is almost always a pending waiter that requires a
> wakeup.
> 
> In short, irq_work_queue() is the hot path for me.
> 
> OTOH I don't mind making it noinline, because on recent Intel and AMD
> systems, a function call (noinline) is often cheaper than an increase
> in L1 cache footprint (caused by inlining). But I'm confused. I have
> always thought most people use tracing the same way I do.
> 
> > > rb_irq_work_queue(struct rb_irq_work *irq_work)
> > > {
> > > 	int cpu;
> > > 
> > > 	/* irq_work_queue_on() is not allowed in NMI context */
> > > 	if (in_nmi()) {
> > > 		irq_work_queue(&irq_work->work);
> > > 		return;
> > > 	}  
> 
> Thanks for the idea. There are some downsides. IIUC there is no
> fundamental reason IPIs to other CPUs cannot be sent from NMI context.
> It's just a limitation of the current Linux kernel code. As such, it
> may be lifted in the future, and at that point nobody will remember to
> remove this condition.
> 
> My current plan is to keep the patch on hold and have a look at why IPI
> backends are not NMI-safe. In fact, I'm not even 100% sure the comment
> is correct. The issue may have fixed itself, e.g. by the removal of the
> last affected architecture. ;-)

This turned out to be an interesting digression. Since we still support
old xAPIC (not x2APIC) systems, there is a hardware reason. The xAPIC
ICR is programmed by writing two 32-bit registers. If an NMI occurs
between those two writes, we'd have to restore the upper 32 bits of the
ICR. Alternatively, we could queue the request if an ICR write is in
progress and flush the queue after finishing the write to the ICR (out
of NMI context). The code could even be arch-independent...
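
To make the window concrete, here is a rough sketch of the classic xAPIC
IPI send sequence (loosely modeled on the kernel's
__default_send_IPI_dest_field(); the function name and the
simplifications, such as skipping the wait for the Delivery Status bit,
are mine):

static void xapic_send_ipi_sketch(u32 dest_apicid, u32 vector)
{
	/* first write: destination APIC ID in bits 24-31 of ICR2 */
	apic_write(APIC_ICR2, dest_apicid << 24);

	/*
	 * An NMI that sends its own IPI at this point overwrites
	 * APIC_ICR2 before the interrupted sequence has written
	 * APIC_ICR, so the original IPI would go to the wrong CPU
	 * unless the NMI path saved and restored the upper half.
	 */

	/* second write: vector + delivery mode; this write fires the IPI */
	apic_write(APIC_ICR, APIC_DM_FIXED | vector);
}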

However, it's not worth the effort just for this one corner case.
Besides, it seems that other people have always been aware that
irq_work_queue_on() is NMI-unsafe, so if the future brings a better
reason to make it NMI-safe, there's a fair chance that all the extra
code in rb_irq_work_queue() will get reviewed as well.

Petr T
