Message-ID: <20251020204421.GA197647@joelbox2>
Date: Mon, 20 Oct 2025 16:44:21 -0400
From: Joel Fernandes <joelagnelf@...dia.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Lyude Paul <lyude@...hat.com>, rust-for-linux@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>,
	Boqun Feng <boqun.feng@...il.com>, linux-kernel@...r.kernel.org,
	Daniel Almeida <daniel.almeida@...labora.com>,
	Danilo Krummrich <dakr@...nel.org>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	"Liam R. Howlett" <Liam.Howlett@...cle.com>,
	Uladzislau Rezki <urezki@...il.com>,
	Miguel Ojeda <ojeda@...nel.org>,
	Alex Gaynor <alex.gaynor@...il.com>, Gary Guo <gary@...yguo.net>,
	Björn Roy Baron <bjorn3_gh@...tonmail.com>,
	Benno Lossin <lossin@...nel.org>,
	Andreas Hindborg <a.hindborg@...nel.org>,
	Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>,
	"Rafael J. Wysocki" <rafael@...nel.org>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>, Ryo Takakura <ryotkkr98@...il.com>,
	K Prateek Nayak <kprateek.nayak@....com>,
	"open list:CPU FREQUENCY SCALING FRAMEWORK" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH v13 01/17] preempt: Track NMI nesting to separate per-CPU
 counter

On Tue, Oct 14, 2025 at 09:43:49PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 14, 2025 at 01:55:47PM -0400, Joel Fernandes wrote:
> > 
> > 
> > On 10/14/2025 6:48 AM, Peter Zijlstra wrote:
> > > On Mon, Oct 13, 2025 at 11:48:03AM -0400, Lyude Paul wrote:
> > > 
> > >>  #define __nmi_enter()						\
> > >>  	do {							\
> > >>  		lockdep_off();					\
> > >>  		arch_nmi_enter();				\
> > >> -		BUG_ON(in_nmi() == NMI_MASK);			\
> > >> -		__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);	\
> > >> +		BUG_ON(__this_cpu_read(nmi_nesting) == UINT_MAX);	\
> > >> +		__this_cpu_inc(nmi_nesting);			\
> > > 
> > > An NMI that nests from here..
> > > 
> > >> +		__preempt_count_add(HARDIRQ_OFFSET);		\
> > >> +		if (__this_cpu_read(nmi_nesting) == 1)		\
> > > 
> > > .. until here, will see nmi_nesting > 1 and not set NMI_OFFSET.
> > 
> > This is true; I can cure it by setting NMI_OFFSET unconditionally when
> > nmi_nesting >= 1. The outermost NMI will then reset it. I think that will
> > work. Do you see any other issue with doing so?
> 
> Unconditionally set NMI_OFFSET, regardless of nmi_nesting,
> and only clear it on exit when nmi_nesting == 0.
> 
> Notably, when you use u64 __preempt_count, you can limit this to 32bit
> only. The NMI nesting can happen in the single instruction window
> between ADD and ADC. But on 64bit you don't have that gap and so don't
> need to fix it.

Wouldn't this break __preempt_count_dec_and_test() though? If we make the
count 64-bit, then on x86 32-bit there is no longer a way to decrement the
preempt count and zero-test the entire word in a single instruction (decl).
I suspect there are other races as well. It also means that every
preempt_disable()/preempt_enable() will be heavier on 32-bit.
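
To make the concern concrete, here is a rough user-space model of it. This
is not kernel code: struct split_count, dec_and_test32(),
dec_and_test64_split() and the window callback are all made-up names, and
the callback only stands in for an interrupt or NMI landing between the two
halves of a 64-bit decrement on a 32-bit machine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a 64-bit count stored as two 32-bit words. */
struct split_count {
	uint32_t lo;
	uint32_t hi;
};

static uint64_t split_read(const struct split_count *c)
{
	return ((uint64_t)c->hi << 32) | c->lo;
}

/* 32-bit case: models a single decl that decrements and zero-tests. */
static bool dec_and_test32(uint32_t *count)
{
	return --(*count) == 0;
}

/* Stand-in for an interrupt/NMI arriving between the two steps. */
typedef void (*window_fn)(const struct split_count *c, void *arg);

/* 64-bit case on a 32-bit machine: two steps plus a separate zero test. */
static bool dec_and_test64_split(struct split_count *c,
				 window_fn window, void *arg)
{
	uint32_t borrow;

	assert(split_read(c) != 0);	/* model only: no underflow */

	borrow = (c->lo == 0);
	c->lo -= 1;			/* step 1: low word (think subl) */
	if (window)
		window(c, arg);		/* observer sees a torn value here */
	c->hi -= borrow;		/* step 2: high word (think sbbl) */

	return (c->lo | c->hi) == 0;	/* extra work to test both words */
}

/* Example observer: records the value visible inside the window. */
static void record_value(const struct split_count *c, void *arg)
{
	*(uint64_t *)arg = split_read(c);
}
```

Starting from hi = 1, lo = 0, the observer in the window sees 0x1ffffffff,
a value the counter never logically held, before the high word catches up.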

If we take the approach of this patch, but move the per-CPU counter to a
cache-hot area, what are the drawbacks other than a few more instructions
on NMI entry/exit? It feels simpler and less risky. But let me know if I
missed something.
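
For reference, a rough user-space sketch of what I understand the combined
scheme to be: a separate nesting counter, NMI_OFFSET set unconditionally on
entry, and cleared on exit only once nesting drops back to zero. This is
only a model, not the patch: struct model_cpu stands in for per-CPU state,
the MODEL_* values are placeholders rather than the real preempt_count
layout, and the NMI bit is treated as a plain flag.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Placeholder values, NOT the kernel's actual preempt_count layout. */
#define MODEL_HARDIRQ_OFFSET	0x00010000u
#define MODEL_NMI_OFFSET	0x00100000u

/* Hypothetical stand-in for the per-CPU state of one CPU. */
struct model_cpu {
	unsigned int preempt_count;
	unsigned int nmi_nesting;
};

static void model_nmi_enter(struct model_cpu *cpu)
{
	assert(cpu->nmi_nesting != UINT_MAX);	/* models the BUG_ON() */
	cpu->nmi_nesting++;
	cpu->preempt_count += MODEL_HARDIRQ_OFFSET;
	/* Set unconditionally, so a nested NMI always sees the flag. */
	cpu->preempt_count |= MODEL_NMI_OFFSET;
}

static void model_nmi_exit(struct model_cpu *cpu)
{
	assert(cpu->nmi_nesting > 0);
	cpu->preempt_count -= MODEL_HARDIRQ_OFFSET;
	/* Only the outermost exit clears the flag. */
	if (--cpu->nmi_nesting == 0)
		cpu->preempt_count &= ~MODEL_NMI_OFFSET;
}

static bool model_in_nmi(const struct model_cpu *cpu)
{
	return (cpu->preempt_count & MODEL_NMI_OFFSET) != 0;
}
```

In this model a nested enter/exit pair leaves the flag set until the
outermost exit, which is the property the conditional version lacked.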

thanks,

 - Joel

