Message-ID: <01ad0644-4639-4217-ac31-b04777658d18@nvidia.com>
Date: Thu, 30 Oct 2025 18:56:56 -0400
From: Joel Fernandes <joelagnelf@...dia.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Lyude Paul <lyude@...hat.com>, rust-for-linux@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>, Boqun Feng <boqun.feng@...il.com>,
linux-kernel@...r.kernel.org, Daniel Almeida <daniel.almeida@...labora.com>,
Danilo Krummrich <dakr@...nel.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, "Liam R. Howlett"
<Liam.Howlett@...cle.com>, Uladzislau Rezki <urezki@...il.com>,
Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
Gary Guo <gary@...yguo.net>, Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <lossin@...nel.org>, Andreas Hindborg <a.hindborg@...nel.org>,
Alice Ryhl <aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Ingo Molnar <mingo@...nel.org>, Ryo Takakura <ryotkkr98@...il.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"open list:CPU FREQUENCY SCALING FRAMEWORK" <linux-pm@...r.kernel.org>
Subject: Re: [PATCH v13 01/17] preempt: Track NMI nesting to separate per-CPU
counter
On 10/20/2025 4:44 PM, Joel Fernandes wrote:
> On Tue, Oct 14, 2025 at 09:43:49PM +0200, Peter Zijlstra wrote:
>> On Tue, Oct 14, 2025 at 01:55:47PM -0400, Joel Fernandes wrote:
>>>
>>>
>>> On 10/14/2025 6:48 AM, Peter Zijlstra wrote:
>>>> On Mon, Oct 13, 2025 at 11:48:03AM -0400, Lyude Paul wrote:
>>>>
>>>>>  #define __nmi_enter()						\
>>>>>  	do {							\
>>>>>  		lockdep_off();					\
>>>>>  		arch_nmi_enter();				\
>>>>> -		BUG_ON(in_nmi() == NMI_MASK);			\
>>>>> -		__preempt_count_add(NMI_OFFSET + HARDIRQ_OFFSET);\
>>>>> +		BUG_ON(__this_cpu_read(nmi_nesting) == UINT_MAX); \
>>>>> +		__this_cpu_inc(nmi_nesting);			\
>>>>
>>>> An NMI that nests from here..
>>>>
>>>>> +		__preempt_count_add(HARDIRQ_OFFSET);		\
>>>>> +		if (__this_cpu_read(nmi_nesting) == 1)		\
>>>>
>>>> .. until here, will see nmi_nesting > 1 and not set NMI_OFFSET.
>>>
>>> This is true. I can cure it by setting NMI_OFFSET unconditionally when
>>> nmi_nesting >= 1; the outermost NMI will then reset it. I think that will
>>> work. Do you see any other issue with doing so?
>>
>> Unconditionally set NMI_OFFSET, regardless of nmi_nesting, and only clear
>> it on exit when nmi_nesting == 0.
>>
>> Notably, if you use a u64 __preempt_count, you can limit this fix to 32-bit
>> only. The NMI nesting can only happen in the single-instruction window
>> between the ADD and the ADC. On 64-bit you don't have that gap, and so
>> don't need the fix.
>
> Wouldn't this break __preempt_count_dec_and_test() though? If we make the
> count 64-bit, then there is no longer a way on 32-bit x86 to decrement the
> preempt count and zero-test the entire word in a single instruction (decl).
> And I suspect there may be other races as well. It also means every
> preempt_disable()/enable() becomes heavier on 32-bit.
>
> If we take the approach of this patch, but move the per-CPU counter into a
> cache-hot area, what are the drawbacks other than a few more instructions on
> NMI entry/exit? It feels simpler and less risky, but let me know if I missed
> something.
>
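To make the window concrete, here is a sketch of the interleaving being fixed
(the identifiers come from the patch; the step-by-step annotations are mine
and are only meant to illustrate the race):

	/* outermost NMI, on entry */
	__this_cpu_inc(nmi_nesting);			/* nesting: 0 -> 1 */

		/* a second NMI nests right here, before the check below */
		__this_cpu_inc(nmi_nesting);		/* nesting: 1 -> 2 */
		__preempt_count_add(HARDIRQ_OFFSET);
		if (__this_cpu_read(nmi_nesting) == 1)	/* false, reads 2 */
			__preempt_count_add(NMI_OFFSET);/* never executed */
		/* nested NMI body runs with in_nmi() == 0 */
		/* nested NMI exits, nesting drops back to 1 */

	__preempt_count_add(HARDIRQ_OFFSET);
	if (__this_cpu_read(nmi_nesting) == 1)		/* true */
		__preempt_count_add(NMI_OFFSET);	/* too late for the nested NMI */
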
If it's OK, for the next revision I will just do the following to cure the
issue Peter found, and respin the patch. Let me know of any objections. Thanks.
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 177eed1de35c..cc06bda52c3e 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -113,8 +113,7 @@ void irq_exit_rcu(void);
 		BUG_ON(__this_cpu_read(nmi_nesting) == UINT_MAX);	\
 		__this_cpu_inc(nmi_nesting);				\
 		__preempt_count_add(HARDIRQ_OFFSET);			\
-		if (__this_cpu_read(nmi_nesting) == 1)			\
-			__preempt_count_add(NMI_OFFSET);		\
+		preempt_count_set(preempt_count() | NMI_MASK);		\
 	} while (0)
 
 #define nmi_enter()						\
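
For symmetry, the exit path then has to drop NMI_MASK only once the outermost
NMI leaves. Below is a hypothetical sketch of what __nmi_exit() could look
like under this scheme; it is not from the posted series, just my reading of
"only clear on exit when nmi_nesting == 0":

	#define __nmi_exit()						\
		do {							\
			BUG_ON(!__this_cpu_read(nmi_nesting));		\
			__preempt_count_sub(HARDIRQ_OFFSET);		\
			__this_cpu_dec(nmi_nesting);			\
			if (__this_cpu_read(nmi_nesting) == 0)		\
				preempt_count_set(preempt_count() & ~NMI_MASK); \
			arch_nmi_exit();				\
			lockdep_on();					\
		} while (0)

A nested NMI that fires between the decrement and the clear sets NMI_MASK
again on its own entry and clears it on its own exit, so the outermost clear
is at worst a harmless no-op.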