Message-ID: <9D417792-A0A8-44C4-884B-A3406D2E7A1D@nvidia.com>
Date: Tue, 23 Dec 2025 04:34:00 +0000
From: Joel Fernandes <joelagnelf@...dia.com>
To: Tejun Heo <tj@...nel.org>
CC: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, David
Vernet <void@...ifault.com>, Andrea Righi <arighi@...dia.com>, Changwoo Min
<changwoo@...lia.com>, Ingo Molnar <mingo@...hat.com>, Peter Zijlstra
<peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>, Vincent Guittot
<vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel
Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
"sched-ext@...ts.linux.dev" <sched-ext@...ts.linux.dev>
Subject: Re: [RFT] sched_ext: Skip stack trace capture in NMI context
> On Dec 22, 2025, at 9:44 PM, Tejun Heo <tj@...nel.org> wrote:
>
> Hello,
>
>> On Mon, Dec 22, 2025 at 07:50:37PM -0500, Joel Fernandes wrote:
>> stack_trace_save() is not guaranteed to be NMI-safe on all
>> architectures.
>>
>> The hardlockup detector calls into sched_ext via the following call
>> chain when an NMI occurs:
>>
>>   watchdog_overflow_callback()
>>     watchdog_hardlockup_check()
>>       scx_hardlockup()
>>         stack_trace_save()
>>
>> Skip stack trace capture when in_nmi() returns true to prevent
>> potential deadlocks.
>>
>> Fixes: 582f700e1bdc ("sched_ext: Hook up hardlockup detector")
>> Signed-off-by: Joel Fernandes <joelagnelf@...dia.com>
>
> This does work on x86 (right?) and is useful in understanding what the
> underlying problem is. It'd be great if there's a config flag we can test
> but if not can we specifically exclude archs which are known to not work?
You are right that we will miss out on architectures where this is safe. We should make it more specific. I am wondering if Steven Rostedt has any thoughts here since he is actively working on stack tracing/unwinding and has made similar commits in the past where he restricted stack tracing in an NMI context.
As I understand it, unwinding the stack is not safe/valid on architectures where the NMI context does not have its own stack. I could be wrong, which is why I marked this as an RFT. It is safe on 64-bit x86, but not on 32-bit x86 or other same-stack architectures.
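To make the arch-specific idea concrete, here is a rough userspace sketch, not kernel code. Every name in it is a stand-in: mock_in_nmi() plays the role of in_nmi(), mock_stack_trace_save() plays the role of stack_trace_save(), and MOCK_ARCH_NMI_HAS_OWN_STACK is a hypothetical switch, not an existing config option. It only shows the shape of the check: skip the capture in NMI context unless the architecture is known to run NMIs on their own stack.

```c
/* Userspace mock: all names are illustrative stand-ins, not kernel APIs. */
#include <stdbool.h>

/* Hypothetical switch: 1 if NMIs run on their own stack on this arch. */
#ifndef MOCK_ARCH_NMI_HAS_OWN_STACK
#define MOCK_ARCH_NMI_HAS_OWN_STACK 0
#endif

/* Stands in for the state that in_nmi() would report. */
static bool mock_nmi_context;

static bool mock_in_nmi(void)
{
	return mock_nmi_context;
}

/* Stand-in for stack_trace_save(): stores up to three fake entries. */
static unsigned int mock_stack_trace_save(unsigned long *store,
					  unsigned int size)
{
	unsigned int i;

	for (i = 0; i < size && i < 3; i++)
		store[i] = 0x1000UL + i;
	return i;
}

/* Capture a trace unless we are in NMI context on a same-stack arch. */
static unsigned int capture_trace(unsigned long *store, unsigned int size)
{
	if (mock_in_nmi() && !MOCK_ARCH_NMI_HAS_OWN_STACK)
		return 0;	/* no entries captured */
	return mock_stack_trace_save(store, size);
}
```

With the switch left at 0, capture_trace() returns no entries whenever mock_nmi_context is set. Building with -DMOCK_ARCH_NMI_HAS_OWN_STACK=1 keeps the capture in both contexts, which is the behaviour we would want to preserve on 64-bit x86.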
If we feel that this is not an issue, then that is fine with me (and sorry for the noise), but I wanted to raise it just in case. Sooner or later someone running scx on an odd architecture might complain.
Thanks!
- Joel
>
> Thanks.
>
> --
> tejun