Message-ID: <20251223235833.GA1273740@joelbox2>
Date: Tue, 23 Dec 2025 18:58:33 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Tejun Heo <tj@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
David Vernet <void@...ifault.com>, Andrea Righi <arighi@...dia.com>,
Changwoo Min <changwoo@...lia.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>,
"sched-ext@...ts.linux.dev" <sched-ext@...ts.linux.dev>
Subject: Re: [RFT] sched_ext: Skip stack trace capture in NMI context
On Tue, Dec 23, 2025 at 03:31:36PM -0500, Steven Rostedt wrote:
> On Tue, 23 Dec 2025 04:34:00 +0000
> Joel Fernandes <joelagnelf@...dia.com> wrote:
>
> > > This does work on x86 (right?) and is useful in understanding what the
> > > underlying problem is. It'd be great if there's a config flag we can test
> > > but if not can we specifically exclude archs which are known to not work?
> >
> > You are right that we will miss out on architectures where this is safe.
> > We should make it more specific. I am wondering if Steven Rostedt has any
> > thoughts here since he is actively working on stack tracing/unwinding and
> > has made similar commits in the past where he restricted stack tracing in
> > an NMI context.
>
> [ Fixes line wrap, ug it's hard to read emails that go across 300 characters! ]
Sorry about that. Thank you.
> Well, we do kernel stack tracing in NMI context all the time with no issue
> (but I mostly work on x86).
>
> >
> > Per my understanding, stack trace unwinding is not safe/valid to do on
> > architectures where the NMI context does not have its own stack. But I
>
> Hmm, no, I think it's fine to do it on archs where NMI doesn't have its own
> stack. It works on 32bit x86, where the NMI shares the kernel stack.
>
> Which architecture had an issue with a stack trace?
On 32-bit, what happens if an NMI hits during stack frame setup? Can the
unwinder misbehave if the base pointer has not yet been set up and the NMI
starts using the same stack?
Not sure.
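To make the question concrete, here is a toy user-space model (not kernel
code, and not a claim about what any real x86 unwinder does). It assumes a
plain frame-pointer walk over a downward-growing stack; every name in it
(toy_cpu, toy_unwind, the RET_* addresses) is made up for illustration. It
compares a trace taken after the interrupted function's prologue with one
taken between the call and the prologue.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not kernel code: a downward-growing stack of machine words. */
#define STACK_WORDS 16
#define NO_FRAME    (~0UL)  /* marks the end of the frame-pointer chain */

/* Made-up addresses standing in for real return addresses. */
#define RET_TO_ENTRY 0x100UL /* where A returns to */
#define RET_IN_A     0x0aUL  /* where B returns to, inside A */
#define RET_IN_B     0x0bUL  /* where C returns to, inside B */
#define IP_IN_C      0x0cUL  /* interrupted instruction pointer, inside C */

struct toy_cpu {
	unsigned long stack[STACK_WORDS];
	unsigned long sp; /* index of the most recently pushed word */
	unsigned long bp; /* index of the current frame's saved-bp slot */
};

static void toy_push(struct toy_cpu *c, unsigned long v)
{
	assert(c->sp > 0);
	c->stack[--c->sp] = v;
}

/* "call": push the return address. */
static void toy_call(struct toy_cpu *c, unsigned long ret)
{
	toy_push(c, ret);
}

/* Prologue: save the caller's bp, then point bp at the new frame. */
static void toy_prologue(struct toy_cpu *c)
{
	toy_push(c, c->bp);
	c->bp = c->sp;
}

/* Frame-pointer walk: report ip, then follow the saved-bp chain. */
static size_t toy_unwind(const struct toy_cpu *c, unsigned long ip,
			 unsigned long *out, size_t max)
{
	unsigned long bp = c->bp;
	size_t n = 0;

	if (n < max)
		out[n++] = ip;
	while (bp != NO_FRAME && bp + 1 < STACK_WORDS && n < max) {
		out[n++] = c->stack[bp + 1]; /* return address sits above saved bp */
		bp = c->stack[bp];
	}
	return n;
}

/* Build entry -> A -> B -> C, then take the "NMI" inside C. */
size_t toy_trace_in_c(int prologue_done, unsigned long *out, size_t max)
{
	struct toy_cpu c = { .sp = STACK_WORDS, .bp = NO_FRAME };

	toy_call(&c, RET_TO_ENTRY); toy_prologue(&c); /* now in A */
	toy_call(&c, RET_IN_A);     toy_prologue(&c); /* now in B */
	toy_call(&c, RET_IN_B);                       /* entered C */
	if (prologue_done)
		toy_prologue(&c);
	return toy_unwind(&c, IP_IN_C, out, max);
}
```

With prologue_done = 1 the walk yields C, B, A, entry. With prologue_done = 0
it yields C, A, entry: B silently drops out of the trace, but the walk still
terminates inside the stack. The model says nothing about what the NMI
handler's own use of the shared stack does, which is the part I am unsure
about.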
Some documentation suggests IST is required for reliable NMI stack tracing
[1] [2] which 32-bit does not have.
"If an interrupt or other exception is taken while the stack or other unwind
state is in an inconsistent state, it may not be possible to reliably unwind,
and it may not be possible to identify whether such unwinding will be
reliable. See below for examples."
The issue is probably more one of printing garbage than of crashing the
kernel, but I am not convinced it is stable. Hmm.
[1] https://www.kernel.org/doc/html/v6.16/arch/x86/kernel-stacks.html
[2] https://docs.kernel.org/livepatch/reliable-stacktrace.html
thanks,
- Joel
>
> -- Steve
>
>
> > could stand corrected, hence I marked this as an RFT. It is safe to do
> > on 64-bit x86, but not on 32-bit x86 and other same-stack architectures.
> >
> > If we feel that this is not an issue, then that is fine with me (and
> > sorry for the noise), but I just wanted to raise it anyway just in case.
> > Sooner or later someone running scx on an odd architecture might
> > complain.
>
>