linux-kernel - Re: [RFT] sched_ext: Skip stack trace capture in NMI context

Message-ID: <20251224091846.3851271a@gandalf.local.home>
Date: Wed, 24 Dec 2025 09:18:46 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: Tejun Heo <tj@...nel.org>, "linux-kernel@...r.kernel.org"
 <linux-kernel@...r.kernel.org>, David Vernet <void@...ifault.com>, Andrea
 Righi <arighi@...dia.com>, Changwoo Min <changwoo@...lia.com>, Ingo Molnar
 <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
 <juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>, Ben Segall
 <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, Valentin Schneider
 <vschneid@...hat.com>, "sched-ext@...ts.linux.dev"
 <sched-ext@...ts.linux.dev>
Subject: Re: [RFT] sched_ext: Skip stack trace capture in NMI context

On Tue, 23 Dec 2025 18:58:33 -0500
Joel Fernandes <joelagnelf@...dia.com> wrote:

> Some documentation suggests IST is required for reliable NMI stack tracing
> [1] [2] which 32-bit does not have.
> "If an interrupt or other exception is taken while the stack or other unwind
> state is in an inconsistent state, it may not be possible to reliably unwind,
> and it may not be possible to identify whether such unwinding will be
> reliable. See below for examples."
> 
> Probably the issue happens to be more of printing garbage than crashing the
> kernel, but I am not convinced it is stable. Hmm.

Correct. It's about reliable stack traces: live kernel patching requires
that the stack it looks at is reliable before it can modify the code. If
the stack is not reliable, the trace will just stop at the interrupt
handler and you don't get to see the rest (or you'll see a bunch of
functions with "?" in front of them).
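
As a rough illustration of the difference, here is a toy user-space model
(not kernel code; Frame, unwind, best_effort and can_patch, and the function
names in the usage below, are all made up for this sketch). It assumes each
frame simply carries a flag saying whether its unwind state was consistent:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Frame:
    func: str
    # False models a frame captured while the unwind state was inconsistent
    # (for example, the frame that was interrupted by the NMI).
    reliable: bool = True


def unwind(frames: List[Frame]) -> Tuple[List[str], bool]:
    """Walk frames innermost-first and stop at the first unreliable one.

    Returns (trace, reliable). When reliable is False the trace is
    truncated: everything past the stopping point is unknown.
    """
    trace: List[str] = []
    for frame in frames:
        if not frame.reliable:
            return trace, False
        trace.append(frame.func)
    return trace, True


def best_effort(frames: List[Frame]) -> List[str]:
    """Print-style trace: entries past the first unreliable frame are
    guesses, so they are marked with a leading "? "."""
    out: List[str] = []
    trusted = True
    for frame in frames:
        if not frame.reliable:
            trusted = False
        out.append(frame.func if trusted else "? " + frame.func)
    return out


def can_patch(frames: List[Frame], target: str) -> bool:
    """Toy version of the live-patching rule: only proceed when the whole
    stack was unwound reliably and the target function is not on it."""
    trace, reliable = unwind(frames)
    return reliable and target not in trace


if __name__ == "__main__":
    stack = [
        Frame("nmi_handler"),
        Frame("interrupted_fn", reliable=False),
        Frame("caller"),
    ]
    print(unwind(stack))
    print(best_effort(stack))
    print(can_patch(stack, "unrelated_fn"))
```

In this model a debugging dump can live with the "?" entries, while the
patching check refuses to proceed as soon as the trace is not known to be
complete.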

-- Steve