Message-ID: <20250918173220.GA3475922@noisy.programming.kicks-ass.net>
Date: Thu, 18 Sep 2025 19:32:20 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Steven Rostedt <rostedt@...nel.org>, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, bpf@...r.kernel.org,
x86@...nel.org, Masami Hiramatsu <mhiramat@...nel.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Josh Poimboeuf <jpoimboe@...nel.org>,
Ingo Molnar <mingo@...nel.org>, Jiri Olsa <jolsa@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Andrii Nakryiko <andrii@...nel.org>,
Indu Bhagat <indu.bhagat@...cle.com>,
"Jose E. Marchesi" <jemarch@....org>,
Beau Belgrave <beaub@...ux.microsoft.com>,
Jens Remus <jremus@...ux.ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Florian Weimer <fweimer@...hat.com>, Sam James <sam@...too.org>,
Kees Cook <kees@...nel.org>, Carlos O'Donell <codonell@...hat.com>
Subject: Re: [RESEND][PATCH v15 0/4] perf: Support the deferred unwinding
infrastructure
On Thu, Sep 18, 2025 at 07:24:14PM +0200, Peter Zijlstra wrote:
> So we have:
>
>   do_syscall_64()
>     ... do stuff ...
>     syscall_exit_to_user_mode(regs)
>       syscall_exit_to_user_mode_work(regs)
>         syscall_exit_work()
>         exit_to_user_mode_prepare()
>           exit_to_user_mode_loop()
>             resume_user_mode_work()
>               task_work_run()
>       exit_to_user_mode()
>         unwind_reset_info();
>         user_enter_irqoff();
>         arch_exit_to_user_mode();
>         lockdep_hardirqs_on();
>   SYSRET/IRET
>
>
> and
>
>   DEFINE_IDTENTRY*()
>     irqentry_enter();
>     ... stuff ...
>     irqentry_exit()
>       irqentry_exit_to_user_mode()
>         exit_to_user_mode_prepare()
>           exit_to_user_mode_loop();
>             resume_user_mode_work()
>               task_work_run()
>         exit_to_user_mode()
>           unwind_reset_info();
>           ...
>   IRET
>
> Now, task_work_run() is in the exit_to_user_mode_loop() which is notably
> *before* exit_to_user_mode() which does the unwind_reset_info().
>
> What happens if we get an NMI requesting an unwind after
> unwind_reset_info() while still very much being in the kernel on the way
> out?
AFAICT it will try and do a task_work_add(TWA_RESUME) from NMI context,
and this will fail horribly.

If you do something like:

	twa_mode = in_nmi() ? TWA_NMI_CURRENT : TWA_RESUME;
	task_work_add(foo, twa_mode);

it might actually work.
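
To make the selection above concrete, here is a small userspace model of it.
This is only a sketch and not kernel code: the enum is a local stand-in that
borrows the names TWA_RESUME and TWA_NMI_CURRENT, model_in_nmi is a
hypothetical flag standing in for what in_nmi() would report, and
pick_twa_mode() is an illustrative helper name. It shows only which notify
mode would be passed on, not what task_work_add() does with it.

```c
/* Userspace model only; nothing here is the kernel implementation. */
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins named after two of the kernel's task_work notify modes. */
enum twa_mode_model {
	TWA_RESUME,
	TWA_NMI_CURRENT,
};

/* Hypothetical flag: models the answer in_nmi() would give in the kernel. */
static bool model_in_nmi;

static bool in_nmi_model(void)
{
	return model_in_nmi;
}

/*
 * The selection from the mail: use the NMI-safe mode when the request
 * arrives in NMI context, otherwise the regular resume notification.
 */
static enum twa_mode_model pick_twa_mode(void)
{
	return in_nmi_model() ? TWA_NMI_CURRENT : TWA_RESUME;
}
```

In this model, a request raised from the NMI that lands after
unwind_reset_info() would take the TWA_NMI_CURRENT branch, while a request
from ordinary task context keeps using TWA_RESUME.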