[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250122124228.GO7145@noisy.programming.kicks-ass.net>
Date: Wed, 22 Jan 2025 13:42:28 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: x86@...nel.org, Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...nel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
linux-kernel@...r.kernel.org, Indu Bhagat <indu.bhagat@...cle.com>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
linux-perf-users@...r.kernel.org, Mark Brown <broonie@...nel.org>,
linux-toolchains@...r.kernel.org, Jordan Rome <jordalgo@...a.com>,
Sam James <sam@...too.org>, linux-trace-kernel@...r.kernel.org,
Andrii Nakryiko <andrii.nakryiko@...il.com>,
Jens Remus <jremus@...ux.ibm.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Florian Weimer <fweimer@...hat.com>,
Andy Lutomirski <luto@...nel.org>,
Masami Hiramatsu <mhiramat@...nel.org>,
Weinan Liu <wnliu@...gle.com>
Subject: Re: [PATCH v4 02/39] task_work: Fix TWA_NMI_CURRENT race with
__schedule()
On Tue, Jan 21, 2025 at 06:30:54PM -0800, Josh Poimboeuf wrote:
> If TWA_NMI_CURRENT task work is queued from an NMI triggered while
> running in __schedule() with IRQs disabled, task_work_set_notify_irq()
> ends up inadvertently running on the next scheduled task. So the
> original task doesn't get its TIF_NOTIFY_RESUME flag set and the task
> work may get delayed indefinitely, or may not get to run at all.
>
> __schedule()
> // disable irqs
> <NMI>
> task_work_add(current, work, TWA_NMI_CURRENT);
> </NMI>
> // current = next;
> // enable irqs
> <IRQ>
> task_work_set_notify_irq()
> test_and_set_tsk_thread_flag(current, TIF_NOTIFY_RESUME); // wrong task!
> </IRQ>
> // original task skips task work on its next return to user (or exit!)
>
> Fix it by storing the task pointer along with the irq_work struct and
> passing that task to set_notify_resume().
So I'm a little confused, isn't something like this sufficient?
If we hit before schedule(), all just works as expected, if we hit after
schedule(), the task will already have the TIF flag set, and we'll hit
the return to user path once it gets scheduled again.
---
diff --git a/kernel/task_work.c b/kernel/task_work.c
index c969f1f26be5..155549c017b2 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -9,7 +9,12 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */
#ifdef CONFIG_IRQ_WORK
static void task_work_set_notify_irq(struct irq_work *entry)
{
- test_and_set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
+ /*
+ * no-op IPI
+ *
+ * TWA_NMI_CURRENT will already have set the TIF flag, all
+ * this interrupt does it tickle the return-to-user path.
+ */
}
static DEFINE_PER_CPU(struct irq_work, irq_work_NMI_resume) =
IRQ_WORK_INIT_HARD(task_work_set_notify_irq);
@@ -98,6 +103,7 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
break;
#ifdef CONFIG_IRQ_WORK
case TWA_NMI_CURRENT:
+ set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
irq_work_queue(this_cpu_ptr(&irq_work_NMI_resume));
break;
#endif
Powered by blists - more mailing lists