Message-ID: <20250221170530.L3yMvO0i@linutronix.de>
Date: Fri, 21 Feb 2025 18:05:30 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: linux-kernel@...r.kernel.org
Cc: Eric Dumazet <edumazet@...gle.com>,
Frederic Weisbecker <frederic@...nel.org>,
Oleg Nesterov <oleg@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Waiman Long <longman@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: [PATCH] task_work: Consume only one item at a time while invoking
 the callbacks.

Yi and syzbot managed to hang the task within task_work_run().

The problem is the call chain

  task_work_run() -> __fput() -> perf_release() ->
  perf_event_release_kernel() -> _free_event() ->
  perf_pending_task_sync() -> task_work_cancel() fails ->
  rcuwait_wait_event().

Once task_work_run() is running, the list of callbacks has been removed
from the task_struct, and from this point on task_work_cancel() can't
remove any pending, not yet started work item. The task then blocks in
rcuwait_wait_event() waiting for a callback which only it itself could
run, and hangs.
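
To make the failure mode concrete, here is a minimal userspace model of
the old detach-everything scheme (a sketch only: the node type and the
detach_all()/cancel() helpers are made up for illustration, they are not
the kernel API):

/* Userspace model only, not kernel code. Build with: cc -std=c11 demo.c */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct node {
	struct node *next;
};

static _Atomic(struct node *) task_works;

/* Old behaviour of task_work_run(): take every queued item at once. */
static struct node *detach_all(void)
{
	return atomic_exchange(&task_works, NULL);
}

/* A cancel can only search the published list head. */
static bool cancel(struct node *victim)
{
	struct node *cur = atomic_load(&task_works);

	for (; cur; cur = cur->next)
		if (cur == victim)
			return true;	/* still queued, could be unlinked */
	return false;			/* list already detached: too late */
}

int main(void)
{
	struct node a = { .next = NULL };
	struct node b = { .next = &a };

	atomic_store(&task_works, &b);
	detach_all();			/* task_work_run() begins */

	/* 'a' has not run yet, but it can no longer be cancelled: */
	printf("cancel(a): %s\n", cancel(&a) ? "ok" : "fails");
	return 0;
}
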
The proposed solution is to remove one item at a time.

Oleg pointed out that this might be problematic if one closes 2,000,000
files at once. While testing this scenario by opening that many files
followed by exit(), to ensure that all files are closed at once, I did
not observe anything outside of noise.
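
The test was roughly of the following shape (a sketch, not the exact
reproducer: the file count, the rlimit handling and the use of /dev/null
are assumptions):

/* Sketch of the test scenario. Needs a raised hard limit to succeed. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

#define NOFILES 2000000UL

int main(void)
{
	struct rlimit rl = { NOFILES + 16, NOFILES + 16 };
	unsigned long i;

	if (setrlimit(RLIMIT_NOFILE, &rl))
		perror("setrlimit");

	for (i = 0; i < NOFILES; i++) {
		if (open("/dev/null", O_RDONLY) < 0) {
			perror("open");
			break;
		}
	}
	printf("opened %lu files, exiting\n", i);

	/* exit() closes all files at once; each fput() defers the release. */
	exit(0);
}
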

Consume only one work item at a time. Forward the next work item under
the lock to ensure that it cannot be canceled while the list head is
replaced.
Cc: Eric Dumazet <edumazet@...gle.com>
Cc: Frederic Weisbecker <frederic@...nel.org>
Cc: Oleg Nesterov <oleg@...hat.com>
Reported-by: "Yi Lai" <yi1.lai@...ux.intel.com>
Closes: https://lore.kernel.org/all/Zx9Losv4YcJowaP%2F@ly-workstation/
Reported-by: syzbot+3c4321e10eea460eb606@...kaller.appspotmail.com
Closes: https://lore.kernel.org/all/673adf75.050a0220.87769.0024.GAE@google.com/
Fixes: c5d93d23a2601 ("perf: Enqueue SIGTRAP always via task_work.")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
---
 kernel/task_work.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/kernel/task_work.c b/kernel/task_work.c
index d1efec571a4a4..7d6c71e9a00a9 100644
--- a/kernel/task_work.c
+++ b/kernel/task_work.c
@@ -194,7 +194,7 @@ bool task_work_cancel(struct task_struct *task, struct callback_head *cb)
 void task_work_run(void)
 {
 	struct task_struct *task = current;
-	struct callback_head *work, *head, *next;
+	struct callback_head *work, *head;
 
 	for (;;) {
 		/*
@@ -202,17 +202,7 @@ void task_work_run(void)
 		 * work_exited unless the list is empty.
 		 */
 		work = READ_ONCE(task->task_works);
-		do {
-			head = NULL;
-			if (!work) {
-				if (task->flags & PF_EXITING)
-					head = &work_exited;
-				else
-					break;
-			}
-		} while (!try_cmpxchg(&task->task_works, &work, head));
-
-		if (!work)
+		if (!work && !(task->flags & PF_EXITING))
 			break;
 		/*
 		 * Synchronize with task_work_cancel_match(). It can not remove
@@ -220,13 +210,24 @@ void task_work_run(void)
 		 * But it can remove another entry from the ->next list.
 		 */
 		raw_spin_lock_irq(&task->pi_lock);
+		do {
+			head = NULL;
+			if (work) {
+				head = READ_ONCE(work->next);
+			} else {
+				if (task->flags & PF_EXITING)
+					head = &work_exited;
+				else
+					break;
+			}
+		} while (!try_cmpxchg(&task->task_works, &work, head));
 		raw_spin_unlock_irq(&task->pi_lock);
 
-		do {
-			next = work->next;
-			work->func(work);
-			work = next;
+		if (!work)
+			break;
+		work->func(work);
+
+		if (head)
 			cond_resched();
-		} while (work);
 	}
 }
--
2.47.2