Message-ID: <20250225163549.GB29585@redhat.com>
Date: Tue, 25 Feb 2025 17:35:50 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: Frederic Weisbecker <frederic@...nel.org>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-kernel@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Waiman Long <longman@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] task_work: Consume only one item at a time while invoking
the callbacks.
On 02/25, Frederic Weisbecker wrote:
>
> On Sun, Feb 23, 2025 at 11:40:15PM +0100, Oleg Nesterov wrote:
> >
> > I'll try to find and read the previous discussions tomorrow, but iirc Frederic
> > had another solution?
>
> This:
>
> https://lore.kernel.org/all/Zx-B0wK3xqRQsCOS@localhost.localdomain/
>
> Though I'm not entirely happy with it either.
Yes, thanks...
Can we do something else and avoid this rcuwait_wait_event() altogether?
To simplify the discussion, suppose we add a global XXX_LOCK. Just in
case, of course we shouldn't do this ;) But let's suppose we do.
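
(Purely for illustration, and not something the patch below adds: the
hypothetical XXX_LOCK could be read as something like the following;
a real version would of course want a finer-grained lock.)

	static DEFINE_SPINLOCK(xxx_lock);	/* discussion only */
	#define XXX_LOCK	(&xxx_lock)	/* so the hunks below can say spin_lock(XXX_LOCK) */
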
Now, can _something_ like the (incomplete, ugly as hell) patch below work?
Oleg.
---
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5304,12 +5304,12 @@ static void perf_pending_task_sync(struct perf_event *event)
return;
}
- /*
- * All accesses related to the event are within the same RCU section in
- * perf_pending_task(). The RCU grace period before the event is freed
- * will make sure all those accesses are complete by then.
- */
- rcuwait_wait_event(&event->pending_work_wait, !event->pending_work, TASK_UNINTERRUPTIBLE);
+ spin_lock(XXX_LOCK);
+ if (event->pending_work) {
+ local_dec(&event->ctx->nr_no_switch_fast);
+ event->pending_work = -1;
+ }
+ spin_unlock(XXX_LOCK);
}
static void _free_event(struct perf_event *event)
@@ -5369,7 +5369,15 @@ static void _free_event(struct perf_event *event)
exclusive_event_destroy(event);
module_put(event->pmu->module);
- call_rcu(&event->rcu_head, free_event_rcu);
+ bool free = true;
+ spin_lock(XXX_LOCK);
+ if (event->pending_work == -1) {
+ event->pending_work = -2;
+ free = false;
+ }
+ spin_unlock(XXX_LOCK);
+ if (free)
+ call_rcu(&event->rcu_head, free_event_rcu);
}
/*
@@ -6981,7 +6989,14 @@ static void perf_pending_task(struct callback_head *head)
{
struct perf_event *event = container_of(head, struct perf_event, pending_task);
int rctx;
+ bool free = false;
+ spin_lock(XXX_LOCK);
+ if ((int)event->pending_work < 0) {
+ free = event->pending_work == -2u;
+ event->pending_work = 0;
+ goto unlock;
+ }
/*
* All accesses to the event must belong to the same implicit RCU read-side
* critical section as the ->pending_work reset. See comment in
@@ -7004,6 +7019,12 @@ static void perf_pending_task(struct callback_head *head)
if (rctx >= 0)
perf_swevent_put_recursion_context(rctx);
+
+unlock:
+ spin_unlock(XXX_LOCK);
+
+ if (free)
+ call_rcu(&event->rcu_head, free_event_rcu);
}
#ifdef CONFIG_GUEST_PERF_EVENTS
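
(For what it's worth, my reading of the special pending_work values used
above, as a sketch only; the names below are invented for this discussion
and are not introduced by the patch:)

	/*
	 * event->pending_work in the hunks above:
	 *
	 *   > 0  task_work is queued; perf_pending_task() runs normally.
	 *   -1   perf_pending_task_sync() could not wait: the callback must
	 *        skip the event, but _free_event() has not run yet.
	 *   -2   _free_event() has also run; the callback is now responsible
	 *        for the final call_rcu(&event->rcu_head, free_event_rcu).
	 */
	enum {
		PENDING_WORK_ORPHANED	= -1,
		PENDING_WORK_FREED	= -2,
	};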