Message-ID: <Z78oRqnN9-NZO_LJ@localhost.localdomain>
Date: Wed, 26 Feb 2025 15:42:14 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
linux-kernel@...r.kernel.org, Eric Dumazet <edumazet@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Waiman Long <longman@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] task_work: Consume only one item at a time while invoking
the callbacks.
On Wed, Feb 26, 2025 at 03:01:15PM +0100, Oleg Nesterov wrote:
> On 02/25, Frederic Weisbecker wrote:
> >
> > On Tue, Feb 25, 2025 at 05:35:50PM +0100, Oleg Nesterov wrote:
> > > --- a/kernel/events/core.c
> > > +++ b/kernel/events/core.c
> > > @@ -5304,12 +5304,12 @@ static void perf_pending_task_sync(struct perf_event *event)
> > > return;
> > > }
> > >
> > > - /*
> > > - * All accesses related to the event are within the same RCU section in
> > > - * perf_pending_task(). The RCU grace period before the event is freed
> > > - * will make sure all those accesses are complete by then.
> > > - */
> > > - rcuwait_wait_event(&event->pending_work_wait, !event->pending_work, TASK_UNINTERRUPTIBLE);
> > > + spin_lock(XXX_LOCK);
> > > + if (event->pending_work) {
> > > + local_dec(&event->ctx->nr_no_switch_fast);
> > > + event->pending_work = -1;
> > > + }
> > > + spin_unlock(XXX_LOCK);
> > > }
> > >
> > > static void _free_event(struct perf_event *event)
> > > @@ -5369,7 +5369,15 @@ static void _free_event(struct perf_event *event)
> > > exclusive_event_destroy(event);
> > > module_put(event->pmu->module);
> > >
> > > - call_rcu(&event->rcu_head, free_event_rcu);
> > > + bool free = true;
> > > +	spin_lock(XXX_LOCK);
> > > + if (event->pending_work == -1) {
> > > + event->pending_work = -2;
> > > + free = false;
> > > + }
> > > + spin_unlock(XXX_LOCK);
> > > + if (free)
> > > + call_rcu(&event->rcu_head, free_event_rcu);
> > > }
> > >
> > > /*
> > > @@ -6981,7 +6989,14 @@ static void perf_pending_task(struct callback_head *head)
> > > {
> > > struct perf_event *event = container_of(head, struct perf_event, pending_task);
> > > int rctx;
> > > + bool free = false;
> > >
> > > + spin_lock(XXX_LOCK);
> > > + if ((int)event->pending_work < 0) {
> > > + free = event->pending_work == -2u;
> > > + event->pending_work = 0;
> > > + goto unlock;
> > > + }
> > > /*
> > > * All accesses to the event must belong to the same implicit RCU read-side
> > > * critical section as the ->pending_work reset. See comment in
> > > @@ -7004,6 +7019,12 @@ static void perf_pending_task(struct callback_head *head)
> > >
> > > if (rctx >= 0)
> > > perf_swevent_put_recursion_context(rctx);
> > > +
> > > +unlock:
> > > + spin_unlock(XXX_LOCK);
> > > +
> > > + if (free)
> > > + call_rcu(&event->rcu_head, free_event_rcu);
> > > }
> > >
> > > #ifdef CONFIG_GUEST_PERF_EVENTS
> > >
> >
> > Heh, I suggested something similar also:
> > https://lore.kernel.org/lkml/ZyJUzhzHGDu5CLdi@localhost.localdomain/
>
> ;)
>
> I can't comment on your patch because I don't understand this code well enough.
>
> My patch is simpler; it doesn't play with the refcount.
>
> perf_pending_task_sync() sets ->pending_work = -1; after that,
> perf_pending_task() (which can run in parallel on another CPU) will
> only clear ->pending_work and do nothing else.
>
> Then _free_event() rechecks ->pending_work before returning; if it is still
> nonzero, perf_pending_task() is still pending. In this case _free_event()
> sets ->pending_work = -2 to offload the call_rcu(free_event_rcu) to the
> pending perf_pending_task().
Right, it works, but it amounts to a parallel implementation of the event refcounting.
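
For reference, here is a minimal userspace model of that handshake. The
pthread mutex stands in for XXX_LOCK, and the names and the -1/-2 states
only mirror the pseudo-patch above; none of this is the actual perf code:

/*
 * Userspace model of the ->pending_work handshake sketched above.
 * The single global lock and the int states (-1, -2) are placeholders
 * mirroring the pseudo-patch, not the real perf internals.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t xxx_lock = PTHREAD_MUTEX_INITIALIZER;
static int pending_work = 1;	/* task_work is queued */

/* perf_pending_task_sync(): tell the worker to only consume the state */
static void sync_side(void)
{
	pthread_mutex_lock(&xxx_lock);
	if (pending_work)
		pending_work = -1;
	pthread_mutex_unlock(&xxx_lock);
}

/* _free_event(): if the worker has not run yet, offload the final free */
static bool free_side(void)
{
	bool free_now = true;

	pthread_mutex_lock(&xxx_lock);
	if (pending_work == -1) {
		pending_work = -2;
		free_now = false;
	}
	pthread_mutex_unlock(&xxx_lock);
	return free_now;	/* caller frees only when true */
}

/* perf_pending_task(): either do the real work or just consume the state */
static bool worker_side(void)
{
	bool do_free = false;

	pthread_mutex_lock(&xxx_lock);
	if (pending_work < 0) {
		do_free = (pending_work == -2);
		pending_work = 0;
	} else if (pending_work) {
		/* normal path: the real callback work would run here */
		pending_work = 0;
	}
	pthread_mutex_unlock(&xxx_lock);
	return do_free;		/* worker frees when the free was offloaded */
}

int main(void)
{
	sync_side();				/* -1: worker told to stand down */
	bool free_now = free_side();		/* -2: free offloaded to worker */
	bool worker_frees = worker_side();	/* worker consumes -2, frees */
	printf("free_now=%d worker_frees=%d\n", free_now, worker_frees);
	return 0;
}

With this ordering the sync side parks the flag at -1, the _free_event()
counterpart flips it to -2 and skips the free, and the late worker is the
one that would do the call_rcu(free_event_rcu). If the worker runs first,
pending_work is already 0 and the free side frees as usual.
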
>
> But it is certainly uglier, and perhaps the very idea is wrong. So I will
> be happy if we go with your patch.
OK, I'll prepare a changelog and see where it goes.
> Either way, IMO we should try to kill this rcuwait_wait_event() logic. See
> another email I sent a minute ago in this thread. Quite possibly I missed
> something, but the very idea of waiting for another task doesn't look safe
> to me.
You're right, it's very fragile...
Thanks.
>
> Thanks!
>
> Oleg.
>