[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z5fYI0ftjpJSoJKV@gpd3>
Date: Mon, 27 Jan 2025 20:01:55 +0100
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] sched_ext: Fix lock imbalance in
dispatch_to_local_dsq()
On Mon, Jan 27, 2025 at 08:59:12AM -1000, Tejun Heo wrote:
> Hello,
>
> On Sat, Jan 25, 2025 at 10:16:57AM +0100, Andrea Righi wrote:
> ...
> > @@ -2253,9 +2253,11 @@ static void move_local_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
> > * @dst_rq: rq to move the task into, locked on return
> > *
> > * Move @p which is currently on @src_rq to @dst_rq's local DSQ.
> > + *
> > + * Return the rq where @p has been moved.
> > */
> > -static void move_remote_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
> > - struct rq *src_rq, struct rq *dst_rq)
> > +static struct rq *move_remote_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
> > + struct rq *src_rq, struct rq *dst_rq)
> > {
> > lockdep_assert_rq_held(src_rq);
> >
> > @@ -2277,6 +2279,8 @@ static void move_remote_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
> > dst_rq->scx.extra_enq_flags = enq_flags;
> > activate_task(dst_rq, p, 0);
> > dst_rq->scx.extra_enq_flags = 0;
> > +
> > + return dst_rq;
>
> The returned dst_rq always matches the input param, right? Let's please not
> do this. The return value can mislead users to assume that the returned
> value may be different from the input. e.g. kobj_get() does similar identity
> return for convenience and that led people to assume that kobj_get() does
> zero-ref testing before inc'ing which led to a group of bugs. Just follow up
> the call statement with an explicit assignment if necessary. Nothing
> meaningful is achieved by merging that into the call statement.
>
> > +/**
> > + * dispatch_to_local_dsq - Dispatch a task to a local dsq
> > + * @rq: current rq which is locked
> > + * @dst_dsq: destination DSQ
> > + * @p: task to dispatch
> > + * @enq_flags: %SCX_ENQ_*
> > + *
> > + * We're holding @rq lock and want to dispatch @p to @dst_dsq which is a local
> > + * DSQ. This function performs all the synchronization dancing needed because
> > + * local DSQs are protected with rq locks.
> > + *
> > + * The caller must have exclusive ownership of @p (e.g. through
> > + * %SCX_OPSS_DISPATCHING).
> > + */
> > +static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
> > + struct task_struct *p, u64 enq_flags)
> > +{
> > + struct rq *src_rq = task_rq(p);
> > + struct rq *dst_rq = container_of(dst_dsq, struct rq, scx.local_dsq);
> > +
> > + /*
> > + * We're synchronized against dequeue through DISPATCHING. As @p can't
> > + * be dequeued, its task_rq and cpus_allowed are stable too.
> > + *
> > + * If dispatching to @rq that @p is already on, no lock dancing needed.
> > + */
> > + if (rq == src_rq && rq == dst_rq) {
> > + dispatch_enqueue(dst_dsq, p, enq_flags | SCX_ENQ_CLEAR_OPSS);
> > + return;
> > + }
> > +
> > + move_to_remote_dsq(rq, src_rq, dst_rq, p, enq_flags);
>
> I'm not sure the refactoring is adding much, but even if it does:
>
> - All the changes the patch makes and why should be explained in the
> description.
>
> - The refactoring is obscuring the actual fix. If refactoring is desirable,
> please put it in a separate patch.
Ok, makes sense, I'll send a new patch without the refactoring.
Thanks,
-Andrea
Powered by blists - more mailing lists