[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z5fXgG5963z2TzEF@slm.duckdns.org>
Date: Mon, 27 Jan 2025 08:59:12 -1000
From: Tejun Heo <tj@...nel.org>
To: Andrea Righi <arighi@...dia.com>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] sched_ext: Fix lock imbalance in
dispatch_to_local_dsq()
Hello,
On Sat, Jan 25, 2025 at 10:16:57AM +0100, Andrea Righi wrote:
...
> @@ -2253,9 +2253,11 @@ static void move_local_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
> * @dst_rq: rq to move the task into, locked on return
> *
> * Move @p which is currently on @src_rq to @dst_rq's local DSQ.
> + *
> + * Return the rq where @p has been moved.
> */
> -static void move_remote_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
> - struct rq *src_rq, struct rq *dst_rq)
> +static struct rq *move_remote_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
> + struct rq *src_rq, struct rq *dst_rq)
> {
> lockdep_assert_rq_held(src_rq);
>
> @@ -2277,6 +2279,8 @@ static void move_remote_task_to_local_dsq(struct task_struct *p, u64 enq_flags,
> dst_rq->scx.extra_enq_flags = enq_flags;
> activate_task(dst_rq, p, 0);
> dst_rq->scx.extra_enq_flags = 0;
> +
> + return dst_rq;
The returned dst_rq always matches the input param, right? Let's please not
do this. The return value can mislead users to assume that the returned
value may be different from the input. e.g. kobj_get() does similar identity
return for convenience and that led people to assume that kobj_get() does
zero-ref testing before inc'ing which led to a group of bugs. Just follow up
the call statement with an explicit assignment if necessary. Nothing
meaningful is achieved by merging that into the call statement.
> +/**
> + * dispatch_to_local_dsq - Dispatch a task to a local dsq
> + * @rq: current rq which is locked
> + * @dst_dsq: destination DSQ
> + * @p: task to dispatch
> + * @enq_flags: %SCX_ENQ_*
> + *
> + * We're holding @rq lock and want to dispatch @p to @dst_dsq which is a local
> + * DSQ. This function performs all the synchronization dancing needed because
> + * local DSQs are protected with rq locks.
> + *
> + * The caller must have exclusive ownership of @p (e.g. through
> + * %SCX_OPSS_DISPATCHING).
> + */
> +static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
> + struct task_struct *p, u64 enq_flags)
> +{
> + struct rq *src_rq = task_rq(p);
> + struct rq *dst_rq = container_of(dst_dsq, struct rq, scx.local_dsq);
> +
> + /*
> + * We're synchronized against dequeue through DISPATCHING. As @p can't
> + * be dequeued, its task_rq and cpus_allowed are stable too.
> + *
> + * If dispatching to @rq that @p is already on, no lock dancing needed.
> + */
> + if (rq == src_rq && rq == dst_rq) {
> + dispatch_enqueue(dst_dsq, p, enq_flags | SCX_ENQ_CLEAR_OPSS);
> + return;
> + }
> +
> + move_to_remote_dsq(rq, src_rq, dst_rq, p, enq_flags);
I'm not sure the refactoring is adding much, but even if it does:
- All the changes the patch makes and why should be explained in the
description.
- The refactoring is obscuring the actual fix. If refactoring is desirable,
please put it in a separate patch.
Thanks.
--
tejun
Powered by blists - more mailing lists