Message-ID: <ZJQQXQ/+p4f5FcAd@BLR-5CG11610CF.amd.com>
Date:   Thu, 22 Jun 2023 14:41:57 +0530
From:   "Gautham R. Shenoy" <gautham.shenoy@....com>
To:     David Vernet <void@...ifault.com>
Cc:     linux-kernel@...r.kernel.org, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, rostedt@...dmis.org,
        dietmar.eggemann@....com, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com, joshdon@...gle.com,
        roman.gushchin@...ux.dev, tj@...nel.org, kernel-team@...a.com
Subject: Re: [RFC PATCH 3/3] sched: Implement shared wakequeue in CFS

On Wed, Jun 21, 2023 at 08:43:29PM -0500, David Vernet wrote:
> On Wed, Jun 21, 2023 at 01:47:00PM +0530, Gautham R. Shenoy wrote:
> > Hello David,
> > On Tue, Jun 20, 2023 at 03:08:22PM -0500, David Vernet wrote:
> > > On Mon, Jun 19, 2023 at 11:43:13AM +0530, Gautham R. Shenoy wrote:
> > > > Hello David,
> > > > 
> > > > 
> > > > On Tue, Jun 13, 2023 at 12:20:04AM -0500, David Vernet wrote:
> > > > [..snip..]
> > > > 
> > > > > +static void swqueue_enqueue(struct rq *rq, struct task_struct *p, int enq_flags)
> > > > > +{
> > > > > +	unsigned long flags;
> > > > > +	struct swqueue *swqueue;
> > > > > +	bool task_migrated = enq_flags & ENQUEUE_MIGRATED;
> > > > > +	bool task_wakeup = enq_flags & ENQUEUE_WAKEUP;
> > > > > +
> > > > > +	/*
> > > > > +	 * Only enqueue the task in the shared wakequeue if:
> > > > > +	 *
> > > > > +	 * - SWQUEUE is enabled
> > > > > +	 * - The task is on the wakeup path
> > > > > +	 * - The task wasn't purposefully migrated to the current rq by
> > > > > +	 *   select_task_rq()
> > > > > +	 * - The task isn't pinned to a specific CPU
> > > > > +	 */
> > > > > +	if (!task_wakeup || task_migrated || p->nr_cpus_allowed == 1)
> > > > > +		return;
> > > > 
> > > > In select_task_rq_fair(), having determined whether the target of the
> > > > task wakeup should be the task's previous CPU or the waker's current
> > > > CPU, we already spend quite a bit of time determining whether there is
> > > > an idle core/CPU in the target's LLC. @rq would correspond to the CPU
> > > > chosen as a result of that scan, or, if no idle CPU exists, to the
> > > > target CPU determined by wake_affine_idle()/wake_affine_weight().
> > > > 
> > > > So if the CPU of @rq is idle here, can we not simply return here?
> > > 
> > > Hi Gautham,
> > > 
> > > Sorry, I'm not sure I'm quite following the issue you're pointing out.
> > > We don't use swqueue if the task was migrated following
> > > select_task_rq_fair(). That's the idea behind returning early if the
> > > task was migrated (the second condition in that if). If I messed up
> > > that logic somehow, it should be fixed.
> > 
> > Sorry, my bad. I see it now.
> > 
> > So as per this patch, the only time we enqueue the task on the shared
> > wakequeue is when the target of try_to_wake_up() is the same CPU where
> > the task ran previously.
> > 
> > When the wake_affine logic fails and the previous CPU is chosen as the
> > target, and there are no other idle cores/threads in the LLC of the
> > previous CPU, it makes sense to queue the task on the shared wakequeue
> > instead of on the busy previous CPU.
> > 
> > And when that previous CPU is idle, try_to_wake_up() would have woken
> > it up via ttwu_queue(), so before going idle the next time it will
> > check the shared wakequeue for the task and find it. We should be
> > good in this case.
> > 
> > Now, it is possible that select_task_rq_fair() ended up selecting the
> > waker's CPU as the target based on the
> > wake_affine_idle()/wake_affine_weight() logic. And if there is no idle
> > core/thread in the waker's LLC, the target would be the busy waker
> > CPU. When the waker CPU is different from the task's previous CPU, the
> > ENQUEUE_MIGRATED flag will be set, so the task won't be queued on the
> > shared wakequeue and instead has to wait on the busy waker CPU.
> > 
> > I wonder if it makes sense to enqueue the task on the shared wakequeue
> > in this scenario as well.
> 
> Hello Gautham,
> 
> That's a good point. My original intention in opting out of using
> swqueue when select_task_rq_fair() causes us to migrate was to avoid
> interfering with decisions made by other select_task_rq_fair()
> heuristics like wake_affine_*(). Basically just minimizing the possible
> impact of swqueue.
> That said, I think it probably does make sense to
> just enqueue in the swqueue regardless of whether ENQUEUE_MIGRATED is
> set. One of the main goals of swqueue is work conservation, and in
> hindsight it does feel somewhat artificial to add a heuristic that works
> against that.

In that case we can perhaps have an explicit flag that is set on the
wakeup path when select_idle_sibling() cannot find an idle CPU and the
chosen target is just a fallback. The task gets enqueued on the
swqueue of the target only in such cases. Something like the
following, entirely untested:

---------------------------- x8----------------------------
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b64fec55a381..38005262a7fe 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -910,6 +910,12 @@ struct task_struct {
 	 */
 	unsigned			sched_remote_wakeup:1;
 
+	/*
+	 * Bit used by select_idle_sibling() to signal enqueuing the
+	 * task on a shared wakequeue.
+	 */
+	unsigned			sched_add_to_swq:1;
+
 	/* Bit to tell LSMs we're in execve(): */
 	unsigned			in_execve:1;
 	unsigned			in_iowait:1;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e311d1c3b992..f4246c33f3c5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -215,21 +215,17 @@ static void swqueue_enqueue(struct rq *rq, struct task_struct *p, int enq_flags)
 {
 	unsigned long flags;
 	struct swqueue *swqueue;
-	bool task_migrated = enq_flags & ENQUEUE_MIGRATED;
-	bool task_wakeup = enq_flags & ENQUEUE_WAKEUP;
 
 	/*
 	 * Only enqueue the task in the shared wakequeue if:
 	 *
 	 * - SWQUEUE is enabled
-	 * - The task is on the wakeup path
-	 * - The task wasn't purposefully migrated to the current rq by
-	 *   select_task_rq()
-	 * - The task isn't pinned to a specific CPU
+	 * - select_idle_sibling() didn't find an idle CPU to enqueue this wakee task.
 	 */
-	if (!task_wakeup || task_migrated || p->nr_cpus_allowed == 1)
+	if (!p->sched_add_to_swq)
 		return;
 
+	p->sched_add_to_swq = 0;
 	swqueue = rq_swqueue(rq);
 	spin_lock_irqsave(&swqueue->lock, flags);
 	list_add_tail(&p->swqueue_node, &swqueue->list);
@@ -7361,6 +7357,11 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if ((unsigned)i < nr_cpumask_bits)
 		return i;
 
+	/*
+	 * No idle sibling was found. Ok to queue this task on the
+	 * shared wakequeue of the target.
+	 */
+	p->sched_add_to_swq = 1;
 	return target;
 }
 

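A note on the above: sched_add_to_swq is accessed with plain
loads/stores rather than READ_ONCE()/WRITE_ONCE(), since it is a
bit-field whose address cannot be taken, so the *_ONCE() helpers
would not even compile against it. The flag is only set and cleared
on the wakeup path of the task itself, so plain accesses should be
sufficient here. A minimal userspace sketch of the issue (READ_ONCE()
below is a simplified stand-in for the kernel macro, just to
illustrate the address-of problem):

#include <stdio.h>

struct wake_flags {
	unsigned sched_add_to_swq:1;
};

/*
 * Simplified stand-in for the kernel's READ_ONCE(): like the real
 * macro, it takes the address of its argument, which is exactly what
 * a bit-field cannot provide.
 */
#define READ_ONCE(x) (*(const volatile typeof(x) *)&(x))

int main(void)
{
	struct wake_flags f = { .sched_add_to_swq = 1 };

	/* Plain bit-field accesses compile and behave as expected: */
	printf("flag = %u\n", f.sched_add_to_swq);
	f.sched_add_to_swq = 0;

	/*
	 * If uncommented, the line below fails to build (gcc:
	 * "cannot take address of bit-field 'sched_add_to_swq'"):
	 *
	 * printf("flag = %u\n", READ_ONCE(f.sched_add_to_swq));
	 */
	return 0;
}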
> 
> I'd like to hear what others think. In my opinion it's worth at least
> running some tests on workloads that heavily utilize the CPU, such as
> a kernel compile, and seeing what the outcomes are.

I will try and get some numbers for such workloads on our setup over
this weekend.
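As a rough sketch of what I have in mind for the kernel compile case,
assuming a defconfig build (exact workloads and configs still to be
decided): "make defconfig && time make -j$(nproc)", run with and
without this series applied.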

> 
> Thanks,
> David

--
Thanks and Regards
gautham.
