Date: Thu, 30 May 2024 00:50:36 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Luis Machado <luis.machado@....com>
Cc: mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
	mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
	linux-kernel@...r.kernel.org, kprateek.nayak@....com,
	wuyun.abel@...edance.com, tglx@...utronix.de, efault@....de,
	nd <nd@....com>, John Stultz <jstultz@...gle.com>,
	Hongyan.Xia2@....com
Subject: Re: [RFC][PATCH 08/10] sched/fair: Implement delayed dequeue

On Thu, May 02, 2024 at 11:26:14AM +0100, Luis Machado wrote:

> Going through the code, my understanding is that the util_est functions seem to be getting
> called correctly, and in the right order. That is, we first util_est_enqueue, then util_est_dequeue
> and finally util_est_update. So the stats *should* be correct.
> 
> On dequeuing (dequeue_task_fair), we immediately call util_est_dequeue, even for the case of
> a DEQUEUE_DELAYED task, since we're no longer going to run the dequeue_delayed task for now, even
> though it is still in the rq.
> 
> We delay the util_est_update of dequeue_delayed tasks until a later time in dequeue_entities.
> 
> Eventually the dequeue_delayed task will have its lag zeroed when it becomes eligible again,
> (requeue_delayed_entity) while still being in the rq. It will then get dequeued/enqueued (requeued),
> and marked as a non-dequeue-delayed task.
> 
> Next time we attempt to enqueue such a task (enqueue_task_fair), it will skip the ENQUEUE_DELAYED
> block and call util_est_enqueue.
> 
> Still, something seems to be signalling that util/load is high, and causing migration to the big cores.
> 
> Maybe we're not decaying the util/load properly at some point, and inflated numbers start to happen.
> 
> I'll continue investigating.

So I re-read all this util_est stuff again this evening and I am
confused :-) Please bear with me.

So the old code does:

	dequeue_task_fair()
	  util_est_dequeue();
	  // actual dequeue -- updates pelt
	  util_est_update();


While the new code does:

	dequeue_task_fair()
	  util_est_dequeue();
	  if (!dequeue())
	    return;
	  util_est_update();

	delayed_dequeue:
	  util_est_update();


Specifically, we only call util_est_update() if/when we do the actual
dequeue -- potentially at a later point in time. Because, as I argued in
the comments, ttwu()'s ENQUEUE_DELAYED will not actually enqueue the
task (since it is still enqueued) and therefore the update would be
spurious.
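
To spell that out, the enqueue side of the RFC looks roughly like so
(a sketch going by your summary above, not the literal patch, so the
names/ordering are approximate):

	enqueue_task_fair()
	  if (flags & ENQUEUE_DELAYED) {
	    /* still on the rq; just requeue, no util_est_enqueue() */
	    requeue_delayed_entity(se);
	    return;
	  }
	  util_est_enqueue();
	  // actual enqueue -- updates pelt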

However!!, if we do dequeue, then we'll end up updating the EWMA with a
potentially different task_util(p).

And this is where the confusion comes in... this extra time on the
runqueue would not be running and should thus decrease util_avg; as such
task_util_est() should be lower than before and tasks should tend towards
the smaller cores, rather than the larger cores you seem to be seeing.

[ there is the 'dequeued + UTIL_EST_MARGIN < task_runnable()' escape
  clause, which we might be hitting... dunno ]
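
For reference, the relevant bits of util_est_update() are roughly the
below (paraphrased from kernel/sched/fair.c from memory, so double-check
against your tree):

	ewma = READ_ONCE(p->se.avg.util_est);
	dequeued = task_util(p);

	/* increases are taken immediately... */
	if (ewma <= dequeued) {
		ewma = dequeued;
		goto done;
	}

	/* ...the escape clause: don't shrink the estimate when the
	 * task didn't get all the CPU it wanted while runnable */
	if ((dequeued + UTIL_EST_MARGIN) < task_runnable(p))
		goto done;

	/* decreases are smoothed:
	 * ewma(t) = w * task_util(p) + (1-w) * ewma(t-1), w = 1/4 */
	last_ewma_diff = ewma - dequeued;
	ewma <<= UTIL_EST_WEIGHT_SHIFT;
	ewma  -= last_ewma_diff;
	ewma >>= UTIL_EST_WEIGHT_SHIFT;
done:
	WRITE_ONCE(p->se.avg.util_est, ewma);

A delayed-dequeue task sits on the rq runnable-but-not-running, which is
exactly what that escape clause guards against, so a shrinking
task_util() might never make it into the EWMA.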

In any case, if you haven't tried it already, could I ask you to test
the below (once you're back in the office)?

Also, this doesn't really explain why things go sideways once you enable
DELAY_DEQUEUE, nor why they don't re-align once you disable it again. I
mean, it should eventually age out the dodgy EWMA contributions and start
working as expected.
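
(With w = 1/4, only (3/4)^n of a bad sample survives n further updates,
so some ~16 sleep cycles should get it below 1% -- it shouldn't linger.)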

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7008,7 +7008,6 @@ static int dequeue_entities(struct rq *r
 			SCHED_WARN_ON(p->on_rq != 1);
 
 			/* Fix-up what dequeue_task_fair() skipped */
-			util_est_update(&rq->cfs, p, task_sleep);
 			hrtick_update(rq);
 
 			/* Fix-up what block_task() skipped. */
@@ -7028,13 +7027,11 @@ static bool dequeue_task_fair(struct rq
 {
 	util_est_dequeue(&rq->cfs, p);
 
-	if (dequeue_entities(rq, &p->se, flags) < 0)
+	if (dequeue_entities(rq, &p->se, flags) < 0) {
+		util_est_update(&rq->cfs, p, DEQUEUE_SLEEP);
 		return false;
+	}
 
-	/*
-	 * It doesn't make sense to update util_est for the delayed dequeue
-	 * case where ttwu will make it appear the sleep never happened.
-	 */
 	util_est_update(&rq->cfs, p, flags & DEQUEUE_SLEEP);
 	hrtick_update(rq);
 	return true;
