Message-ID: <20230329080251.GK4253@hirez.programming.kicks-ass.net>
Date: Wed, 29 Mar 2023 10:02:51 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Josh Don <joshdon@...gle.com>
Cc: mingo@...nel.org, vincent.guittot@...aro.org,
linux-kernel@...r.kernel.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, corbet@....net,
qyousef@...alina.io, chris.hyser@...cle.com,
patrick.bellasi@...bug.net, pjt@...gle.com, pavel@....cz,
qperret@...gle.com, tim.c.chen@...ux.intel.com, timj@....org,
kprateek.nayak@....com, yu.c.chen@...el.com,
youssefesmat@...omium.org, joel@...lfernandes.org, efault@....de
Subject: Re: [PATCH 08/17] sched/fair: Implement an EEVDF like policy
On Tue, Mar 28, 2023 at 06:26:51PM -0700, Josh Don wrote:
> Hi Peter,
>
> This is a really interesting proposal and in general I think the
> incorporation of latency/deadline is quite a nice enhancement. We've
> struggled for a while to get better latency bounds on performance
> sensitive threads in the face of antagonism from overcommit.
>
> >  void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
> >  {
> > +	s64 lag, limit;
> > +
> >  	SCHED_WARN_ON(!se->on_rq);
> > -	se->vlag = avg_vruntime(cfs_rq) - se->vruntime;
> > +	lag = avg_vruntime(cfs_rq) - se->vruntime;
> > +
> > +	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
> > +	se->vlag = clamp(lag, -limit, limit);
>
> This is for dequeue; presumably you'd want to update the vlag at
> enqueue in case the average has moved again due to enqueue/dequeue of
> other entities?
Ha, just adding the entry back will shift the average around, and it's
all a giant pain in the backside.
 place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
 {
 	u64 vruntime = avg_vruntime(cfs_rq);
+	s64 lag = 0;

+	/*
+	 * Due to how V is constructed as the weighted average of entities,
+	 * adding tasks with positive lag, or removing tasks with negative lag,
+	 * will move 'time' backwards; this can screw around with the lag of
+	 * other tasks.
+	 *
+	 * EEVDF: placement strategy #1 / #2
+	 */
+	if (sched_feat(PLACE_LAG) && cfs_rq->nr_running > 1) {
+		struct sched_entity *curr = cfs_rq->curr;
+		unsigned long load;

+		lag = se->vlag;

 		/*
+		 * If we want to place a task and preserve lag, we have to
+		 * consider the effect of the new entity on the weighted
+		 * average and compensate for this, otherwise lag can quickly
+		 * evaporate:
+		 *
+		 * l_i = V - v_i <=> v_i = V - l_i
+		 *
+		 * V = v_avg = W*v_avg / W
+		 *
+		 * V' = (W*v_avg + w_i*v_i) / (W + w_i)
+		 *    = (W*v_avg + w_i*(v_avg - l_i)) / (W + w_i)
+		 *    = v_avg - w_i*l_i/(W + w_i)
+		 *
+		 * l_i' = V' - v_i = v_avg - w_i*l_i/(W + w_i) - (v_avg - l_i)
+		 *      = l_i - w_i*l_i/(W + w_i)
+		 *      = l_i * W/(W + w_i)
+		 *
+		 * l_i = (W + w_i) * l_i' / W
 		 */
+		load = cfs_rq->avg_load;
+		if (curr && curr->on_rq)
+			load += curr->load.weight;
+
+		lag *= load + se->load.weight;
+		if (WARN_ON_ONCE(!load))
+			load = 1;
+		lag = div_s64(lag, load);
+
+		vruntime -= lag;
+	}
That ^ is the other side of it.
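To make that concrete, here's a quick userspace sanity check (not kernel
code; the weights, V and the lag value are made up) showing that the
inflated placement offset gives back the requested lag once the entity is
folded into the average:

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		int64_t W   = 3072;	/* pre-enqueue total weight, e.g. 3 nice-0 tasks */
		int64_t w_i = 1024;	/* weight of the entity being placed */
		int64_t V   = 1000000;	/* current weighted average vruntime */
		int64_t vl  = 50000;	/* lag stored at dequeue, to be preserved */

		/* inflate: l_i = (W + w_i) * l_i' / W */
		int64_t l_i = (W + w_i) * vl / W;

		/* place the entity at v_i = V - l_i */
		int64_t v_i = V - l_i;

		/* weighted average after the entity joins */
		int64_t V_new = (W * V + w_i * v_i) / (W + w_i);

		/* effective lag comes out as ~vl, modulo integer rounding */
		printf("placed lag    = %lld\n", (long long)(V_new - v_i));
		printf("requested lag = %lld\n", (long long)vl);
		return 0;
	}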
But yes, once enqueued, additional join/leave operations can/will shift
V around and change the lag; there's nothing much to do about that.
The paper does it all a wee bit differently, but I think it ends up
being equivalent: they explicitly track V (and shift it around on
join/leave), while I track it implicitly through the average and then
need to play games like the above.
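FWIW, the implicit tracking boils down to keeping a running weighted sum
next to the total weight; a toy model (plain C, names invented here, and
without the offset/overflow and load-scaling tricks the real helpers need)
looks roughly like:

	#include <stdint.h>

	/* Toy model of tracking V = (\Sum w_j*v_j) / (\Sum w_j) incrementally. */
	struct toy_rq {
		int64_t wsum;	/* \Sum w_j * v_j */
		int64_t weight;	/* \Sum w_j       */
	};

	void toy_enqueue(struct toy_rq *rq, int64_t w, int64_t v)
	{
		rq->wsum   += w * v;
		rq->weight += w;
	}

	void toy_dequeue(struct toy_rq *rq, int64_t w, int64_t v)
	{
		rq->wsum   -= w * v;
		rq->weight -= w;
	}

	int64_t toy_avg_vruntime(const struct toy_rq *rq)
	{
		return rq->weight ? rq->wsum / rq->weight : 0;	/* V */
	}

Every toy_enqueue()/toy_dequeue() moves what toy_avg_vruntime() returns,
which is exactly the V shifting on join/leave described above.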