[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231120105606.GQ8262@noisy.programming.kicks-ass.net>
Date:   Mon, 20 Nov 2023 11:56:06 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Abel Wu <wuyun.abel@...edance.com>
Cc:     Tobias Huschle <huschle@...ux.ibm.com>,
        Linux Kernel <linux-kernel@...r.kernel.org>,
        kvm@...r.kernel.org, virtualization@...ts.linux.dev,
        netdev@...r.kernel.org, mst@...hat.com, jasowang@...hat.com
Subject: Re: Re: EEVDF/vhost regression (bisected to 86bfbb7ce4f6 sched/fair:
 Add lag based placement)
On Sat, Nov 18, 2023 at 01:14:32PM +0800, Abel Wu wrote:
> Hi Peter, I'm a little confused here. As we adopt placement strategy #1
> when PLACE_LAG is enabled, the lag of that entity needs to be preserved.
> Given that the weight doesn't change, we have:
> 
> 	vl' = vl
> 
> But in fact it is scaled on placement:
> 
> 	vl' = vl * W/(W + w)
(W+w)/W
> 
> Does this intended? 
The scaling, yes that's intended and the comment explains why. So now
you have me confused too :-)
Specifically, I want the lag after placement to be equal to the lag we
come in with. Since placement will affect avg_vruntime (adding one
element to the average changes the average etc..) the placement also
affects the lag as measured after placement.
Or rather, if you enqueue and dequeue, I want the lag to be preserved.
If you do not take placement into consideration, lag will dissipate real
quick.
> And to illustrate my understanding of strategy #1:
> @@ -5162,41 +5165,17 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  		 * vl_i is given by:
>  		 *
>  		 *   V' = (\Sum w_j*v_j + w_i*v_i) / (W + w_i)
> -		 *      = (W*V + w_i*(V - vl_i)) / (W + w_i)
> -		 *      = (W*V + w_i*V - w_i*vl_i) / (W + w_i)
> -		 *      = (V*(W + w_i) - w_i*l) / (W + w_i)
> -		 *      = V - w_i*vl_i / (W + w_i)
> -		 *
> -		 * And the actual lag after adding an entity with vl_i is:
> -		 *
> -		 *   vl'_i = V' - v_i
> -		 *         = V - w_i*vl_i / (W + w_i) - (V - vl_i)
> -		 *         = vl_i - w_i*vl_i / (W + w_i)
> -		 *
> -		 * Which is strictly less than vl_i. So in order to preserve lag
> -		 * we should inflate the lag before placement such that the
> -		 * effective lag after placement comes out right.
> -		 *
> -		 * As such, invert the above relation for vl'_i to get the vl_i
> -		 * we need to use such that the lag after placement is the lag
> -		 * we computed before dequeue.
> +		 *      = (W*V + w_i*(V' - vl_i)) / (W + w_i)
> +		 *      = V - w_i*vl_i / W
>  		 *
> -		 *   vl'_i = vl_i - w_i*vl_i / (W + w_i)
> -		 *         = ((W + w_i)*vl_i - w_i*vl_i) / (W + w_i)
> -		 *
> -		 *   (W + w_i)*vl'_i = (W + w_i)*vl_i - w_i*vl_i
> -		 *                   = W*vl_i
> -		 *
> -		 *   vl_i = (W + w_i)*vl'_i / W
>  		 */
>  		load = cfs_rq->avg_load;
>  		if (curr && curr->on_rq)
>  			load += scale_load_down(curr->load.weight);
> -
> -		lag *= load + scale_load_down(se->load.weight);
>  		if (WARN_ON_ONCE(!load))
>  			load = 1;
> -		lag = div_s64(lag, load);
> +
> +		vruntime -= div_s64(lag * scale_load_down(se->load.weight), load);
>  	}
>  	se->vruntime = vruntime - lag;
So you're proposing we do:
	v = V - (lag * w) / (W + w) - lag
?
That can be written like:
	v = V - (lag * w) / (W+w) - (lag * (W+w)) / (W+w)
	  = V - (lag * (W+w) + lag * w) / (W+w)
	  = V - (lag * (W+2w)) / (W+w)
And that turns into a mess AFAICT.
Let me repeat my earlier argument. Suppose v,w,l are the new element.
V,W are the old avg_vruntime and sum-weight.
Then: V = V*W / W, and by extention: V' = (V*W + v*w) / (W + w).
The new lag, after placement: 
l' = V' - v = (V*W + v*w) / (W+w) - v
            = (V*W + v*w) / (W+w) - v * (W+w) / (W+v)
	    = (V*W + v*w -v*W - v*w) / (W+w)
	    = (V*W - v*W) / (W+w)
	    = W*(V-v) / (W+w)
	    = W/(W+w) * (V-v)
Substitute: v = V - (W+w)/W * l, my scaling thing, to obtain:
l' = W/(W+w) * (V - (V - (W+w)/W * l))
   = W/(W+w) * (V - V + (W+w)/W * l)
   = W/(W+w) * (W+w)/W * l
   = l
So by scaling, we've preserved lag across placement.
That make sense?
Powered by blists - more mailing lists
 
