linux-kernel - Re: [PATCH 03/15] sched/fair: Add lag based placement


Message-ID: <xm26fs2fhcu7.fsf@bsegall-linux.svl.corp.google.com>
Date:   Thu, 12 Oct 2023 12:15:12 -0700
From:   Benjamin Segall <bsegall@...gle.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...nel.org, vincent.guittot@...aro.org,
        linux-kernel@...r.kernel.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, mgorman@...e.de,
        bristot@...hat.com, corbet@....net, qyousef@...alina.io,
        chris.hyser@...cle.com, patrick.bellasi@...bug.net, pjt@...gle.com,
        pavel@....cz, qperret@...gle.com, tim.c.chen@...ux.intel.com,
        joshdon@...gle.com, timj@....org, kprateek.nayak@....com,
        yu.c.chen@...el.com, youssefesmat@...omium.org,
        joel@...lfernandes.org, efault@....de, tglx@...utronix.de
Subject: Re: [PATCH 03/15] sched/fair: Add lag based placement

Peter Zijlstra <peterz@...radead.org> writes:

> @@ -4853,49 +4872,119 @@ static void
>  place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
>  {
>  	u64 vruntime = avg_vruntime(cfs_rq);
> +	s64 lag = 0;
>  
> -	/* sleeps up to a single latency don't count. */
> -	if (!initial) {
> -		unsigned long thresh;
> +	/*
> +	 * Due to how V is constructed as the weighted average of entities,
> +	 * adding tasks with positive lag, or removing tasks with negative lag
> +	 * will move 'time' backwards, this can screw around with the lag of
> +	 * other tasks.
> +	 *
> +	 * EEVDF: placement strategy #1 / #2
> +	 */

So the big problem with EEVDF #1 compared to #2/#3 and CFS (hacky though
it is) is that it creates a significant perverse incentive to yield or
spin until you see yourself be preempted, rather than just sleep (if you
have any competition on the CPU). If you go to sleep immediately after
doing work and happen to do so near the end of a slice (arguably what
you _want_ to have happen overall), then you have to pay for that
negative lag in wakeup latency later, because the lag is maintained
through any amount of sleep. (#1 or similar is good for
reweight/migrate, of course.)

#2 could in theory be abused by micro-sleeping right before you are
preempted, but that isn't something tasks can really predict, whereas
#1 would just lead to more "don't go to sleep, just spin, the latency
numbers are so much better" nonsense.
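
To make the wakeup penalty concrete, here is a toy sketch in userspace C.
It is not the kernel code: toy_entity, toy_avg_vruntime(), toy_lag() and
toy_place() are hypothetical names, and it assumes lag is defined as
V - vruntime, that #1 re-places a waking task at V - lag, and that a
lag-discarding placement (roughly how #2 is read here) re-places it at V.

```c
#include <stdint.h>

/* Hypothetical toy model of a runqueue entity; not a kernel structure. */
struct toy_entity {
	int64_t vruntime;
	int64_t weight;
};

/* V: weighted average vruntime of the queued entities (0 if empty). */
static int64_t toy_avg_vruntime(const struct toy_entity *q, int n)
{
	int64_t sum = 0, load = 0;

	for (int i = 0; i < n; i++) {
		sum += q[i].vruntime * q[i].weight;
		load += q[i].weight;
	}
	return load ? sum / load : 0;
}

/* Assumed definition: lag = V - v, negative when the task ran ahead. */
static int64_t toy_lag(int64_t V, int64_t vruntime)
{
	return V - vruntime;
}

/*
 * Wakeup placement in this toy model.
 * preserve_lag != 0: keep the lag saved at sleep time (assumed #1),
 *                    so the task is placed at V - saved_lag.
 * preserve_lag == 0: discard the lag (assumed reading of #2),
 *                    so the task is placed at V.
 */
static int64_t toy_place(int64_t V, int64_t saved_lag, int preserve_lag)
{
	return V - (preserve_lag ? saved_lag : 0);
}
```

With two equal-weight entities at vruntime 1000 and 1400, V is 1200 and
the second one has a lag of -200. If it sleeps at that point, then under
the lag-preserving placement it wakes at V + 200 no matter how long it
slept, so it sits behind everything at or below V. Under the
lag-discarding placement it wakes at V.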