linux-kernel - Re: [PATCH v8 1/9] sched/fair: fix unfairness at wakeup

Message-ID: <32f4a76d-103e-510f-de70-ba9dfe2356ce@arm.com>
Date:   Mon, 14 Nov 2022 20:13:47 +0100
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com, rostedt@...dmis.org,
        bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
        vschneid@...hat.com, linux-kernel@...r.kernel.org,
        parth@...ux.ibm.com
Cc:     qyousef@...alina.io, chris.hyser@...cle.com,
        patrick.bellasi@...bug.net, David.Laight@...lab.com,
        pjt@...gle.com, pavel@....cz, tj@...nel.org, qperret@...gle.com,
        tim.c.chen@...ux.intel.com, joshdon@...gle.com, timj@....org,
        kprateek.nayak@....com, yu.c.chen@...el.com,
        youssefesmat@...omium.org, joel@...lfernandes.org
Subject: Re: [PATCH v8 1/9] sched/fair: fix unfairness at wakeup

On 10/11/2022 18:50, Vincent Guittot wrote:
> At wakeup, the vruntime of a task is updated so that it is not older than
> a sched_latency period behind min_vruntime. This prevents a long-sleeping
> task from getting unlimited credit at wakeup.
> Such a waking task should preempt the current one to use its CPU bandwidth,
> but wakeup_gran() can be larger than sched_latency, which filters out the
> wakeup preemption and, as a result, steals some CPU bandwidth from
> the waking task.
> 
> Make sure that a task whose vruntime has been capped will preempt the current
> task and use its CPU bandwidth even if wakeup_gran() is in the same range
> as sched_latency.

Looks like gran can be much higher than sched_latency in extreme
cases?
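A minimal sketch of the wakeup capping described in the commit message, assuming it follows the sleeper path of place_entity(): thresh is sched_latency, halved when the GENTLE_FAIR_SLEEPERS feature is enabled, and the waking vruntime is clamped to no less than min_vruntime - thresh. The sketch_ names and parameters are hypothetical; all times are in ns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "later of two vruntimes", in the style of max_vruntime(). */
static inline uint64_t sketch_max_vruntime(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) > 0 ? b : a;
}

/*
 * Hypothetical placement of a waking entity: it may lag min_vruntime by
 * at most thresh, so a long sleep cannot bank unlimited credit.
 */
static inline uint64_t sketch_place_waking(uint64_t se_vruntime,
					   uint64_t min_vruntime,
					   uint64_t sched_latency_ns,
					   bool gentle_fair_sleepers)
{
	uint64_t thresh = sched_latency_ns;

	if (gentle_fair_sleepers)
		thresh >>= 1;	/* halve the sleeper credit */

	/* Never move se backwards: only pull an old vruntime forward. */
	return sketch_max_vruntime(se_vruntime, min_vruntime - thresh);
}
```

With the 18 ms sched_latency dumped later in this reply, a long sleeper ends up exactly 9 ms behind min_vruntime, which matches the vdiff=9ms used in the measurements.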

> 
> If the waking task fails to preempt current, it may have to wait up to
> sysctl_sched_min_granularity before preempting it during the next tick.
> 
> Strictly speaking, we should use cfs_rq->min_vruntime instead of
> curr->vruntime, but it isn't worth the additional overhead and complexity,
> as the vruntime of current should be close to min_vruntime, if not equal.

^^^ Does this relate to the `if (vdiff > gran) return 1` condition in
wakeup_preempt_entity()?
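The remark that a waking task which fails to preempt may wait up to sysctl_sched_min_granularity refers to the tick path. A simplified sketch, assuming check_preempt_tick() keeps its usual structure: current is rescheduled once it exceeds its ideal slice; otherwise it is not preempted at the tick before it has run sysctl_sched_min_granularity; after that it is preempted only if its vruntime leads the leftmost entity by more than the ideal slice. The function and parameter names are hypothetical; times are in ns.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical simplification of the tick-time check on current.
 * delta_exec:    time current has run since it was picked (ns)
 * ideal_runtime: current's slice (ns)
 * vdelta:        curr->vruntime - leftmost->vruntime (ns)
 */
static inline bool sketch_tick_preempt(uint64_t delta_exec,
				       uint64_t ideal_runtime,
				       int64_t vdelta,
				       uint64_t min_granularity)
{
	if (delta_exec > ideal_runtime)
		return true;	/* slice used up */

	if (delta_exec < min_granularity)
		return false;	/* too early to preempt at the tick */

	if (vdelta < 0)
		return false;	/* current is not ahead of the leftmost entity */

	return (uint64_t)vdelta > ideal_runtime;
}
```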

[...]

> @@ -7187,6 +7171,18 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
>  		return -1;
>  
>  	gran = wakeup_gran(se);
> +
> +	/*
> +	 * At wakeup, the vruntime of a task is capped so it is not older than
> +	 * a sched_latency period compared to min_vruntime. This prevents a long
> +	 * sleeping task from getting unlimited credit at wakeup. Such a waking
> +	 * task has to preempt current in order not to lose its share of CPU
> +	 * bandwidth, but wakeup_gran() can become higher than the scheduling
> +	 * period for a low priority task. Make sure that a long sleeping task will get a

Low priority task, or taskgroup with low cpu.shares, right?

6 CPUs (the sysctl values below are in ms)

sysctl_sched
  .sysctl_sched_latency              : 18.000000
  .sysctl_sched_min_granularity      : 2.250000
  .sysctl_sched_idle_min_granularity : 0.750000
  .sysctl_sched_wakeup_granularity   : 3.000000
  ...

p1 & p2 affine to CPUX

     '/'
     /\
   p1  p2

p1 & p2	nice=0	      - vdiff=9ms gran=3ms lat_max=6.75ms
p1 & p2	nice=4	      - vdiff=9ms gran=7.26ms lat_max=6.75ms
p1 & p2	nice=19	      - vdiff=9ms gran=204.79ms lat_max=6.75ms


     '/'
     /\
    A  B
   /    \
  p1    p2

A & B cpu.shares=1024 - vdiff=9ms gran=3ms lat_max=6.75ms
A & B cpu.shares=448  - vdiff=9ms gran=6.86ms lat_max=6.75ms
A & B cpu.shares=2    - vdiff=9ms gran=1536ms lat_max=6.75ms
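The gran values above can be reproduced with a simplified wakeup_gran(): sysctl_sched_wakeup_granularity (3 ms here) scaled by NICE_0_LOAD / weight, as calc_delta_fair() does. This sketch assumes the entity weight is the nice weight from sched_prio_to_weight (1024 for nice 0, 423 for nice 4, 15 for nice 19) or, in the group case, approximately the full cpu.shares value, since each group's only task runs on CPUX. It uses plain integer division, whereas the kernel uses a fixed-point inverse weight, hence 204.80 ms here versus the measured 204.79 ms. The sketch_ name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD 1024ULL	/* load weight of a nice-0 entity */

/*
 * Simplified wakeup_gran(): scale the wakeup granularity (ns) by
 * NICE_0_LOAD / weight. Lower weight means a larger granularity.
 */
static inline uint64_t sketch_wakeup_gran(uint64_t wakeup_gran_ns,
					  uint64_t weight)
{
	assert(weight > 0);
	return wakeup_gran_ns * NICE_0_LOAD / weight;
}
```

Under the same assumptions, nice 5 (weight 335) already gives about 9.17 ms, i.e. more than the 9 ms vdiff, so without a cap the waking task would not preempt.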

> +	 * chance to preempt current.
> +	 */
> +	gran = min_t(s64, gran, get_latency_max());
> +
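A simplified model of the preemption decision with the clamp added by this hunk, assuming wakeup_preempt_entity() otherwise keeps its usual shape (return -1 if the waking entity is not behind current, 1 if vdiff > gran, else 0). The sketch_ name and parameters are hypothetical; times are in ns, and apply_cap toggles the min_t() clamp so the before/after behaviour can be compared against the numbers above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of wakeup_preempt_entity() with the patch's clamp.
 * Returns -1 if se is not behind curr, 1 if se should preempt curr,
 * and 0 otherwise. All values are in ns.
 */
static inline int sketch_wakeup_preempt(int64_t curr_vruntime,
					int64_t se_vruntime,
					int64_t gran,
					int64_t latency_max,
					bool apply_cap)
{
	int64_t vdiff = curr_vruntime - se_vruntime;

	if (vdiff <= 0)
		return -1;

	/* Equivalent of gran = min_t(s64, gran, get_latency_max()). */
	if (apply_cap && gran > latency_max)
		gran = latency_max;

	return vdiff > gran ? 1 : 0;
}
```

With vdiff = 9 ms and lat_max = 6.75 ms, the uncapped check refuses preemption for the nice=19 (204.79 ms) and cpu.shares=2 (1536 ms) cases, while the capped one preempts in every row of the tables.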

[...]

> @@ -2448,6 +2448,34 @@ extern unsigned int sysctl_numa_balancing_scan_period_max;
>  extern unsigned int sysctl_numa_balancing_scan_size;
>  #endif
>  
> +static inline unsigned long  get_sched_latency(bool idle)
                              ^^
Two spaces here, should be one.

[...]

> +
> +static inline unsigned long  get_latency_max(void)
                              ^^

[...]
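The helper bodies are elided above. The following is only a guess at what they compute, chosen because it reproduces lat_max=6.75ms from the dumped tunables: half of sched_latency (GENTLE_FAIR_SLEEPERS) minus sysctl_sched_min_granularity, i.e. 9 ms - 2.25 ms. It is not the patch's code; the sketch_ names, the constant values and the idle branch (assumed to mirror the idle-entity case of place_entity()) are all assumptions.

```c
#include <stdbool.h>

/* Assumed tunables, copied from the sched_debug dump above, in ns. */
#define SKETCH_SCHED_LATENCY_NS     18000000UL
#define SKETCH_MIN_GRANULARITY_NS    2250000UL
#define SKETCH_GENTLE_FAIR_SLEEPERS true

/* Guess at get_sched_latency(): sleeper credit for a waking entity. */
static inline unsigned long sketch_get_sched_latency(bool idle)
{
	/* Assumed: idle entities get min_granularity instead of latency. */
	unsigned long thresh = idle ? SKETCH_MIN_GRANULARITY_NS
				    : SKETCH_SCHED_LATENCY_NS;

	if (SKETCH_GENTLE_FAIR_SLEEPERS)
		thresh >>= 1;	/* halve the sleeper credit */

	return thresh;
}

/* Guess at get_latency_max(): the cap applied to wakeup_gran(). */
static inline unsigned long sketch_get_latency_max(void)
{
	return sketch_get_sched_latency(false) - SKETCH_MIN_GRANULARITY_NS;
}
```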