Message-ID: <1878eaa4-0d2b-933a-12c8-40b28d771251@amd.com>
Date:   Mon, 16 Oct 2023 11:09:11 +0530
From:   K Prateek Nayak <kprateek.nayak@....com>
To:     Daniel Jordan <daniel.m.jordan@...cle.com>
Cc:     peterz@...radead.org, bristot@...hat.com, bsegall@...gle.com,
        chris.hyser@...cle.com, corbet@....net, dietmar.eggemann@....com,
        efault@....de, joel@...lfernandes.org, joshdon@...gle.com,
        juri.lelli@...hat.com, linux-kernel@...r.kernel.org,
        mgorman@...e.de, mingo@...nel.org, patrick.bellasi@...bug.net,
        pavel@....cz, pjt@...gle.com, qperret@...gle.com,
        qyousef@...alina.io, rostedt@...dmis.org, tglx@...utronix.de,
        tim.c.chen@...ux.intel.com, timj@....org,
        vincent.guittot@...aro.org, youssefesmat@...omium.org,
        yu.c.chen@...el.com
Subject: Re: [PATCH] sched/fair: Always update_curr() before placing at
 enqueue

Hello Daniel,

I see a good and consistent improvement in stream-10 (the shorter-loop
Stream run) with this change. Everything else is more or less the same.

Detailed results below.

On 10/6/2023 10:18 PM, Daniel Jordan wrote:
> Placing wants current's vruntime and the cfs_rq's min_vruntime up to
> date so that avg_runtime() is too, and it likewise wants the entity
> re-weighted and lag adjusted so vslice and vlag are fresh. So always
> do update_curr() and update_cfs_group() beforehand.
> 
> There doesn't seem to be a reason to treat the 'curr' case specially
> after e8f331bcc270 since vruntime doesn't get normalized anymore.
> 
> Fixes: e8f331bcc270 ("sched/smp: Use lag to simplify cross-runqueue placement")
> Signed-off-by: Daniel Jordan <daniel.m.jordan@...cle.com>

o Machine details

- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- C2 Disabled (POLL and C1(MWAIT) remain enabled)

o Kernel Details

- tip:	tip:sched/core at commit 3657680f38cd ("sched/psi: Delete the
	'update_total' function parameter from update_triggers()") +
	cherry-pick commit 8dafa9d0eb1a ("sched/eevdf: Fix min_deadline heap
	integrity") from tip:sched/urgent + cherry-pick commit b01db23d5923
	("sched/eevdf: Fix pick_eevdf()") from tip:sched/urgent

- update_curr_opt: tip + this patch

o Benchmark Results
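
A quick note on how the columns in the tables below are derived, in case
it helps with reading them. This is a simplified sketch of the
post-processing (the helper names are mine, not from the actual harness):
the first column is the per-case statistic normalized to the tip mean,
"[pct imp]" is the percent improvement over tip signed so that positive
is always better, and "(CV)" is the coefficient of variation of the raw
samples in percent. stream uses HMean and schbench the median of the
per-run p99, but the normalization and the other two columns work the
same way.

#include <math.h>
#include <stdio.h>

/* Arithmetic mean of the raw per-run samples for one case. */
static double amean(const double *s, int n)
{
	double sum = 0.0;

	for (int i = 0; i < n; i++)
		sum += s[i];
	return sum / n;
}

/* "(CV)": coefficient of variation of the samples, in percent. */
static double cv_pct(const double *s, int n)
{
	double m = amean(s, n), var = 0.0;

	for (int i = 0; i < n; i++)
		var += (s[i] - m) * (s[i] - m);
	return 100.0 * sqrt(var / n) / m;
}

/*
 * "[pct imp]": percent improvement of the test kernel over tip, signed
 * so that positive is always better.  higher_is_better is 0 for the
 * time/latency based runs (hackbench, schbench) and 1 for the rest.
 */
static double pct_imp(double tip_mean, double test_mean, int higher_is_better)
{
	double diff = higher_is_better ? test_mean - tip_mean
				       : tip_mean - test_mean;

	return 100.0 * diff / tip_mean;
}

int main(void)
{
	/* made-up samples, just to show the three columns for one case */
	double tip[]  = { 10.2, 10.5,  9.9, 10.4 };
	double test[] = { 10.3, 10.6, 10.1, 10.4 };
	int n = 4;
	double tm = amean(tip, n), xm = amean(test, n);

	/* first column is simply the test mean normalized to the tip mean */
	printf("%.2f [ %.2f](%5.2f)\n", xm / tm,
	       pct_imp(tm, xm, 0), cv_pct(test, n));
	return 0;
}
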

==================================================================
Test          : hackbench
Units         : Normalized time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Case:           tip[pct imp](CV)    update_curr_opt[pct imp](CV)
 1-groups     1.00 [ -0.00]( 2.69)     1.01 [ -0.71]( 2.88)
 2-groups     1.00 [ -0.00]( 1.69)     1.01 [ -0.62]( 1.40)
 4-groups     1.00 [ -0.00]( 1.25)     1.01 [ -1.17]( 1.03)
 8-groups     1.00 [ -0.00]( 1.36)     1.00 [ -0.43]( 0.83)
16-groups     1.00 [ -0.00]( 1.44)     1.00 [ -0.13]( 2.32)


==================================================================
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:    tip[pct imp](CV)    update_curr_opt[pct imp](CV)
    1     1.00 [  0.00]( 0.33)     1.01 [  0.52]( 1.51)
    2     1.00 [  0.00]( 0.22)     1.00 [ -0.01]( 0.37)
    4     1.00 [  0.00]( 0.25)     0.99 [ -0.71]( 0.60)
    8     1.00 [  0.00]( 0.71)     1.00 [ -0.26]( 0.36)
   16     1.00 [  0.00]( 0.79)     0.99 [ -1.21]( 0.77)
   32     1.00 [  0.00]( 0.94)     0.99 [ -0.82]( 1.46)
   64     1.00 [  0.00]( 1.76)     0.99 [ -0.92]( 1.25)
  128     1.00 [  0.00]( 0.68)     0.98 [ -2.22]( 1.19)
  256     1.00 [  0.00]( 1.23)     0.99 [ -1.43]( 0.79)
  512     1.00 [  0.00]( 0.28)     0.99 [ -0.93]( 0.14)
 1024     1.00 [  0.00]( 0.20)     0.99 [ -1.44]( 0.41)


==================================================================
Test          : stream-10
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)    update_curr_opt[pct imp](CV)
 Copy     1.00 [  0.00](11.88)     1.22 [ 21.92]( 7.37)
Scale     1.00 [  0.00]( 7.01)     1.04 [  4.02]( 4.89)
  Add     1.00 [  0.00]( 6.56)     1.11 [ 11.03]( 4.77)
Triad     1.00 [  0.00]( 8.81)     1.14 [ 14.12]( 3.89)


==================================================================
Test          : stream-100
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)    update_curr_opt[pct imp](CV)
 Copy     1.00 [  0.00]( 1.07)     1.01 [  0.77]( 1.59)
Scale     1.00 [  0.00]( 4.81)     0.97 [ -2.99]( 7.18)
  Add     1.00 [  0.00]( 4.56)     0.98 [ -2.39]( 6.86)
Triad     1.00 [  0.00]( 1.78)     1.00 [ -0.35]( 4.22)


==================================================================
Test          : netperf
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:         tip[pct imp](CV)    update_curr_opt[pct imp](CV)
 1-clients     1.00 [  0.00]( 0.62)     0.99 [ -1.03]( 0.24)
 2-clients     1.00 [  0.00]( 0.36)     1.00 [ -0.32]( 0.66)
 4-clients     1.00 [  0.00]( 0.31)     1.00 [ -0.17]( 0.44)
 8-clients     1.00 [  0.00]( 0.39)     1.00 [  0.24]( 0.67)
16-clients     1.00 [  0.00]( 0.58)     1.00 [  0.50]( 0.46)
32-clients     1.00 [  0.00]( 0.71)     1.01 [  0.54]( 0.66)
64-clients     1.00 [  0.00]( 2.13)     1.00 [  0.35]( 1.80)
128-clients    1.00 [  0.00]( 0.94)     0.99 [ -0.71]( 0.97)
256-clients    1.00 [  0.00]( 6.09)     1.01 [  1.28]( 3.41)
512-clients    1.00 [  0.00](55.28)     1.01 [  1.32](49.78)


==================================================================
Test          : schbench
Units         : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers: tip[pct imp](CV)    update_curr_opt[pct imp](CV)
  1     1.00 [ -0.00]( 2.91)     0.97 [  2.56]( 1.53)
  2     1.00 [ -0.00](21.78)     0.89 [ 10.53](10.23)
  4     1.00 [ -0.00]( 4.88)     1.07 [ -7.32]( 6.82)
  8     1.00 [ -0.00]( 2.49)     1.00 [ -0.00]( 9.53)
 16     1.00 [ -0.00]( 3.70)     1.02 [ -1.75]( 0.99)
 32     1.00 [ -0.00](12.65)     0.83 [ 16.51]( 4.41)
 64     1.00 [ -0.00]( 3.98)     0.97 [  2.59]( 8.27)
128     1.00 [ -0.00]( 1.49)     0.96 [  3.60]( 8.01)
256     1.00 [ -0.00](40.79)     0.80 [ 20.39](36.89)
512     1.00 [ -0.00]( 1.12)     0.98 [  2.20]( 0.75)


==================================================================
Test          : ycsb-cassandra
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
metric      tip     update_curr_opt (%diff)
throughput  1.00    1.00 (%diff: -0.45%)


==================================================================
Test          : ycsb-mongodb
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
metric      tip     update_curr_opt (%diff)
throughput  1.00    1.00 (%diff: -0.13%)


==================================================================
Test          : DeathStarBench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
Pinning   scaling   tip     update_curr_opt (%diff)
1CCD        1       1.00    1.01 (%diff: 0.57%)
2CCD        2       1.00    1.00 (%diff: -0.27%)
4CCD        4       1.00    1.00 (%diff: 0.06%)
8CCD        8       1.00    1.00 (%diff: 0.45%)

--
Feel free to include

Tested-by: K Prateek Nayak <kprateek.nayak@....com>

I'll keep a lookout for future versions.

> ---
> 
> Not sure what the XXX above place_entity() is for, maybe it can go away?
> 
> Based on tip/sched/core.
> 
>  kernel/sched/fair.c | 14 ++------------
>  1 file changed, 2 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04fbcbda97d5f..db2ca9bf9cc49 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5047,15 +5047,6 @@ static inline bool cfs_bandwidth_used(void);
>  static void
>  enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  {
> -	bool curr = cfs_rq->curr == se;
> -
> -	/*
> -	 * If we're the current task, we must renormalise before calling
> -	 * update_curr().
> -	 */
> -	if (curr)
> -		place_entity(cfs_rq, se, flags);
> -
>  	update_curr(cfs_rq);
>  
>  	/*
> @@ -5080,8 +5071,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  	 * XXX now that the entity has been re-weighted, and it's lag adjusted,
>  	 * we can place the entity.
>  	 */
> -	if (!curr)
> -		place_entity(cfs_rq, se, flags);
> +	place_entity(cfs_rq, se, flags);
>  
>  	account_entity_enqueue(cfs_rq, se);
>  
> @@ -5091,7 +5081,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  
>  	check_schedstat_required();
>  	update_stats_enqueue_fair(cfs_rq, se, flags);
> -	if (!curr)
> +	if (cfs_rq->curr != se)
>  		__enqueue_entity(cfs_rq, se);
>  	se->on_rq = 1;
>  

--
Thanks and Regards,
Prateek
