linux-kernel - Re: [PATCH v2] sched/fair: Preserve PLACE_DEADLINE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5b5c513c-92a8-d9eb-bc07-273518ed3280@amd.com>
Date:   Thu, 12 Oct 2023 10:18:10 +0530
From:   K Prateek Nayak <kprateek.nayak@....com>
To:     Daniel Jordan <daniel.m.jordan@...cle.com>
Cc:     peterz@...radead.org, bristot@...hat.com, bsegall@...gle.com,
        chris.hyser@...cle.com, corbet@....net, dietmar.eggemann@....com,
        efault@....de, joel@...lfernandes.org, joshdon@...gle.com,
        juri.lelli@...hat.com, linux-kernel@...r.kernel.org,
        mgorman@...e.de, mingo@...nel.org, patrick.bellasi@...bug.net,
        pavel@....cz, pjt@...gle.com, qperret@...gle.com,
        qyousef@...alina.io, rostedt@...dmis.org, tglx@...utronix.de,
        tim.c.chen@...ux.intel.com, timj@....org,
        vincent.guittot@...aro.org, youssefesmat@...omium.org,
        yu.c.chen@...el.com
Subject: Re: [PATCH v2] sched/fair: Preserve PLACE_DEADLINE_INITIAL deadline

Hello Daniel,

Same as v1, I do not see any regressions with this version either.
I'll leave the full results below.

o Machine details

- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- C2 Disabled (POLL and C1(MWAIT) remained enabled)

o Kernel Details

- tip:	tip:sched/core at commit 238437d88cea ("intel_idle: Add ibrs_off
	module parameter to force-disable IBRS")
	[For DeathStarBench comparisons alone since I ran to the issue
	which below commit solves]
	+ min_deadline fix commit 8dafa9d0eb1a ("sched/eevdf: Fix
	min_deadline heap integrity") from tip:sched/urgent 

- place-initial-fix: tip + this patch as is

o Benchmark Results

==================================================================
Test          : hackbench
Units         : Normalized time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Case:           tip[pct imp](CV)    place-initial-fix[pct imp](CV)
 1-groups     1.00 [ -0.00]( 2.11)     1.01 [ -1.08]( 2.60)
 2-groups     1.00 [ -0.00]( 1.31)     1.01 [ -0.93]( 1.61)
 4-groups     1.00 [ -0.00]( 1.04)     1.00 [ -0.00]( 1.25)
 8-groups     1.00 [ -0.00]( 1.34)     0.99 [  1.15]( 0.85)
16-groups     1.00 [ -0.00]( 2.45)     1.00 [ -0.27]( 2.32)


==================================================================
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:    tip[pct imp](CV)    place-initial-fix[pct imp](CV)
    1     1.00 [  0.00]( 0.46)     0.99 [ -0.59]( 0.88)
    2     1.00 [  0.00]( 0.64)     0.99 [ -1.43]( 0.69)
    4     1.00 [  0.00]( 0.59)     0.99 [ -1.49]( 0.76)
    8     1.00 [  0.00]( 0.34)     1.00 [ -0.35]( 0.20)
   16     1.00 [  0.00]( 0.72)     0.98 [ -1.96]( 1.97)
   32     1.00 [  0.00]( 0.65)     1.00 [ -0.24]( 1.07)
   64     1.00 [  0.00]( 0.59)     1.00 [ -0.14]( 1.18)
  128     1.00 [  0.00]( 1.19)     0.99 [ -1.04]( 0.93)
  256     1.00 [  0.00]( 0.16)     1.00 [ -0.18]( 0.34)
  512     1.00 [  0.00]( 0.20)     0.99 [ -0.62]( 0.02)
 1024     1.00 [  0.00]( 0.06)     1.00 [ -0.49]( 0.37)


==================================================================
Test          : stream-10
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)    place-initial-fix[pct imp](CV)
 Copy     1.00 [  0.00]( 6.04)     1.00 [ -0.21]( 7.98)
Scale     1.00 [  0.00]( 5.44)     0.99 [ -0.75]( 5.75)
  Add     1.00 [  0.00]( 5.44)     0.99 [ -1.48]( 5.40)
Triad     1.00 [  0.00]( 7.82)     1.02 [  2.21]( 8.33)


==================================================================
Test          : stream-100
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)    place-initial-fix[pct imp](CV)
 Copy     1.00 [  0.00]( 1.14)     1.00 [  0.40]( 1.12)
Scale     1.00 [  0.00]( 4.60)     1.01 [  1.05]( 4.99)
  Add     1.00 [  0.00]( 4.91)     1.00 [ -0.14]( 4.97)
Triad     1.00 [  0.00]( 0.60)     0.96 [ -3.53]( 6.13)


==================================================================
Test          : netperf
Units         : Normalized Througput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:         tip[pct imp](CV)    place-initial-fix[pct imp](CV)
 1-clients     1.00 [  0.00]( 0.61)     1.00 [  0.40]( 0.75)
 2-clients     1.00 [  0.00]( 0.44)     1.00 [ -0.47]( 0.91)
 4-clients     1.00 [  0.00]( 0.75)     1.00 [ -0.23]( 0.84)
 8-clients     1.00 [  0.00]( 0.65)     1.00 [ -0.07]( 0.62)
16-clients     1.00 [  0.00]( 0.49)     1.00 [ -0.29]( 0.56)
32-clients     1.00 [  0.00]( 0.57)     1.00 [ -0.14]( 0.46)
64-clients     1.00 [  0.00]( 1.67)     1.00 [ -0.14]( 1.81)
128-clients    1.00 [  0.00]( 1.11)     1.01 [  0.64]( 1.04)
256-clients    1.00 [  0.00]( 2.64)     0.99 [ -1.29]( 5.25)
512-clients    1.00 [  0.00](52.49)     0.99 [ -0.57](53.01)


==================================================================
Test          : schbench
Units         : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers: tip[pct imp](CV)    place-initial-fix[pct imp](CV)
  1     1.00 [ -0.00]( 8.41)     1.05 [ -5.41](13.45)
  2     1.00 [ -0.00]( 5.29)     0.88 [ 12.50](13.21)
  4     1.00 [ -0.00]( 1.32)     1.00 [ -0.00]( 4.80)
  8     1.00 [ -0.00]( 9.52)     0.94 [  6.25]( 8.85)
 16     1.00 [ -0.00]( 1.61)     0.97 [  3.23]( 5.00)
 32     1.00 [ -0.00]( 7.27)     0.88 [ 12.50]( 2.30)
 64     1.00 [ -0.00]( 6.96)     1.07 [ -6.94]( 4.94)
128     1.00 [ -0.00]( 3.41)     0.99 [  1.44]( 2.69)
256     1.00 [ -0.00](32.95)     0.81 [ 19.17](16.38)
512     1.00 [ -0.00]( 3.20)     0.98 [  1.66]( 2.35)


==================================================================
Test          : ycsb-cassandra
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
metric          tip    place-initial-fix(%diff)
throughput      1.00    0.99 (%diff: -0.67%)


==================================================================
Test          : ycsb-mondodb
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
metric          tip    place-initial-fix(%diff)
throughput      1.00    0.99 (%diff: -0.68%)


==================================================================
Test          : DeathStarBench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
Note	      : Comparisons contains additional commit 8dafa9d0eb1a
		("sched/eevdf: Fix min_deadline heap integrity") from
		tip:sched/urgent to fix an EEVDF issue being hit
==================================================================
Pinning      scaling    tip     place-initial-fix (%diff)
1CCD            1       1.00    1.00 (%diff: -0.09%)
2CCD            2       1.00    1.02 (%diff: 2.46%)
4CCD            4       1.00    1.00 (%diff: 0.45%)
8CCD            8       1.00    1.00 (%diff: -0.46%)

--

On 10/4/2023 6:39 PM, Daniel Jordan wrote:
> An entity is supposed to get an earlier deadline with
> PLACE_DEADLINE_INITIAL when it's forked, but the deadline gets
> overwritten soon after in enqueue_entity() the first time a forked
> entity is woken so that PLACE_DEADLINE_INITIAL is effectively a no-op.
> 
> Placing in task_fork_fair() seems unnecessary since none of the values
> that get set (slice, vruntime, deadline) are used before they're set
> again at enqueue time, so get rid of that (and with it all of
> task_fork_fair()) and just pass ENQUEUE_INITIAL to enqueue_entity() via
> wake_up_new_task().
> 
> Fixes: e8f331bcc270 ("sched/smp: Use lag to simplify cross-runqueue placement")
> Signed-off-by: Daniel Jordan <daniel.m.jordan@...cle.com>

Tested-by: K Prateek Nayak <kprateek.nayak@....com>

> ---
> 
> v2
>  - place_entity() seems like the only reason for task_fork_fair() to exist
>    after the recent removal of sysctl_sched_child_runs_first, so take out
>    the whole function.
> 
> Still based on today's peterz/sched/eevdf
> 
>  kernel/sched/core.c |  2 +-
>  kernel/sched/fair.c | 24 ------------------------
>  2 files changed, 1 insertion(+), 25 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 779cdc7969c81..500e2dbfd41dd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4854,7 +4854,7 @@ void wake_up_new_task(struct task_struct *p)
>  	update_rq_clock(rq);
>  	post_init_entity_util_avg(p);
>  
> -	activate_task(rq, p, ENQUEUE_NOCLOCK);
> +	activate_task(rq, p, ENQUEUE_INITIAL | ENQUEUE_NOCLOCK);
>  	trace_sched_wakeup_new(p);
>  	wakeup_preempt(rq, p, WF_FORK);
>  #ifdef CONFIG_SMP
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a0b4dac2662c9..3827b302eeb9b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12427,29 +12427,6 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
>  	task_tick_core(rq, curr);
>  }
>  
> -/*
> - * called on fork with the child task as argument from the parent's context
> - *  - child not yet on the tasklist
> - *  - preemption disabled
> - */
> -static void task_fork_fair(struct task_struct *p)
> -{
> -	struct sched_entity *se = &p->se, *curr;
> -	struct cfs_rq *cfs_rq;
> -	struct rq *rq = this_rq();
> -	struct rq_flags rf;
> -
> -	rq_lock(rq, &rf);
> -	update_rq_clock(rq);
> -
> -	cfs_rq = task_cfs_rq(current);
> -	curr = cfs_rq->curr;
> -	if (curr)
> -		update_curr(cfs_rq);
> -	place_entity(cfs_rq, se, ENQUEUE_INITIAL);
> -	rq_unlock(rq, &rf);
> -}
> -
>  /*
>   * Priority of the task has changed. Check to see if we preempt
>   * the current task.
> @@ -12953,7 +12930,6 @@ DEFINE_SCHED_CLASS(fair) = {
>  #endif
>  
>  	.task_tick		= task_tick_fair,
> -	.task_fork		= task_fork_fair,
>  
>  	.prio_changed		= prio_changed_fair,
>  	.switched_from		= switched_from_fair,

--
Thanks and Regards,
Prateek