lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6b976368-9ed9-e2a9-a725-2f8f7e76d401@amd.com>
Date:   Thu, 5 Oct 2023 11:26:07 +0530
From:   K Prateek Nayak <kprateek.nayak@....com>
To:     Daniel Jordan <daniel.m.jordan@...cle.com>
Cc:     bristot@...hat.com, bsegall@...gle.com, chris.hyser@...cle.com,
        corbet@....net, dietmar.eggemann@....com, efault@....de,
        joel@...lfernandes.org, joshdon@...gle.com, juri.lelli@...hat.com,
        linux-kernel@...r.kernel.org, mgorman@...e.de, mingo@...nel.org,
        patrick.bellasi@...bug.net, pavel@....cz, pjt@...gle.com,
        qperret@...gle.com, qyousef@...alina.io, rostedt@...dmis.org,
        tglx@...utronix.de, tim.c.chen@...ux.intel.com, timj@....org,
        vincent.guittot@...aro.org, youssefesmat@...omium.org,
        yu.c.chen@...el.com, peterz@...radead.org
Subject: Re: [PATCH] sched/fair: Preserve PLACE_DEADLINE_INITIAL deadline

Hello Daniel,

On 10/4/2023 6:47 AM, Daniel Jordan wrote:
> An entity is supposed to get an earlier deadline with
> PLACE_DEADLINE_INITIAL when it's forked, but the deadline gets
> overwritten soon after in enqueue_entity() the first time a forked
> entity is woken so that PLACE_DEADLINE_INITIAL is effectively a no-op.
> 
> Placing in task_fork_fair() seems unnecessary since none of the values
> that get set (slice, vruntime, deadline) are used before they're set
> again at enqueue time, so get rid of that and just pass ENQUEUE_INITIAL
> to enqueue_entity() via wake_up_new_task().
> 
> Fixes: e8f331bcc270 ("sched/smp: Use lag to simplify cross-runqueue placement")
> Signed-off-by: Daniel Jordan <daniel.m.jordan@...cle.com>

I got a chance to this this on a 3rd Generation EPYC system. I don't
see anything out of the ordinary except for a small regression on
hackbench. I'll leave the full result below.

o System details

- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- Boost enabled, C2 Disabled (POLL and MWAIT based C1 remained enabled)


o Kernel Details

- tip:	tip:sched/core at commit d4d6596b4386 ("sched/headers: Remove
	duplicate header inclusions")

- place-deadline-fix: tip + this patch


o Benchmark Results

==================================================================
Test          : hackbench
Units         : Normalized time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Case:           tip[pct imp](CV)    place-deadline-fix[pct imp](CV)
 1-groups     1.00 [ -0.00]( 2.58)     1.04 [ -3.63]( 3.14)
 2-groups     1.00 [ -0.00]( 1.87)     1.03 [ -2.98]( 1.85)
 4-groups     1.00 [ -0.00]( 1.63)     1.02 [ -2.35]( 1.59)
 8-groups     1.00 [ -0.00]( 1.38)     1.03 [ -2.92]( 1.20)
16-groups     1.00 [ -0.00]( 2.67)     1.02 [ -1.61]( 2.08)


==================================================================
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:           tip[pct imp](CV)    place-deadline-fix[pct imp](CV)
    1     1.00 [  0.00]( 0.59)     1.02 [  2.09]( 0.07)
    2     1.00 [  0.00]( 1.19)     1.02 [  2.38]( 0.82)
    4     1.00 [  0.00]( 0.33)     1.03 [  2.89]( 0.99)
    8     1.00 [  0.00]( 0.76)     1.02 [  2.10]( 0.46)
   16     1.00 [  0.00]( 1.10)     1.01 [  0.81]( 0.49)
   32     1.00 [  0.00]( 1.47)     1.02 [  1.77]( 0.58)
   64     1.00 [  0.00]( 1.77)     1.02 [  1.83]( 1.77)
  128     1.00 [  0.00]( 0.41)     1.02 [  2.49]( 0.52)
  256     1.00 [  0.00]( 0.63)     1.03 [  3.03]( 1.38)
  512     1.00 [  0.00]( 0.32)     1.02 [  1.61]( 0.45)
 1024     1.00 [  0.00]( 0.22)     1.01 [  1.00]( 0.26)


==================================================================
Test          : stream-10
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:           tip[pct imp](CV)    place-deadline-fix[pct imp](CV)
 Copy     1.00 [  0.00]( 9.30)     0.85 [-15.36](11.26)
Scale     1.00 [  0.00]( 6.67)     0.98 [ -2.36]( 7.53)
  Add     1.00 [  0.00]( 6.77)     0.92 [ -7.86]( 7.83)
Triad     1.00 [  0.00]( 7.36)     0.94 [ -5.57]( 6.82)


==================================================================
Test          : stream-100
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:           tip[pct imp](CV)    place-deadline-fix[pct imp](CV)
 Copy     1.00 [  0.00]( 1.83)     0.96 [ -3.68]( 5.08)
Scale     1.00 [  0.00]( 6.41)     1.03 [  2.66]( 5.28)
  Add     1.00 [  0.00]( 6.23)     1.02 [  1.54]( 4.97)
Triad     1.00 [  0.00]( 0.89)     0.94 [ -5.68]( 6.78)


==================================================================
Test          : netperf
Units         : Normalized Througput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:           tip[pct imp](CV)    place-deadline-fix[pct imp](CV)
 1-clients     1.00 [  0.00]( 0.05)     1.02 [  1.83]( 1.98)
 2-clients     1.00 [  0.00]( 0.93)     1.02 [  1.87]( 2.45)
 4-clients     1.00 [  0.00]( 0.54)     1.02 [  2.19]( 1.99)
 8-clients     1.00 [  0.00]( 0.48)     1.02 [  2.29]( 2.27)
16-clients     1.00 [  0.00]( 0.42)     1.02 [  1.60]( 1.70)
32-clients     1.00 [  0.00]( 0.78)     1.02 [  1.88]( 2.08)
64-clients     1.00 [  0.00]( 1.45)     1.02 [  2.33]( 2.18)
128-clients    1.00 [  0.00]( 0.97)     1.02 [  2.38]( 1.95)
256-clients    1.00 [  0.00]( 4.57)     1.02 [  2.50]( 5.42)
512-clients    1.00 [  0.00](52.74)     1.03 [  3.38](49.69)


==================================================================
Test          : schbench
Units         : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers:           tip[pct imp](CV)    place-deadline-fix[pct imp](CV)
  1     1.00 [ -0.00]( 3.95)     0.90 [ 10.26](31.80)
  2     1.00 [ -0.00](10.45)     1.08 [ -7.89](15.33)
  4     1.00 [ -0.00]( 4.76)     0.93 [  7.14]( 3.95)
  8     1.00 [ -0.00]( 9.35)     1.06 [ -6.25]( 8.90)
 16     1.00 [ -0.00]( 8.84)     0.92 [  8.06]( 4.39)
 32     1.00 [ -0.00]( 3.33)     1.04 [ -4.40]( 3.68)
 64     1.00 [ -0.00]( 6.70)     0.96 [  4.17]( 2.75)
128     1.00 [ -0.00]( 0.71)     0.96 [  3.55]( 1.26)
256     1.00 [ -0.00](31.20)     1.28 [-28.21]( 9.69)
512     1.00 [ -0.00]( 4.98)     1.00 [  0.48]( 2.76)


==================================================================
Test          : ycsb-cassandra
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
Metric          tip    place-deadline-fix(pct imp)
Throughput      1.00    1.01 (%diff: 1.06%)


==================================================================
Test          : ycsb-mondodb
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
Metric          tip    place-deadline-fix(pct imp)
Throughput      1.00    1.00 (%diff: 0.25%)


==================================================================
Test          : DeathStarBench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
Pinning      scaling     tip            place-deadline-fix(pct imp)
 1CCD           1       1.00            1.00 (%diff: -0.32%)
 2CCD           2       1.00            1.00 (%diff: -0.26%)
 4CCD           4       1.00            1.00 (%diff: 0.17%)
 8CCD           8       1.00            1.00 (%diff: -0.17%)


--
I see there is a v2. I'll give that a spin as well.

> ---
> 
> Tested on top of peterz/sched/eevdf from 2023-10-03.
> 
>  kernel/sched/core.c | 2 +-
>  kernel/sched/fair.c | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 779cdc7969c81..500e2dbfd41dd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4854,7 +4854,7 @@ void wake_up_new_task(struct task_struct *p)
>  	update_rq_clock(rq);
>  	post_init_entity_util_avg(p);
>  
> -	activate_task(rq, p, ENQUEUE_NOCLOCK);
> +	activate_task(rq, p, ENQUEUE_INITIAL | ENQUEUE_NOCLOCK);
>  	trace_sched_wakeup_new(p);
>  	wakeup_preempt(rq, p, WF_FORK);
>  #ifdef CONFIG_SMP
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index a0b4dac2662c9..5872b8a3f5891 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -12446,7 +12446,6 @@ static void task_fork_fair(struct task_struct *p)
>  	curr = cfs_rq->curr;
>  	if (curr)
>  		update_curr(cfs_rq);
> -	place_entity(cfs_rq, se, ENQUEUE_INITIAL);
>  	rq_unlock(rq, &rf);
>  }
>  
 
--
Thanks and Regards,
Prateek

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ