Message-ID: <7592f555-21f8-284a-dbc7-0a6ab4d42c0d@amd.com>
Date: Wed, 17 Apr 2024 14:18:46 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: David Vernet <void@...ifault.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
 bsegall@...gle.com, mgorman@...e.de, bristot@...hat.com,
 vschneid@...hat.com, youssefesmat@...gle.com, joelaf@...gle.com,
 roman.gushchin@...ux.dev, yu.c.chen@...el.com, gautham.shenoy@....com,
 aboorvad@...ux.vnet.ibm.com, wuyun.abel@...edance.com, tj@...nel.org,
 kernel-team@...a.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 0/8] sched: Implement shared runqueue in fair.c

Hello David,

On 12/12/2023 6:01 AM, David Vernet wrote:
> This is v4 of the shared runqueue patchset. This patch set is based off
> of commit 418146e39891 ("freezer,sched: Clean saved_state when restoring
> it during thaw") on the sched/core branch of tip.git.
> 
> In prior versions of this patch set, I was observing consistent and
> statistically significant wins for several benchmarks when this feature
> was enabled, such as kernel compile and hackbench. After rebasing onto
> the latest sched/core on tip.git, I'm no longer observing these wins,
> and in fact observe some performance loss with SHARED_RUNQ on hackbench.
> I ended up bisecting this to when EEVDF was merged.
> 
> As I mentioned in [0], our plan for now is to take a step back and
> re-evaluate how we want to proceed with this patch set. That said, I did
> want to send this out in the interim in case it could be of interest to
> anyone else who would like to continue to experiment with it.

I was doing a bunch of testing prior to OSPM in case folks wanted to
discuss the results. I'm leaving the results of the SHARED_RUNQ runs on
a recent-ish tip below.

tl;dr

- I haven't dug deeper into the regressions yet, but the most prominent
  one is hackbench with a lower number of groups; the picture flips
  with a higher number of groups.

Other benchmarks behave more or less similarly to tip. I'll leave the
full results below:

o System Details

- 3rd Generation EPYC System
- 2 x 64C/128T
- NPS1 mode
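
As a quick sanity check of the configuration above, here is a minimal
sketch (my own illustration, not part of the harness used for the
numbers below) that reads the sysfs topology to confirm the
socket/core/thread counts and the NPS1 (one NUMA node per socket)
layout:

#!/usr/bin/env python3
# Illustrative topology check (an assumption on my part; not the tooling
# used for the results below). Counts sockets, cores and threads from
# sysfs and prints the online NUMA nodes to infer the NPS setting.
import glob
import os

def read(path):
    with open(path) as f:
        return f.read().strip()

packages, cores, threads = set(), set(), 0
for cpu in glob.glob("/sys/devices/system/cpu/cpu[0-9]*"):
    topo = os.path.join(cpu, "topology")
    if not os.path.isdir(topo):        # offline CPUs may not expose topology/
        continue
    pkg = read(os.path.join(topo, "physical_package_id"))
    packages.add(pkg)
    cores.add((pkg, read(os.path.join(topo, "core_id"))))
    threads += 1

print("sockets:", len(packages))
print("cores  :", len(cores))
print("threads:", threads)
# NPS1 on EPYC exposes one NUMA node per socket, so "0-1" is expected here.
print("NUMA nodes online:", read("/sys/devices/system/node/online"))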

o Kernels

tip:			tip:sched/core at commit 8cec3dd9e593
			("sched/core: Simplify code by removing
			 duplicate #ifdefs")

shared_runq:		tip + this series

o Results

==================================================================
Test          : hackbench
Units         : Normalized time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Case:           tip[pct imp](CV)    shared_runq[pct imp](CV)
 1-groups     1.00 [ -0.00]( 1.80)     4.49 [-349.19](92.14)
 2-groups     1.00 [ -0.00]( 1.76)     1.02 [ -2.17](19.20)
 4-groups     1.00 [ -0.00]( 1.82)     0.86 [ 13.53]( 1.37)
 8-groups     1.00 [ -0.00]( 1.40)     0.91 [  8.73]( 2.39)
16-groups     1.00 [ -0.00]( 3.38)     0.91 [  9.47]( 2.39)
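
A note on the notation used in this and the following tables: each value
is the per-case statistic normalized to the tip baseline, "pct imp" is
the percentage improvement relative to tip (positive is better, with the
sign flipped for lower-is-better metrics), and CV is the coefficient of
variation across runs in percent. Below is a minimal sketch of that
reduction with made-up samples, assuming arithmetic-mean data (stream
and schbench use HMean and Median respectively, as noted in their
headers); it is illustrative only, not the exact tooling used here:

import statistics

def summarize(tip_runs, test_runs, lower_is_better=True):
    # Reduce raw per-run samples to the "norm [pct imp](CV)" triple used
    # in these tables. Illustrative only; assumes the arithmetic mean.
    tip_mean = statistics.mean(tip_runs)
    test_mean = statistics.mean(test_runs)

    norm = test_mean / tip_mean                          # 1.00 == tip baseline
    delta = (tip_mean - test_mean) if lower_is_better else (test_mean - tip_mean)
    pct_imp = delta / tip_mean * 100                     # positive == improvement

    cv = statistics.stdev(test_runs) / test_mean * 100   # run-to-run variation, %
    return norm, pct_imp, cv

# Hypothetical hackbench timings in seconds (not real data):
tip_runs = [2.51, 2.47, 2.55]
shared_runq_runs = [2.20, 2.31, 2.25]
print("%.2f [%6.2f](%5.2f)" % summarize(tip_runs, shared_runq_runs))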


==================================================================
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:           tip[pct imp](CV)    shared_runq[pct imp](CV)
    1     1.00 [  0.00]( 0.44)     1.00 [ -0.39]( 0.53)
    2     1.00 [  0.00]( 0.39)     1.00 [ -0.16]( 0.57)
    4     1.00 [  0.00]( 0.40)     1.00 [ -0.07]( 0.69)
    8     1.00 [  0.00]( 0.16)     0.99 [ -0.67]( 0.45)
   16     1.00 [  0.00]( 3.00)     1.03 [  2.86]( 1.23)
   32     1.00 [  0.00]( 0.84)     1.00 [ -0.32]( 1.46)
   64     1.00 [  0.00]( 1.66)     0.98 [ -1.60]( 0.79)
  128     1.00 [  0.00]( 1.04)     1.01 [  0.57]( 0.59)
  256     1.00 [  0.00]( 0.26)     0.98 [ -1.91]( 2.48)
  512     1.00 [  0.00]( 0.15)     1.00 [  0.22]( 0.16)
 1024     1.00 [  0.00]( 0.20)     1.00 [ -0.37]( 0.02)


==================================================================
Test          : stream-10
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:           tip[pct imp](CV)    shared_runq[pct imp](CV)
 Copy     1.00 [  0.00]( 6.19)     1.10 [  9.51]( 4.30)
Scale     1.00 [  0.00]( 6.47)     1.03 [  2.90]( 2.82)
  Add     1.00 [  0.00]( 6.50)     1.04 [  3.82]( 3.10)
Triad     1.00 [  0.00]( 5.70)     1.01 [  1.49]( 4.30)


==================================================================
Test          : stream-100
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:           tip[pct imp](CV)    shared_runq[pct imp](CV)
 Copy     1.00 [  0.00]( 3.22)     1.04 [  3.67]( 2.41)
Scale     1.00 [  0.00]( 6.17)     1.03 [  2.75]( 1.63)
  Add     1.00 [  0.00]( 5.12)     1.02 [  2.42]( 2.10)
Triad     1.00 [  0.00]( 2.29)     1.01 [  1.11]( 1.59)


==================================================================
Test          : netperf
Units         : Normalized Throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:           tip[pct imp](CV)    shared_runq[pct imp](CV)
 1-clients     1.00 [  0.00]( 0.17)     0.99 [ -0.65]( 0.40)
 2-clients     1.00 [  0.00]( 0.49)     1.00 [ -0.17]( 0.27)
 4-clients     1.00 [  0.00]( 0.65)     1.00 [  0.09]( 0.69)
 8-clients     1.00 [  0.00]( 0.56)     1.00 [ -0.05]( 0.61)
16-clients     1.00 [  0.00]( 0.78)     1.00 [ -0.23]( 0.58)
32-clients     1.00 [  0.00]( 0.62)     0.98 [ -2.22]( 0.76)
64-clients     1.00 [  0.00]( 1.41)     0.96 [ -3.75]( 1.19)
128-clients    1.00 [  0.00]( 0.83)     0.98 [ -2.29]( 0.97)
256-clients    1.00 [  0.00]( 4.60)     0.96 [ -4.18]( 3.02)
512-clients    1.00 [  0.00](54.18)     0.99 [ -1.36](52.79)


==================================================================
Test          : schbench
Units         : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers:           tip[pct imp](CV)    shared_runq[pct imp](CV)
  1     1.00 [ -0.00](34.63)     1.40 [-40.00]( 2.38)
  2     1.00 [ -0.00]( 2.70)     1.08 [ -8.11]( 7.53)
  4     1.00 [ -0.00]( 4.70)     0.93 [  6.67]( 7.16)
  8     1.00 [ -0.00]( 5.09)     0.92 [  7.55](10.20)
 16     1.00 [ -0.00]( 5.08)     0.97 [  3.39]( 2.00)
 32     1.00 [ -0.00]( 2.91)     1.03 [ -3.33]( 2.22)
 64     1.00 [ -0.00]( 2.73)     0.99 [  1.04]( 3.43)
128     1.00 [ -0.00]( 7.89)     0.99 [  0.69]( 9.65)
256     1.00 [ -0.00](28.55)     0.92 [  7.94](19.85)
512     1.00 [ -0.00]( 2.11)     1.13 [-12.69]( 6.41)


==================================================================
Test          : DeathStarBench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
Pinning      scaling     tip            shared_runq (%diff)
 1CCD           1       1.00            1.01 (%diff: 1.45%)
 2CCD           2       1.00            1.01 (%diff: 1.71%)
 4CCD           4       1.00            1.01 (%diff: 1.66%)
 8CCD           8       1.00            1.00 (%diff: 0.63%)

--

> 
> [0]: https://lore.kernel.org/all/20231204193001.GA53255@maniforge/
> 
> v1 (RFC): https://lore.kernel.org/lkml/20230613052004.2836135-1-void@manifault.com/
> v2: https://lore.kernel.org/lkml/20230710200342.358255-1-void@manifault.com/
> v3: https://lore.kernel.org/all/20230809221218.163894-1-void@manifault.com/
> 
> [..snip..]
> 

I'll take a deeper look at the regressions soon and will update the
thread if I find anything interesting in the meantime.

--
Thanks and Regards,
Prateek
