linux-kernel - Re: [RESEND][PATCH v8 0/7] Preparatory changes for Proxy Execution v8

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c26251d2-e1bf-e5c7-0636-12ad886e1ea8@amd.com>
Date: Wed, 28 Feb 2024 23:07:37 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: John Stultz <jstultz@...gle.com>
Cc: LKML <linux-kernel@...r.kernel.org>, Joel Fernandes <joelaf@...gle.com>,
 Qais Yousef <qyousef@...gle.com>, Ingo Molnar <mingo@...hat.com>,
 Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Valentin Schneider <vschneid@...hat.com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Zimuzo Ezeozue <zezeozue@...gle.com>, Youssef Esmat
 <youssefesmat@...gle.com>, Mel Gorman <mgorman@...e.de>,
 Daniel Bristot de Oliveira <bristot@...hat.com>,
 Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>,
 Boqun Feng <boqun.feng@...il.com>, "Paul E. McKenney" <paulmck@...nel.org>,
 Metin Kaya <Metin.Kaya@....com>, Xuewen Yan <xuewen.yan94@...il.com>,
 Thomas Gleixner <tglx@...utronix.de>, kernel-team@...roid.com
Subject: Re: [RESEND][PATCH v8 0/7] Preparatory changes for Proxy Execution v8

Hello John,

On 2/28/2024 10:54 AM, John Stultz wrote:
> On Tue, Feb 27, 2024 at 9:12 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
>> On 2/28/2024 10:21 AM, John Stultz wrote:
>>> Just to clarify: by "this series" did you test just the 7 preparatory
>>> patches submitted to the list here, or did you pull the full
>>> proxy-exec-v8-6.8-rc3 set from git?
>>
>> Just these preparatory patches for now. On my way to queue a run for the
>> whole set from your tree. I'll use the "proxy-exec-v8-6.8-rc3" branch and
>> pick the commits past the
>> "[ANNOTATION] === Proxy Exec patches past this point ===" till the commit
>> ff90fb583a81 ("FIX: Avoid using possibly uninitialized cpu value with
>> activate_blocked_entities()") on top of the tip:sched/core mentioned
>> above since it'll allow me to reuse the baseline numbers :)
>>
> 
> Ah, thank you for the clarification!
> 
> Also, I really appreciate your testing with the rest of the series as
> well. It will be good to have any potential problems identified early

I got a chance to test the whole of v8 patches on the same dual socket
3rd Generation EPYC system:

tl;dr

- There is a slight regression in hackbench but instead of the 10x
  blowup seen previously, it is only around 5% with overloaded case
  not regressing at all.

- A small but consistent (~2-3%) regression is seen in tbench and
  netperf.

- schbench is inconclusive due to run to run variance and stream is
  perf neutral with proxy execution.

I've not looked deeper into the regressions. I'll let you know if I
spot anything when digging deeper. Below are the full results:

o System Details

- 3rd Generation EPYC System
- 2 x 64C/128T
- NPS1 mode

o Kernels

tip:			tip:sched/core at commit 8cec3dd9e593
			("sched/core: Simplify code by removing
			 duplicate #ifdefs")

proxy-exec-full:	tip + proxy execution commits from
			"proxy-exec-v8-6.8-rc3" described previously in
			this thread.

o Results

==================================================================
Test          : hackbench
Units         : Normalized time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Case:           tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
 1-groups     1.00 [ -0.00]( 2.08)     1.00 [ -0.18]( 3.90)
 2-groups     1.00 [ -0.00]( 0.89)     1.04 [ -4.43]( 0.78)
 4-groups     1.00 [ -0.00]( 0.81)     1.05 [ -4.82]( 1.03)
 8-groups     1.00 [ -0.00]( 0.78)     1.02 [ -1.90]( 1.00)
16-groups     1.00 [ -0.00]( 1.60)     1.01 [ -0.80]( 1.18)


==================================================================
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:    tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
    1     1.00 [  0.00]( 0.71)     0.97 [ -3.00]( 0.15)
    2     1.00 [  0.00]( 0.25)     0.97 [ -3.35]( 0.98)
    4     1.00 [  0.00]( 0.85)     0.97 [ -3.26]( 1.40)
    8     1.00 [  0.00]( 1.00)     0.97 [ -2.75]( 0.46)
   16     1.00 [  0.00]( 1.25)     0.99 [ -1.27]( 0.11)
   32     1.00 [  0.00]( 0.35)     0.98 [ -2.42]( 0.06)
   64     1.00 [  0.00]( 0.71)     0.97 [ -2.76]( 1.81)
  128     1.00 [  0.00]( 0.46)     0.97 [ -2.67]( 0.88)
  256     1.00 [  0.00]( 0.24)     0.98 [ -1.97]( 0.98)
  512     1.00 [  0.00]( 0.30)     0.98 [ -2.41]( 0.38)
 1024     1.00 [  0.00]( 0.40)     0.98 [ -2.21]( 0.11)


==================================================================
Test          : stream-10
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
 Copy     1.00 [  0.00]( 9.73)     1.00 [  0.26]( 6.36)
Scale     1.00 [  0.00]( 5.57)     1.02 [  1.59]( 2.98)
  Add     1.00 [  0.00]( 5.43)     1.00 [  0.48]( 2.77)
Triad     1.00 [  0.00]( 5.50)     0.98 [ -2.18]( 6.06)


==================================================================
Test          : stream-100
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
 Copy     1.00 [  0.00]( 3.26)     0.98 [ -1.96]( 3.24)
Scale     1.00 [  0.00]( 1.26)     0.96 [ -3.61]( 6.41)
  Add     1.00 [  0.00]( 1.47)     0.98 [ -1.84]( 4.14)
Triad     1.00 [  0.00]( 1.77)     1.00 [  0.27]( 2.60)


==================================================================
Test          : netperf
Units         : Normalized Througput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:         tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
 1-clients     1.00 [  0.00]( 0.22)     0.97 [ -3.01]( 0.40)
 2-clients     1.00 [  0.00]( 0.57)     0.97 [ -3.25]( 0.45)
 4-clients     1.00 [  0.00]( 0.43)     0.97 [ -3.26]( 0.59)
 8-clients     1.00 [  0.00]( 0.27)     0.97 [ -2.83]( 0.55)
16-clients     1.00 [  0.00]( 0.46)     0.97 [ -2.99]( 0.65)
32-clients     1.00 [  0.00]( 0.95)     0.97 [ -2.98]( 0.71)
64-clients     1.00 [  0.00]( 1.79)     0.97 [ -2.61]( 1.38)
128-clients    1.00 [  0.00]( 0.89)     0.97 [ -2.72]( 0.94)
256-clients    1.00 [  0.00]( 3.88)     0.98 [ -1.89]( 2.92)
512-clients    1.00 [  0.00](35.06)     0.99 [ -0.78](47.83)


==================================================================
Test          : schbench
Units         : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers: tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
  1     1.00 [ -0.00](27.28)     1.31 [-31.25]( 6.45)
  2     1.00 [ -0.00]( 3.85)     0.95 [  5.00](10.02)
  4     1.00 [ -0.00](14.00)     1.11 [-10.53]( 1.36)
  8     1.00 [ -0.00]( 4.68)     1.15 [-14.58](14.55)
 16     1.00 [ -0.00]( 4.08)     0.98 [  1.61]( 3.28)
 32     1.00 [ -0.00]( 6.68)     1.02 [ -2.04]( 1.71)
 64     1.00 [ -0.00]( 1.79)     1.12 [-11.73]( 7.08)
128     1.00 [ -0.00]( 6.30)     1.11 [-10.84]( 5.52)
256     1.00 [ -0.00](43.39)     1.37 [-37.14](20.11)
512     1.00 [ -0.00]( 2.26)     0.99 [  1.17]( 1.43)


==================================================================
Test          : Unixbench
Units         : Normalized scores
Interpretation: Lower is better
Statistic     : Various (Mentioned)
==================================================================
Metric	  Variant                    tip        proxy-exec-full
Hmean     unixbench-dhry2reg-1    0.00%           -0.67%
Hmean     unixbench-dhry2reg-512  0.00%            0.14%
Amean     unixbench-syscall-1     0.00%           -0.86%
Amean     unixbench-syscall-512   0.00%           -6.42%
Hmean     unixbench-pipe-1        0.00%            0.79%
Hmean     unixbench-pipe-512      0.00%            0.57%
Hmean     unixbench-spawn-1       0.00%           -3.91%
Hmean     unixbench-spawn-512     0.00%            3.17%
Hmean     unixbench-execl-1       0.00%           -1.18%
Hmean     unixbench-execl-512     0.00%            1.26%
--

> (I'm trying to get v9 ready as soon as I can here, as its fixed a
> number of smaller issues - However, I've also managed to uncover some
> new problems in stress testing, so we'll see how quickly I can chase
> those down).

I haven't seen any splats when running the above tests. I'll test some
larger workloads next. Please let me know if you would like me to test
any specific workload or need additional data from these tests :)

> 
> thanks
> -john
 
--
Thanks and Regards,
Prateek