linux-kernel - Re: [PATCH v8 00/25] timer: Move from a push remote at enqueue to a pull at expiry model

Message-ID: <28563e2d-6746-e2c4-7d21-4ca39a82edc1@amd.com>
Date:   Thu, 12 Oct 2023 07:52:10 +0530
From:   K Prateek Nayak <kprateek.nayak@....com>
To:     Anna-Maria Behnsen <anna-maria@...utronix.de>,
        linux-kernel@...r.kernel.org
Cc:     Peter Zijlstra <peterz@...radead.org>,
        John Stultz <jstultz@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Eric Dumazet <edumazet@...gle.com>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Arjan van de Ven <arjan@...radead.org>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Sebastian Siewior <bigeasy@...utronix.de>,
        Giovanni Gherdovich <ggherdovich@...e.cz>,
        Lukasz Luba <lukasz.luba@....com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Srinivas Pandruvada <srinivas.pandruvada@...el.com>
Subject: Re: [PATCH v8 00/25] timer: Move from a push remote at enqueue to a
 pull at expiry model

Hello Anna-Maria,

Happy to report I don't see any regression with this version of the series.
I'll leave the detailed report below.

On 10/4/2023 6:04 PM, Anna-Maria Behnsen wrote:
> [..snip..]
> 
> dbench test
> ^^^^^^^^^^^
> 
> A dbench test starting X pairs of client servers is used to create load on
> the system. The measurable value is the throughput. The tests were executed
> on a zen3 machine. The base is the tip tree branch timers/core which is
> based on a v6.6-rc1.
> 
> governor menu
> 
> X pairs	timers/core	pull-model	impact
> ----------------------------------------------
> 1	353.19 (0.19)	353.45 (0.30)	0.07%
> 2	700.10 (0.96)	687.00 (0.20)	-1.87%
> 4	1329.37 (0.63)	1282.91 (0.64)	-3.49%
> 8	2561.16 (1.28)	2493.56 (1.76)	-2.64%
> 16	4959.96 (0.80)	4914.59 (0.64)	-0.91%
> 32	9741.92 (3.44)	8979.83 (1.13)	-7.82%
> 64	16535.40 (2.84)	16388.47 (4.02)	-0.89%
> 128	22136.83 (2.42)	23174.50 (1.43)	4.69%
> 256	39256.77 (4.48)	38994.00 (0.39)	-0.67%
> 512	36799.03 (1.83)	38091.10 (0.63)	3.51%
> 1024	32903.03 (0.86)	35370.70 (0.89)	7.50%
> 
> 
> governor teo
> 
> X pairs	timers/core	pull-model	impact
> ----------------------------------------------
> 1	350.83 (1.27)	352.45 (0.96)	0.46%
> 2	699.52 (0.85)	690.10 (0.54)	-1.35%
> 4	1339.53 (1.99)	1294.71 (2.71)	-3.35%
> 8	2574.10 (0.76)	2495.46 (1.97)	-3.06%
> 16	4898.50 (1.74)	4783.06 (1.64)	-2.36%
> 32	9115.50 (4.63)	9037.83 (1.58)	-0.85%
> 64	16663.90 (3.80)	16042.00 (1.72)	-3.73%
> 128	25044.93 (1.11)	23250.03 (1.08)	-7.17%
> 256	38059.53 (1.70)	39658.57 (2.98)	4.20%
> 512	36369.30 (0.39)	38890.13 (0.36)	6.93%
> 1024	33956.83 (1.14)	35514.83 (0.29)	4.59%
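
For reference, the "impact" column in the quoted tables appears consistent with the relative change of the pull-model throughput against the timers/core baseline. A minimal sketch, assuming that formula (the helper name impact_pct is hypothetical and not taken from the series); the sample values are copied from the "governor menu" table:

```python
def impact_pct(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


# (timers/core, pull-model) throughput from the "governor menu" table,
# keyed by the number of client/server pairs.
rows = {
    1: (353.19, 353.45),
    2: (700.10, 687.00),
    1024: (32903.03, 35370.70),
}

for pairs, (base, new) in rows.items():
    print(f"{pairs:>5}  {impact_pct(base, new):+.2f}%")
```

Since dbench reports throughput, a positive value is an improvement and a negative value a regression.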

o Machine details

- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- C2 Disabled (POLL and C1(MWAIT) remained enabled)

o Kernel Details

- tip:	tip:sched/core at commit 238437d88cea ("intel_idle: Add ibrs_off
	module parameter to force-disable IBRS") + min_deadline fix
	commit 8dafa9d0eb1a ("sched/eevdf: Fix min_deadline heap
	integrity") from tip:sched/urgent

- timer-pull: tip + this series as is

o Benchmark Results

==================================================================
Test          : hackbench
Units         : Normalized time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Case:         tip[pct imp](CV)    timer-pull[pct imp](CV)
 1-groups     1.00 [ -0.00]( 2.11)     0.99 [  1.44]( 3.34)
 2-groups     1.00 [ -0.00]( 1.31)     1.01 [ -0.93]( 1.57)
 4-groups     1.00 [ -0.00]( 1.04)     1.00 [  0.44]( 1.11)
 8-groups     1.00 [ -0.00]( 1.34)     0.99 [  1.29]( 1.34)
16-groups     1.00 [ -0.00]( 2.45)     1.00 [ -0.40]( 2.78)
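
The columns in these result tables read as "normalized value [pct imp](CV)". The sketch below shows one way such a row could be derived from raw samples. It assumes that values are normalized against the tip mean, that "pct imp" is signed so positive always means better, and that CV is the sample standard deviation as a percentage of the mean. The function names and the run times are hypothetical; they are not the scripts or data used for this report.

```python
from statistics import mean, stdev


def cv_pct(samples):
    """Coefficient of variation: sample stddev as a percentage of the mean."""
    return stdev(samples) / mean(samples) * 100.0


def normalized(tip_mean, new_mean):
    """Value relative to the tip baseline (tip itself is always 1.00)."""
    return new_mean / tip_mean


def pct_improvement(tip_mean, new_mean, lower_is_better):
    """Signed improvement over tip; positive means better in either case."""
    gain = tip_mean - new_mean if lower_is_better else new_mean - tip_mean
    return gain / tip_mean * 100.0


# Hypothetical run times in seconds for a "lower is better" test.
tip_runs = [10.0, 10.2, 9.8]
pull_runs = [9.9, 10.0, 9.8]

tip_mean, pull_mean = mean(tip_runs), mean(pull_runs)
print(f"{normalized(tip_mean, pull_mean):.2f} "
      f"[{pct_improvement(tip_mean, pull_mean, lower_is_better=True):6.2f}]"
      f"({cv_pct(pull_runs):5.2f})")
```

Under these assumptions a row such as "0.99 [  1.44]" means the run took about 1.44% less time than tip, and the CV next to it indicates how much of that difference could be run-to-run noise.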


==================================================================
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:    tip[pct imp](CV)    timer-pull[pct imp](CV)
    1     1.00 [  0.00]( 0.46)     1.01 [  0.52]( 0.66)
    2     1.00 [  0.00]( 0.64)     0.99 [ -0.60]( 0.88)
    4     1.00 [  0.00]( 0.59)     0.99 [ -0.92]( 1.82)
    8     1.00 [  0.00]( 0.34)     1.00 [ -0.06]( 0.33)
   16     1.00 [  0.00]( 0.72)     0.99 [ -1.25]( 1.52)
   32     1.00 [  0.00]( 0.65)     0.98 [ -1.59]( 1.29)
   64     1.00 [  0.00]( 0.59)     0.99 [ -0.84]( 3.87)
  128     1.00 [  0.00]( 1.19)     1.00 [  0.11]( 0.33)
  256     1.00 [  0.00]( 0.16)     1.01 [  0.61]( 0.52)
  512     1.00 [  0.00]( 0.20)     1.01 [  0.80]( 0.29)
 1024     1.00 [  0.00]( 0.06)     1.01 [  1.06]( 0.59)


==================================================================
Test          : stream-10
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)    timer-pull[pct imp](CV)
 Copy     1.00 [  0.00]( 6.04)     1.04 [  4.31]( 3.71)
Scale     1.00 [  0.00]( 5.44)     1.01 [  0.57]( 5.63)
  Add     1.00 [  0.00]( 5.44)     1.01 [  0.99]( 5.46)
Triad     1.00 [  0.00]( 7.82)     1.04 [  4.14]( 5.68)


==================================================================
Test          : stream-100
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)    timer-pull[pct imp](CV)
 Copy     1.00 [  0.00]( 1.14)     1.00 [  0.29]( 0.49)
Scale     1.00 [  0.00]( 4.60)     1.03 [  2.87]( 0.62)
  Add     1.00 [  0.00]( 4.91)     1.01 [  1.36]( 1.34)
Triad     1.00 [  0.00]( 0.60)     0.98 [ -1.50]( 4.24)


==================================================================
Test          : netperf
Units         : Normalized Throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:         tip[pct imp](CV)    timer-pull[pct imp](CV)
 1-clients     1.00 [  0.00]( 0.61)     1.01 [  1.25]( 0.48)
 2-clients     1.00 [  0.00]( 0.44)     1.00 [  0.34]( 0.65)
 4-clients     1.00 [  0.00]( 0.75)     1.01 [  0.98]( 1.26)
 8-clients     1.00 [  0.00]( 0.65)     1.01 [  0.82]( 0.73)
16-clients     1.00 [  0.00]( 0.49)     1.00 [  0.37]( 0.99)
32-clients     1.00 [  0.00]( 0.57)     0.98 [ -2.05]( 3.44)
64-clients     1.00 [  0.00]( 1.67)     1.00 [  0.00]( 1.74)
128-clients    1.00 [  0.00]( 1.11)     1.01 [  0.69]( 1.11)
256-clients    1.00 [  0.00]( 2.64)     1.00 [  0.00]( 3.79)
512-clients    1.00 [  0.00](52.49)     1.00 [  0.26](54.13)


==================================================================
Test          : schbench
Units         : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers: tip[pct imp](CV)    timer-pull[pct imp](CV)
  1     1.00 [ -0.00]( 8.41)     0.59 [ 40.54](40.25)
  2     1.00 [ -0.00]( 5.29)     0.93 [  7.50]( 9.01)
  4     1.00 [ -0.00]( 1.32)     0.91 [  9.09](12.33)
  8     1.00 [ -0.00]( 9.52)     1.00 [ -0.00](15.02)
 16     1.00 [ -0.00]( 1.61)     1.03 [ -3.23]( 2.37)
 32     1.00 [ -0.00]( 7.27)     0.92 [  7.69]( 1.59)
 64     1.00 [ -0.00]( 6.96)     1.12 [-11.56]( 1.20)
128     1.00 [ -0.00]( 3.41)     1.06 [ -6.49]( 3.73)
256     1.00 [ -0.00](32.95)     1.02 [ -2.48](28.66)
512     1.00 [ -0.00]( 3.20)     0.99 [  0.71]( 3.22)


==================================================================
Test          : ycsb-cassandra
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
metric      tip     timer-pull (%diff)
throughput  1.00    1.01 (%diff: 0.75%)


==================================================================
Test          : ycsb-mongodb
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
metric      tip     timer-pull (%diff)
throughput  1.00    1.00 (%diff: -0.49%)


==================================================================
Test          : DeathStarBench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
Pinning   scaling   tip     timer-pull (%diff)
1CCD        1       1.00    1.01 (%diff: 0.75%)
2CCD        2       1.00    1.03 (%diff: 2.72%)
4CCD        4       1.00    1.00 (%diff: -0.28%)
8CCD        8       1.00    1.00 (%diff: 0.20%)

--

Thank you for debugging and helping fix the tbench regression.
If the series does not change drastically, feel free to add:

Tested-by: K Prateek Nayak <kprateek.nayak@....com>

> 
> 
> 
> Ping Pong Observation
> ^^^^^^^^^^^^^^^^^^^^^
> 
> During testing on a mostly idle machine a ping pong game could be observed:
> a process_timeout timer is expired remotely on a non idle CPU. Then the CPU
> where the schedule_timeout() was executed to enqueue the timer comes out of
> idle and restarts the timer using schedule_timeout() and goes back to idle
> again. This is due to the fair scheduler which tries to keep the task on
> the CPU which it previously executed on.
> 
> 
> 
> 
> Possible Next Steps
> ~~~~~~~~~~~~~~~~~~~
> 
> Simple deferrable timers are no longer required as they can be converted to
> global timers. If a CPU goes idle, a formerly deferrable timer will not
> prevent the CPU from sleeping as long as possible. Only the last migrator
> CPU has to take care of them. Deferrable timers with timer pinned flags need
> to be expired on the specified CPU but must not prevent the CPU from going
> idle. They require their own timer base which is never taken into account
> when calculating the next expiry time. This conversion and the required
> cleanup will be done in a follow-up series.
> 

I'll keep an eye out for future versions for testing.

> 
> [..snip..]
> 

--
Thanks and Regards,
Prateek