Message-ID: <27651e14-f441-c1e2-9b5b-b958d6aadc79@amd.com>
Date: Thu, 5 Oct 2023 11:52:13 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Chen Yu <yu.c.chen@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Juri Lelli <juri.lelli@...hat.com>
Cc: Tim Chen <tim.c.chen@...el.com>, Aaron Lu <aaron.lu@...el.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
"Gautham R . Shenoy" <gautham.shenoy@....com>,
linux-kernel@...r.kernel.org, Chen Yu <yu.chen.surf@...il.com>
Subject: Re: [PATCH 0/2] Introduce SIS_CACHE to choose previous CPU during
task wakeup
Hello Chenyu,
On 9/26/2023 10:40 AM, Chen Yu wrote:
> RFC -> v1:
> - drop RFC
> - Only record the short sleeping time for each task, to better honor the
> burst sleeping tasks. (Mathieu Desnoyers)
> - Keep the forward movement monotonic for runqueue's cache-hot timeout value.
> (Mathieu Desnoyers, Aaron Lu)
> - Introduce a new helper function cache_hot_cpu() that considers
> rq->cache_hot_timeout. (Aaron Lu)
> - Add analysis of why inhibiting task migration could bring better throughput
> for some benchmarks. (Gautham R. Shenoy)
> - Choose the first cache-hot CPU, if all idle CPUs are cache-hot in
> select_idle_cpu(). To avoid possible task stacking on the waker's CPU.
> (K Prateek Nayak)
>
> Thanks for your comments and review!
Sorry for the delay! I'll leave the test results from a 3rd Generation
EPYC system below.
tl;dr
- Small regression in tbench and netperf, possibly due to more searching
  for an idle CPU.
- Small regression in schbench (old) at 256 workers, albeit with large
  run-to-run variance.
- Other benchmarks are more or less the same.

The full results are below.
o System details
- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- Boost enabled, C2 Disabled (POLL and MWAIT based C1 remained enabled)
o Kernel Details
- tip: tip:sched/core at commit 5fe7765997b1 (sched/deadline: Make
dl_rq->pushable_dl_tasks update drive dl_rq->overloaded)
- SIS_CACHE: tip + this series
o Benchmark results
==================================================================
Test : hackbench
Units : Normalized time in seconds
Interpretation: Lower is better
Statistic : AMean
==================================================================
Case:         tip[pct imp](CV)       SIS_CACHE[pct imp](CV)
 1-groups     1.00 [ -0.00]( 2.36)     1.01 [ -1.47]( 3.02)
 2-groups     1.00 [ -0.00]( 2.35)     0.99 [  0.92]( 1.01)
 4-groups     1.00 [ -0.00]( 1.79)     0.98 [  2.34]( 0.63)
 8-groups     1.00 [ -0.00]( 0.84)     0.98 [  1.73]( 1.02)
16-groups     1.00 [ -0.00]( 2.39)     0.97 [  2.76]( 2.33)
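
For readers who want to reproduce the columns, here is a minimal sketch of how the normalized value, [pct imp] and (CV) could be derived from raw samples. It assumes pct imp is the relative change against tip with the sign flipped for lower-is-better tests (which is consistent with the rows in these tables but is not spelled out in this mail), and that CV is the sample standard deviation as a percentage of the mean. The function name summarize() and the sample values are hypothetical.

```python
from statistics import mean, stdev


def summarize(tip_samples, new_samples, higher_is_better):
    """Return (normalized value, pct imp, CV in percent) for new_samples."""
    tip_mean = mean(tip_samples)
    new_mean = mean(new_samples)
    normalized = new_mean / tip_mean
    # Relative change against tip; the sign is flipped when lower is
    # better so that a positive number always means an improvement.
    change = (new_mean - tip_mean) / tip_mean * 100.0
    pct_imp = change if higher_is_better else -change
    # Assumption: CV = sample standard deviation / mean, in percent.
    cv = stdev(new_samples) / new_mean * 100.0
    return normalized, pct_imp, cv


# Hypothetical hackbench-style runtimes in seconds (lower is better).
tip_runs = [10.0, 10.2, 9.8]
new_runs = [10.1, 10.3, 10.0]
norm, imp, cv = summarize(tip_runs, new_runs, higher_is_better=False)
print(f"{norm:.2f} [{imp:6.2f}]({cv:5.2f})")
```

For throughput tests such as tbench, the same helper would be called with higher_is_better=True.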
==================================================================
Test : tbench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients:      tip[pct imp](CV)       SIS_CACHE[pct imp](CV)
    1         1.00 [  0.00]( 0.86)     0.97 [ -2.68]( 0.74)
    2         1.00 [  0.00]( 0.99)     0.98 [ -2.18]( 0.17)
    4         1.00 [  0.00]( 0.49)     0.98 [ -2.47]( 1.15)
    8         1.00 [  0.00]( 0.96)     0.96 [ -3.81]( 0.24)
   16         1.00 [  0.00]( 1.38)     0.96 [ -4.33]( 1.31)
   32         1.00 [  0.00]( 1.64)     0.95 [ -4.70]( 1.59)
   64         1.00 [  0.00]( 0.92)     0.97 [ -2.97]( 0.49)
  128         1.00 [  0.00]( 0.57)     0.99 [ -1.15]( 0.57)
  256         1.00 [  0.00]( 0.38)     1.00 [  0.03]( 0.79)
  512         1.00 [  0.00]( 0.04)     1.00 [  0.43]( 0.34)
 1024         1.00 [  0.00]( 0.20)     1.00 [  0.41]( 0.13)
==================================================================
Test : stream-10
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test:         tip[pct imp](CV)       SIS_CACHE[pct imp](CV)
 Copy         1.00 [  0.00]( 2.52)     0.93 [ -6.90]( 6.75)
Scale         1.00 [  0.00]( 6.38)     0.99 [ -1.18]( 7.45)
  Add         1.00 [  0.00]( 6.54)     0.97 [ -2.55]( 7.34)
Triad         1.00 [  0.00]( 5.18)     0.95 [ -4.64]( 6.81)
==================================================================
Test : stream-100
Units : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic : HMean
==================================================================
Test:         tip[pct imp](CV)       SIS_CACHE[pct imp](CV)
 Copy         1.00 [  0.00]( 0.74)     1.00 [ -0.20]( 1.69)
Scale         1.00 [  0.00]( 6.25)     1.03 [  3.46]( 0.55)
  Add         1.00 [  0.00]( 6.53)     1.05 [  4.58]( 0.43)
Triad         1.00 [  0.00]( 5.14)     0.98 [ -1.78]( 6.24)
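
The stream tables report HMean rather than AMean. As a small illustration of why that matters for rate-like metrics such as bandwidth, the sketch below compares the two on made-up numbers: the harmonic mean is pulled toward the slower run, so a single slow iteration costs more than it would under the arithmetic mean. The bandwidth values are hypothetical and are not taken from these runs.

```python
from statistics import harmonic_mean, mean

# Hypothetical per-iteration bandwidths in MB/s (not measured data).
bandwidth_mb_s = [40000.0, 60000.0]

hmean = harmonic_mean(bandwidth_mb_s)  # 2 / (1/40000 + 1/60000)
amean = mean(bandwidth_mb_s)

# The harmonic mean never exceeds the arithmetic mean for positive data.
print(f"HMean: {hmean:.1f} MB/s, AMean: {amean:.1f} MB/s")
```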
==================================================================
Test : netperf
Units : Normalized Throughput
Interpretation: Higher is better
Statistic : AMean
==================================================================
Clients:         tip[pct imp](CV)       SIS_CACHE[pct imp](CV)
  1-clients      1.00 [  0.00]( 0.27)     0.98 [ -1.50]( 0.14)
  2-clients      1.00 [  0.00]( 1.32)     0.98 [ -2.35]( 0.54)
  4-clients      1.00 [  0.00]( 0.40)     0.98 [ -2.35]( 0.56)
  8-clients      1.00 [  0.00]( 0.97)     0.97 [ -2.72]( 0.50)
 16-clients      1.00 [  0.00]( 0.54)     0.96 [ -3.92]( 0.86)
 32-clients      1.00 [  0.00]( 1.38)     0.97 [ -3.10]( 0.44)
 64-clients      1.00 [  0.00]( 1.78)     0.97 [ -3.44]( 1.70)
128-clients      1.00 [  0.00]( 1.09)     0.94 [ -5.75]( 2.67)
256-clients      1.00 [  0.00]( 4.45)     0.97 [ -2.61]( 4.93)
512-clients      1.00 [  0.00](54.70)     0.98 [ -1.64](55.09)
==================================================================
Test : schbench
Units : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic : Median
==================================================================
#workers:     tip[pct imp](CV)       SIS_CACHE[pct imp](CV)
  1           1.00 [ -0.00]( 3.95)     0.97 [  2.56](10.42)
  2           1.00 [ -0.00]( 5.89)     0.83 [ 16.67](22.56)
  4           1.00 [ -0.00](14.28)     1.00 [ -0.00](14.75)
  8           1.00 [ -0.00]( 4.90)     0.84 [ 15.69]( 6.01)
 16           1.00 [ -0.00]( 4.15)     1.00 [ -0.00]( 4.41)
 32           1.00 [ -0.00]( 5.10)     1.01 [ -1.10]( 3.44)
 64           1.00 [ -0.00]( 2.69)     1.04 [ -3.72]( 2.57)
128           1.00 [ -0.00]( 2.63)     0.94 [  6.29]( 2.55)
256           1.00 [ -0.00](26.75)     1.51 [-50.57](11.40)
512           1.00 [ -0.00]( 2.93)     0.96 [  3.52]( 3.56)
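
Since several rows here carry a large CV, one rough way to read the table is to compare each delta against the larger of the two CVs. The sketch below does that. Note that this is only a heuristic chosen for illustration, not a proper significance test and not something used to produce these results; classify() is a hypothetical helper, and the inputs are rows copied from the tables in this mail.

```python
def classify(pct_imp, cv_tip, cv_new):
    """Label a delta, treating the larger CV as a crude noise band."""
    noise = max(cv_tip, cv_new)
    if abs(pct_imp) <= noise:
        return "within noise"
    return "improvement" if pct_imp > 0 else "regression"


# (pct imp, tip CV, SIS_CACHE CV) taken from rows in this mail.
rows = {
    "schbench 8 workers": (15.69, 4.90, 6.01),
    "schbench 256 workers": (-50.57, 26.75, 11.40),
    "netperf 512-clients": (-1.64, 54.70, 55.09),
}
for name, (imp, cv_tip, cv_new) in rows.items():
    print(f"{name}: {classify(imp, cv_tip, cv_new)}")
```

By this crude measure the 256-worker delta is larger than the noise band, although a tip CV of 26.75 still calls for more runs before drawing conclusions.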
==================================================================
Test : ycsb-cassandra
Units : Normalized throughput
Interpretation: Higher is better
Statistic : Mean
==================================================================
Metric          tip     SIS_CACHE(pct imp)
Throughput      1.00    1.00 (%diff: 0.27%)
==================================================================
Test : ycsb-mongodb
Units : Normalized throughput
Interpretation: Higher is better
Statistic : Mean
==================================================================
Metric          tip     SIS_CACHE(pct imp)
Throughput      1.00    1.00 (%diff: -0.45%)
==================================================================
Test : DeathStarBench
Units : Normalized throughput
Interpretation: Higher is better
Statistic : Mean
==================================================================
Pinning      scaling     tip        SIS_CACHE(pct imp)
 1CCD           1        1.00        1.00 (%diff: -0.47%)
 2CCD           2        1.00        0.98 (%diff: -2.34%)
 4CCD           4        1.00        1.00 (%diff: -0.29%)
 8CCD           8        1.00        1.01 (%diff: 0.54%)
>
> ----------------------------------------------------------------------
>
> [..snip..]
>
--
Thanks and Regards,
Prateek