linux-kernel - Re: [PATCH 0/2] Introduce SIS_CACHE to choose previous CPU during task wakeup

Message-ID: <27651e14-f441-c1e2-9b5b-b958d6aadc79@amd.com>
Date:   Thu, 5 Oct 2023 11:52:13 +0530
From:   K Prateek Nayak <kprateek.nayak@....com>
To:     Chen Yu <yu.c.chen@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>
Cc:     Tim Chen <tim.c.chen@...el.com>, Aaron Lu <aaron.lu@...el.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        linux-kernel@...r.kernel.org, Chen Yu <yu.chen.surf@...il.com>
Subject: Re: [PATCH 0/2] Introduce SIS_CACHE to choose previous CPU during
 task wakeup

Hello Chenyu,

On 9/26/2023 10:40 AM, Chen Yu wrote:
> RFC -> v1:
> - drop RFC
> - Only record the short sleeping time for each task, to better honor the
>   burst sleeping tasks. (Mathieu Desnoyers)
> - Keep the forward movement monotonic for runqueue's cache-hot timeout value.
>   (Mathieu Desnoyers, Aaron Lu)
> - Introduce a new helper function cache_hot_cpu() that considers
>   rq->cache_hot_timeout. (Aaron Lu)
> - Add analysis of why inhibiting task migration could bring better throughput
>   for some benchmarks. (Gautham R. Shenoy)
> - Choose the first cache-hot CPU, if all idle CPUs are cache-hot in
>   select_idle_cpu(). To avoid possible task stacking on the waker's CPU.
>   (K Prateek Nayak)
> 
> Thanks for your comments and review!
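
The last changelog item (prefer an idle CPU that is not cache-hot, and fall
back to the first cache-hot idle CPU when every idle CPU is cache-hot) can be
illustrated with a toy model. This is only a sketch of the policy as described
in the changelog, not the code from the series: ToyCPU, cache_hot() and
pick_idle_cpu() are hypothetical names, and treating a CPU as cache-hot while
"now" is before its timeout is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ToyCPU:
    """Hypothetical stand-in for a runqueue; not a kernel structure."""
    cpu_id: int
    idle: bool
    cache_hot_timeout: int  # time until which the CPU is treated as cache-hot


def cache_hot(cpu: ToyCPU, now: int) -> bool:
    # Assumption: a CPU counts as cache-hot while 'now' is before its timeout.
    return now < cpu.cache_hot_timeout


def pick_idle_cpu(cpus: Sequence[ToyCPU], now: int) -> Optional[int]:
    """Return an idle CPU id, preferring one that is not cache-hot."""
    first_hot = None
    for cpu in cpus:
        if not cpu.idle:
            continue
        if not cache_hot(cpu, now):
            return cpu.cpu_id          # idle and not cache-hot: take it
        if first_hot is None:
            first_hot = cpu.cpu_id     # remember the first cache-hot idle CPU
    # Every idle CPU was cache-hot (or none was idle): fall back to the first
    # cache-hot idle CPU rather than reporting that no idle CPU was found.
    return first_hot
```

Returning the remembered cache-hot CPU instead of None models the intent stated
in the changelog: the search still yields an idle CPU, so the wakee is less
likely to be stacked on the waker's CPU.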

Sorry for the delay! I'll leave the test results from a 3rd Generation
EPYC system below.

tl;dr

- Small regression in tbench and netperf, possibly due to more searching
  for an idle CPU.

- Small regression in schbench (old) at 256 workers, albeit with large
  run-to-run variance.

- Other benchmarks are more or less the same.

The full results are below.

o System details

- 3rd Generation EPYC System
- 2 sockets each with 64C/128T
- NPS1 (Each socket is a NUMA node)
- Boost enabled, C2 Disabled (POLL and MWAIT based C1 remained enabled)


o Kernel Details

- tip:	tip:sched/core at commit 5fe7765997b1 (sched/deadline: Make
	dl_rq->pushable_dl_tasks update drive dl_rq->overloaded)

- SIS_CACHE: tip + this series


o Benchmark results

==================================================================
Test          : hackbench
Units         : Normalized time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Case:           tip[pct imp](CV)     SIS_CACHE[pct imp](CV)
 1-groups     1.00 [ -0.00]( 2.36)     1.01 [ -1.47]( 3.02)
 2-groups     1.00 [ -0.00]( 2.35)     0.99 [  0.92]( 1.01)
 4-groups     1.00 [ -0.00]( 1.79)     0.98 [  2.34]( 0.63)
 8-groups     1.00 [ -0.00]( 0.84)     0.98 [  1.73]( 1.02)
16-groups     1.00 [ -0.00]( 2.39)     0.97 [  2.76]( 2.33)
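
For reading the "[pct imp]" column in these tables: the reported figures are
consistent with a percentage change against tip whose sign is flipped for
"lower is better" metrics, so that a positive number always means SIS_CACHE
did better. A minimal sketch of that convention follows; pct_improvement() is
a hypothetical helper and the formula is inferred from the numbers, not taken
from the benchmark scripts.

```python
def pct_improvement(baseline: float, value: float, lower_is_better: bool) -> float:
    """Percentage improvement of 'value' over 'baseline'.

    Positive means better, regardless of the metric's direction.
    """
    if lower_is_better:
        delta = baseline - value   # less time or latency is an improvement
    else:
        delta = value - baseline   # more throughput is an improvement
    return 100.0 * delta / baseline
```

For example, pct_improvement(1.00, 0.97, lower_is_better=True) gives roughly
3.0, close to the 2.76 reported for 16-groups; the gap is presumably because
the normalized column is rounded to two decimals.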


==================================================================
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:    tip[pct imp](CV)      SIS_CACHE[pct imp](CV)
    1     1.00 [  0.00]( 0.86)     0.97 [ -2.68]( 0.74)
    2     1.00 [  0.00]( 0.99)     0.98 [ -2.18]( 0.17)
    4     1.00 [  0.00]( 0.49)     0.98 [ -2.47]( 1.15)
    8     1.00 [  0.00]( 0.96)     0.96 [ -3.81]( 0.24)
   16     1.00 [  0.00]( 1.38)     0.96 [ -4.33]( 1.31)
   32     1.00 [  0.00]( 1.64)     0.95 [ -4.70]( 1.59)
   64     1.00 [  0.00]( 0.92)     0.97 [ -2.97]( 0.49)
  128     1.00 [  0.00]( 0.57)     0.99 [ -1.15]( 0.57)
  256     1.00 [  0.00]( 0.38)     1.00 [  0.03]( 0.79)
  512     1.00 [  0.00]( 0.04)     1.00 [  0.43]( 0.34)
 1024     1.00 [  0.00]( 0.20)     1.00 [  0.41]( 0.13)


==================================================================
Test          : stream-10
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)      SIS_CACHE[pct imp](CV)
 Copy     1.00 [  0.00]( 2.52)     0.93 [ -6.90]( 6.75)
Scale     1.00 [  0.00]( 6.38)     0.99 [ -1.18]( 7.45)
  Add     1.00 [  0.00]( 6.54)     0.97 [ -2.55]( 7.34)
Triad     1.00 [  0.00]( 5.18)     0.95 [ -4.64]( 6.81)


==================================================================
Test          : stream-100
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)      SIS_CACHE[pct imp](CV)
 Copy     1.00 [  0.00]( 0.74)     1.00 [ -0.20]( 1.69)
Scale     1.00 [  0.00]( 6.25)     1.03 [  3.46]( 0.55)
  Add     1.00 [  0.00]( 6.53)     1.05 [  4.58]( 0.43)
Triad     1.00 [  0.00]( 5.14)     0.98 [ -1.78]( 6.24)
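
The stream tables report HMean rather than AMean. The harmonic mean is
dominated by the slowest runs, which is why it is a common choice for
rate-like metrics such as bandwidth. A small sketch of the difference, with
made-up sample values (not measurements from this system) and a hypothetical
helper name:

```python
from statistics import harmonic_mean, mean


def summarize_bandwidth(runs_mb_s):
    """Return both means for a list of per-run bandwidth figures (MB/s)."""
    return {
        "amean": mean(runs_mb_s),
        "hmean": harmonic_mean(runs_mb_s),
    }


# Illustrative values only: two fast runs and one slow run.
summary = summarize_bandwidth([100.0, 100.0, 50.0])
print(summary)
```

With one slow run among three, the harmonic mean sits below the arithmetic
mean, so a single bad run pulls the HMean figure down more visibly.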


==================================================================
Test          : netperf
Units         : Normalized Throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:         tip[pct imp](CV)      SIS_CACHE[pct imp](CV)
 1-clients     1.00 [  0.00]( 0.27)     0.98 [ -1.50]( 0.14)
 2-clients     1.00 [  0.00]( 1.32)     0.98 [ -2.35]( 0.54)
 4-clients     1.00 [  0.00]( 0.40)     0.98 [ -2.35]( 0.56)
 8-clients     1.00 [  0.00]( 0.97)     0.97 [ -2.72]( 0.50)
16-clients     1.00 [  0.00]( 0.54)     0.96 [ -3.92]( 0.86)
32-clients     1.00 [  0.00]( 1.38)     0.97 [ -3.10]( 0.44)
64-clients     1.00 [  0.00]( 1.78)     0.97 [ -3.44]( 1.70)
128-clients    1.00 [  0.00]( 1.09)     0.94 [ -5.75]( 2.67)
256-clients    1.00 [  0.00]( 4.45)     0.97 [ -2.61]( 4.93)
512-clients    1.00 [  0.00](54.70)     0.98 [ -1.64](55.09)


==================================================================
Test          : schbench
Units         : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers:  tip[pct imp](CV)     SIS_CACHE[pct imp](CV)
  1     1.00 [ -0.00]( 3.95)     0.97 [  2.56](10.42)
  2     1.00 [ -0.00]( 5.89)     0.83 [ 16.67](22.56)
  4     1.00 [ -0.00](14.28)     1.00 [ -0.00](14.75)
  8     1.00 [ -0.00]( 4.90)     0.84 [ 15.69]( 6.01)
 16     1.00 [ -0.00]( 4.15)     1.00 [ -0.00]( 4.41)
 32     1.00 [ -0.00]( 5.10)     1.01 [ -1.10]( 3.44)
 64     1.00 [ -0.00]( 2.69)     1.04 [ -3.72]( 2.57)
128     1.00 [ -0.00]( 2.63)     0.94 [  6.29]( 2.55)
256     1.00 [ -0.00](26.75)     1.51 [-50.57](11.40)
512     1.00 [ -0.00]( 2.93)     0.96 [  3.52]( 3.56)
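
One rough way to read the "(CV)" column next to "[pct imp]" is to ask whether
a delta is larger than the run-to-run variation of either kernel. The sketch
below assumes CV means the coefficient of variation in percent (standard
deviation over mean) and uses the sample standard deviation; within_noise()
is a crude, hypothetical heuristic, not a statistical test and not part of
the benchmark tooling.

```python
from statistics import mean, stdev


def cv_percent(samples):
    """Coefficient of variation in percent: sample stddev over the mean."""
    return 100.0 * stdev(samples) / mean(samples)


def within_noise(pct_imp, cv_base, cv_new):
    """Crude check: is the delta no larger than the bigger of the two CVs?"""
    return abs(pct_imp) <= max(cv_base, cv_new)


# Figures taken from the tables in this mail.
print(within_noise(-1.64, 54.70, 55.09))   # netperf, 512-clients
print(within_noise(-50.57, 26.75, 11.40))  # schbench, 256 workers
```

By this heuristic the netperf 512-clients delta is well inside the noise,
while the schbench 256-worker delta exceeds both CVs even though tip's own
variance at that point is large.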

==================================================================
Test          : ycsb-cassandra
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
Metric          tip     SIS_CACHE(pct imp)
Throughput      1.00    1.00 (%diff: 0.27%)


==================================================================
Test          : ycsb-mongodb
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
Metric          tip      SIS_CACHE(pct imp)
Throughput      1.00    1.00 (%diff: -0.45%)


==================================================================
Test          : DeathStarBench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : Mean
==================================================================
Pinning      scaling     tip                SIS_CACHE(pct imp)
 1CCD           1        1.00              1.00 (%diff: -0.47%)
 2CCD           2        1.00              0.98 (%diff: -2.34%)
 4CCD           4        1.00              1.00 (%diff: -0.29%)
 8CCD           8        1.00              1.01 (%diff: 0.54%)

> 
> ----------------------------------------------------------------------
> 
> [..snip..]
> 

--
Thanks and Regards,
Prateek