Message-ID: <f6dc1652-bc39-0b12-4b6b-29a2f9cd8484@amd.com>
Date:   Fri, 25 Aug 2023 15:41:08 +0530
From:   Swapnil Sapkal <Swapnil.Sapkal@....com>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Aaron Lu <aaron.lu@...el.com>,
        Julien Desfossez <jdesfossez@...italocean.com>, x86@...nel.org
Subject: Re: [RFC PATCH v3 0/3] sched: Skip queued wakeups only when L2 is
 shared

Hello Mathieu,

On 8/22/2023 5:01 PM, Mathieu Desnoyers wrote:
> This series improves performance of scheduler wakeups on large systems
> by skipping queued wakeups only when CPUs share their L2 cache, rather
> than when they share their LLC.
> 
> The speedup mainly reproduces on workloads which have at least *some*
> idle time (because it significantly increases the number of migrations,
> and thus remote wakeups), *and* it needs to have a sufficient load to
> cause contention on the runqueue locks.
> 
> Feedback is welcome,

I ran some micro-benchmarks as part of testing this series. Here are the
observations:

- Hackbench shows improvement with this series and with Aaron's patch,
   both individually and combined, most visibly at 8 and 16 groups, with
   the 6.5-rc1 kernel as the baseline.

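- tbench and netperf show some dip in performance in the heavily
   overloaded cases (tbench at 1024/2048 clients, netperf at 256/512
   clients), mainly with Aaron's patch and most with both patches combined.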

- Other micro-benchmarks show more or less similar performance with
   these patches.

o System Details

- 4th Generation EPYC System
- 2 x 128C/256T
- NPS1 mode

o Kernels

base:                                 6.5.0-rc1
base + mathieu-queued-wakeup:         6.5.0-rc1 + Mathieu's patches [1]
base + aaron-tg-load-avg:             6.5.0-rc1 + Aaron's patch [2]
base + queued-wakeup + tg-load-avg:   6.5.0-rc1 + Mathieu's patches [1] + Aaron's patch [2]

[References]

[1] "sched: Skip queued wakeups only when L2 is shared"
     (https://lore.kernel.org/all/20230822113133.643238-1-mathieu.desnoyers@efficios.com/)
[2] "Reduce cost of accessing tg->load_avg"
     (https://lore.kernel.org/lkml/20230823060832.454842-1-aaron.lu@intel.com/)

==================================================================
Test          : hackbench
Units         : Time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Test:        6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
 1-groups:   22.15 (0.00 pct)       22.46 (-1.39 pct)              22.35 (-0.90 pct)          21.20 (4.28 pct)
 2-groups:   22.76 (0.00 pct)       21.78 (4.30 pct)               22.60 (0.70 pct)           21.90 (3.77 pct)
 4-groups:   22.12 (0.00 pct)       22.02 (0.45 pct)               22.22 (-0.45 pct)          21.94 (0.81 pct)
 8-groups:   24.80 (0.00 pct)       22.36 (9.83 pct)               22.99 (7.29 pct)           22.00 (11.29 pct)
16-groups:   31.09 (0.00 pct)       21.56 (30.65 pct)              22.13 (28.81 pct)          20.60 (33.74 pct)

==================================================================
Test          : tbench
Units         : Throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:   6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
       1   261.49 (0.00 pct)      261.18 (-0.11 pct)             262.29 (0.30 pct)          257.80 (-1.41 pct)
       2   514.08 (0.00 pct)      521.30 (1.40 pct)              517.66 (0.69 pct)          510.96 (-0.60 pct)
       4   1002.51 (0.00 pct)     988.81 (-1.36 pct)             995.04 (-0.74 pct)         987.74 (-1.47 pct)
       8   1978.74 (0.00 pct)     1966.60 (-0.61 pct)            1991.85 (0.66 pct)         1941.39 (-1.88 pct)
      16   3864.14 (0.00 pct)     3952.03 (2.27 pct)             3914.80 (1.31 pct)         3873.88 (0.25 pct)
      32   7473.19 (0.00 pct)     7602.38 (1.72 pct)             7585.94 (1.50 pct)         7423.44 (-0.66 pct)
      64   14335.10 (0.00 pct)    14313.17 (-0.15 pct)           14474.67 (0.97 pct)        14030.63 (-2.12 pct)
     128   27275.73 (0.00 pct)    25176.80 (-7.69 pct)           28066.53 (2.89 pct)        25045.53 (-8.17 pct)
     256   41688.17 (0.00 pct)    44373.40 (6.44 pct)            43779.37 (5.01 pct)        41427.00 (-0.62 pct)
     512   137481.33 (0.00 pct)   136466.67 (-0.73 pct)          134824.00 (-1.93 pct)      141280.00 (2.76 pct)
    1024   140534.00 (0.00 pct)   141916.33 (0.98 pct)           137008.33 (-2.50 pct)      126319.33 (-10.11 pct)
    2048   145378.00 (0.00 pct)   145479.33 (0.06 pct)           138763.67 (-4.54 pct)      124471.00 (-14.38 pct)

==================================================================
Test          : netperf
Units         : Throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
               6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
  1-clients:   59642.88 (0.00 pct)    61647.37 (3.36 pct)            61186.24 (2.58 pct)        59099.11 (-0.91 pct)
  2-clients:   59349.65 (0.00 pct)    60896.01 (2.60 pct)            60582.49 (2.07 pct)        62738.47 (5.70 pct)
  4-clients:   59197.37 (0.00 pct)    60457.29 (2.12 pct)            63042.52 (6.49 pct)        60879.58 (2.84 pct)
  8-clients:   61977.66 (0.00 pct)    60389.92 (-2.56 pct)           62078.15 (0.16 pct)        60314.65 (-2.68 pct)
 16-clients:   61518.83 (0.00 pct)    61143.51 (-0.61 pct)           60946.08 (-0.93 pct)       59388.78 (-3.46 pct)
 32-clients:   58230.81 (0.00 pct)    58653.20 (0.72 pct)            58594.14 (0.62 pct)        58188.52 (-0.07 pct)
 64-clients:   58050.92 (0.00 pct)    57834.55 (-0.37 pct)           58183.51 (0.22 pct)        57565.75 (-0.83 pct)
128-clients:   54324.55 (0.00 pct)    54385.60 (0.11 pct)            54913.43 (1.08 pct)        53917.11 (-0.75 pct)
256-clients:   70155.29 (0.00 pct)    69390.68 (-1.08 pct)           70097.50 (-0.08 pct)       64410.66 (-8.18 pct)
512-clients:   61511.77 (0.00 pct)    61480.99 (-0.05 pct)           54493.82 (-11.40 pct)      46227.05 (-24.84 pct)

==================================================================
Test          : stream-10
Units         : Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:    6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
 Copy:   353336.76 (0.00 pct)   352956.36 (-0.10 pct)          349583.67 (-1.06 pct)      351152.80 (-0.61 pct)
Scale:   353474.88 (0.00 pct)   354582.35 (0.31 pct)           350543.75 (-0.82 pct)      353275.74 (-0.05 pct)
  Add:   371984.24 (0.00 pct)   372824.87 (0.22 pct)           369173.72 (-0.75 pct)      370483.63 (-0.40 pct)
Triad:   372625.41 (0.00 pct)   278389.62 (-25.28 pct)         369504.06 (-0.83 pct)      369070.11 (-0.95 pct)

==================================================================
Test          : stream-100
Units         : Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:    6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
 Copy:   353476.35 (0.00 pct)   354954.50 (0.41 pct)           354614.56 (0.32 pct)       353512.71 (0.01 pct)
Scale:   353214.73 (0.00 pct)   354884.12 (0.47 pct)           355841.17 (0.74 pct)       353220.53 (0.00 pct)
  Add:   370755.48 (0.00 pct)   372292.72 (0.41 pct)           375307.35 (1.22 pct)       369917.77 (-0.22 pct)
Triad:   370652.02 (0.00 pct)   372732.11 (0.56 pct)           375718.85 (1.36 pct)       369926.26 (-0.19 pct)

==================================================================
Test          : schbench (old)
Units         : 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers:   6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
       1:   56.00 (0.00 pct)       58.00 (-3.57 pct)              60.00 (-7.14 pct)          60.00 (-7.14 pct)
       2:   61.00 (0.00 pct)       56.00 (8.19 pct)               59.00 (3.27 pct)           60.00 (1.63 pct)
       4:   64.00 (0.00 pct)       62.00 (3.12 pct)               66.00 (-3.12 pct)          64.00 (0.00 pct)
       8:   96.00 (0.00 pct)       78.00 (18.75 pct)              76.00 (20.83 pct)          93.00 (3.12 pct)
      16:   98.00 (0.00 pct)       95.00 (3.06 pct)               98.00 (0.00 pct)           95.00 (3.06 pct)
      32:   137.00 (0.00 pct)      144.00 (-5.10 pct)             133.00 (2.91 pct)          130.00 (5.10 pct)
      64:   206.00 (0.00 pct)      210.00 (-1.94 pct)             200.00 (2.91 pct)          217.00 (-5.33 pct)
     128:   348.00 (0.00 pct)      347.00 (0.28 pct)              413.00 (-18.67 pct)        366.00 (-5.17 pct)
     256:   679.00 (0.00 pct)      669.00 (1.47 pct)              669.00 (1.47 pct)          675.00 (0.58 pct)
     512:   1366.00 (0.00 pct)     1366.00 (0.00 pct)             1442.00 (-5.56 pct)        1430.00 (-4.68 pct)


==================================================================
Test          : schbench (new)
Units         : 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
Metric: wakeup_lat_summary
#workers:   6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
       1:   15.00 (0.00 pct)       15.00 (0.00 pct)               16.00 (-6.66 pct)          17.00 (-13.33 pct)
       2:   16.00 (0.00 pct)       16.00 (0.00 pct)               17.00 (-6.25 pct)          17.00 (-6.25 pct)
       4:   17.00 (0.00 pct)       17.00 (0.00 pct)               15.00 (11.76 pct)          17.00 (0.00 pct)
       8:   11.00 (0.00 pct)       13.00 (-18.18 pct)             11.00 (0.00 pct)           11.00 (0.00 pct)
      16:   11.00 (0.00 pct)       11.00 (0.00 pct)               10.00 (9.09 pct)           9.00 (18.18 pct)
      32:   11.00 (0.00 pct)       11.00 (0.00 pct)               11.00 (0.00 pct)           11.00 (0.00 pct)
      64:   10.00 (0.00 pct)       11.00 (-10.00 pct)             10.00 (0.00 pct)           10.00 (0.00 pct)
     128:   11.00 (0.00 pct)       12.00 (-9.09 pct)              12.00 (-9.09 pct)          11.00 (0.00 pct)
     256:   117.00 (0.00 pct)      162.00 (-38.46 pct)            90.00 (23.07 pct)          103.00 (11.96 pct)
     512:   22496.00 (0.00 pct)    21664.00 (3.69 pct)            22368.00 (0.56 pct)        21408.00 (4.83 pct)

Metric: request_lat_summary
#workers:   6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
       1:   6872.00 (0.00 pct)     6872.00 (0.00 pct)             6792.00 (1.16 pct)         6856.00 (0.23 pct)
       2:   6824.00 (0.00 pct)     6824.00 (0.00 pct)             6872.00 (-0.70 pct)        6856.00 (-0.46 pct)
       4:   6824.00 (0.00 pct)     6808.00 (0.23 pct)             6872.00 (-0.70 pct)        6824.00 (0.00 pct)
       8:   6824.00 (0.00 pct)     6824.00 (0.00 pct)             6872.00 (-0.70 pct)        6824.00 (0.00 pct)
      16:   6824.00 (0.00 pct)     6840.00 (-0.23 pct)            6872.00 (-0.70 pct)        6840.00 (-0.23 pct)
      32:   6840.00 (0.00 pct)     6840.00 (0.00 pct)             6888.00 (-0.70 pct)        6856.00 (-0.23 pct)
      64:   6840.00 (0.00 pct)     6872.00 (-0.46 pct)            6888.00 (-0.70 pct)        6872.00 (-0.46 pct)
     128:   12272.00 (0.00 pct)    12784.00 (-4.17 pct)           13200.00 (-7.56 pct)       12016.00 (2.08 pct)
     256:   13328.00 (0.00 pct)    13392.00 (-0.48 pct)           13712.00 (-2.88 pct)       13552.00 (-1.68 pct)
     512:   88832.00 (0.00 pct)    86400.00 (2.73 pct)            88192.00 (0.72 pct)        85632.00 (3.60 pct)

Metric: rps_summary (requests per second; higher is better)
Note: the pct values in this table follow the lower-is-better sign convention
of the latency metrics, so a positive value means fewer requests per second
than base.
#workers:   6.5.0-rc1 (base)       base + mathieu-queued-wakeup   base + aaron-tg-load-avg   base + queued-wakeup + tg-load-avg
       1:   297.00 (0.00 pct)      297.00 (0.00 pct)              297.00 (0.00 pct)          299.00 (-0.67 pct)
       2:   601.00 (0.00 pct)      603.00 (-0.33 pct)             595.00 (0.99 pct)          601.00 (0.00 pct)
       4:   1206.00 (0.00 pct)     1206.00 (0.00 pct)             1190.00 (1.32 pct)         1206.00 (0.00 pct)
       8:   2412.00 (0.00 pct)     2412.00 (0.00 pct)             2396.00 (0.66 pct)         2420.00 (-0.33 pct)
      16:   4840.00 (0.00 pct)     4824.00 (0.33 pct)             4792.00 (0.99 pct)         4840.00 (0.00 pct)
      32:   9648.00 (0.00 pct)     9648.00 (0.00 pct)             9584.00 (0.66 pct)         9680.00 (-0.33 pct)
      64:   19360.00 (0.00 pct)    19296.00 (0.33 pct)            19168.00 (0.99 pct)        19296.00 (0.33 pct)
     128:   37952.00 (0.00 pct)    35264.00 (7.08 pct)            36672.00 (3.37 pct)        38080.00 (-0.33 pct)
     256:   41408.00 (0.00 pct)    41536.00 (-0.30 pct)           39744.00 (4.01 pct)        40896.00 (1.23 pct)
     512:   36288.00 (0.00 pct)    36800.00 (-1.41 pct)           35264.00 (2.82 pct)        35776.00 (1.41 pct)

Tested-by: Swapnil Sapkal <Swapnil.Sapkal@....com>

> 
> Thanks,
> 
> Mathieu
> 
> Mathieu Desnoyers (3):
>    sched: Rename cpus_share_cache to cpus_share_llc
>    sched: Introduce cpus_share_l2c (v3)
>    sched: ttwu_queue_cond: skip queued wakeups across different l2 caches
> 
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Valentin Schneider <vschneid@...hat.com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Ben Segall <bsegall@...gle.com>
> Cc: Mel Gorman <mgorman@...e.de>
> Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
> Cc: Vincent Guittot <vincent.guittot@...aro.org>
> Cc: Juri Lelli <juri.lelli@...hat.com>
> Cc: Swapnil Sapkal <Swapnil.Sapkal@....com>
> Cc: Aaron Lu <aaron.lu@...el.com>
> Cc: Julien Desfossez <jdesfossez@...italocean.com>
> Cc: x86@...nel.org
> 
>   block/blk-mq.c                 |  2 +-
>   include/linux/sched/topology.h | 10 ++++++++--
>   kernel/sched/core.c            | 14 +++++++++++---
>   kernel/sched/fair.c            |  8 ++++----
>   kernel/sched/sched.h           |  2 ++
>   kernel/sched/topology.c        | 32 +++++++++++++++++++++++++++++---
>   6 files changed, 55 insertions(+), 13 deletions(-)
> 
--
Thanks and Regards,
Swapnil
