Message-ID: <5effb4de-b3d1-cffe-938e-4bdd1cc64b44@efficios.com>
Date:   Fri, 25 Aug 2023 10:03:54 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Swapnil Sapkal <Swapnil.Sapkal@....com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Aaron Lu <aaron.lu@...el.com>,
        Julien Desfossez <jdesfossez@...italocean.com>, x86@...nel.org
Subject: Re: [RFC PATCH v3 0/3] sched: Skip queued wakeups only when L2 is
 shared

On 8/25/23 06:11, Swapnil Sapkal wrote:
> Hello Mathieu,
> 
> On 8/22/2023 5:01 PM, Mathieu Desnoyers wrote:
>> This series improves performance of scheduler wakeups on large systems
>> by skipping queued wakeups only when CPUs share their L2 cache, rather
>> than when they share their LLC.
>>
>> The speedup mainly reproduces on workloads which have at least *some*
>> idle time (because it significantly increases the number of migrations,
>> and thus remote wakeups), *and* it needs to have a sufficient load to
>> cause contention on the runqueue locks.
>>
>> Feedback is welcome,
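To illustrate the policy change described in the cover letter, here is a minimal Python sketch. It is a toy model, not the kernel code: the 8-CPU topology, the maps `L2_ID` and `LLC_ID`, and the helpers `cpus_share`, `should_queue_wakeup` and `count_queued` are all hypothetical. The real check lives in `ttwu_queue_cond()` and has further conditions that are omitted here. The sketch only shows that testing for a shared L2 instead of a shared LLC leaves fewer waker/target pairs that skip the queued wakeup.

```python
# Toy model only: hypothetical 8-CPU topology, not taken from the test system.
# Two LLC domains, each holding two L2 domains of two CPUs.
L2_ID = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 3, 7: 3}
LLC_ID = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}


def cpus_share(domain_id, this_cpu, that_cpu):
    """True when both CPUs sit in the same cache domain."""
    return domain_id[this_cpu] == domain_id[that_cpu]


def should_queue_wakeup(waker, target, domain_id):
    """Hypothetical helper: queue the wakeup unless the CPUs share the cache."""
    return not cpus_share(domain_id, waker, target)


def count_queued(domain_id):
    """Count ordered (waker, target) pairs whose wakeup would be queued."""
    cpus = sorted(domain_id)
    return sum(
        should_queue_wakeup(w, t, domain_id)
        for w in cpus
        for t in cpus
        if w != t
    )


queued_llc = count_queued(LLC_ID)  # cache test at LLC granularity
queued_l2 = count_queued(L2_ID)    # cache test at L2 granularity
print(f"queued pairs: LLC test={queued_llc}, L2 test={queued_l2}")
```

With this toy topology, 32 of the 56 ordered CPU pairs take the queued path under the LLC test and 48 under the L2 test.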
> 
> I ran some micro-benchmarks as part of testing this series. Here are the
> observations:
> 
> - Hackbench shows improvement with this patch and Aaron's patch with
>    6.5-rc1 kernel as the baseline.
> 
> - tbench and netperf show some dip in performance in the highly
>    overloaded case.
>
> - Other micro-benchmarks show more or less similar performance with
>    these patches.

Those results look promising! Thanks for testing!

Mathieu


> 
> o System Details
> 
> - 4th Generation EPYC System
> - 2 x 128C/256T
> - NPS1 mode
> 
> o Kernels
> 
> base:                                  6.5.0-rc1
> base + mathieu-queued-wakeup:          6.5.0-rc1 + Mathieu's patches [1]
> base + aaron-tg-load-avg:              6.5.0-rc1 + Aaron's patch [2]
> base + queued-wakeup + tg-load-avg:    6.5.0-rc1 + Mathieu's patches [1] + Aaron's patch [2]
> 
> [References]
> 
> [1] "sched: Skip queued wakeups only when L2 is shared"
>     (https://lore.kernel.org/all/20230822113133.643238-1-mathieu.desnoyers@efficios.com/)
> [2] "Reduce cost of accessing tg->load_avg"
>     (https://lore.kernel.org/lkml/20230823060832.454842-1-aaron.lu@intel.com/)
> 
> ==================================================================
> Test          : hackbench
> Units         : Time in seconds
> Interpretation: Lower is better
> Statistic     : AMean
> ==================================================================
> Test:        6.5.0-rc1 (base)        base + mathieu-queued-wakeup    base + aaron-tg-load-avg    base + queued-wakeup + tg-load-avg
>  1-groups:   22.15 (0.00 pct)        22.46 (-1.39 pct)               22.35 (-0.90 pct)           21.20 (4.28 pct)
>  2-groups:   22.76 (0.00 pct)        21.78 (4.30 pct)                22.60 (0.70 pct)            21.90 (3.77 pct)
>  4-groups:   22.12 (0.00 pct)        22.02 (0.45 pct)                22.22 (-0.45 pct)           21.94 (0.81 pct)
>  8-groups:   24.80 (0.00 pct)        22.36 (9.83 pct)                22.99 (7.29 pct)            22.00 (11.29 pct)
> 16-groups:   31.09 (0.00 pct)        21.56 (30.65 pct)               22.13 (28.81 pct)           20.60 (33.74 pct)
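For readers checking the "pct" columns: they look consistent with a percent change relative to the base kernel, signed so that a positive value is an improvement. The sketch below is an assumption about how the reporting tool computes it, not something stated in the report, and `pct_change` is a hypothetical helper name. The inputs are taken from the hackbench table above and from the 128-client row of the tbench table.

```python
def pct_change(base, value, lower_is_better):
    """Percent change against base; positive always means 'better'."""
    # For time/latency a drop is good, for throughput a rise is good.
    delta = (base - value) if lower_is_better else (value - base)
    return 100.0 * delta / base


# hackbench, 16-groups: time in seconds, lower is better.
hackbench_16 = pct_change(31.09, 21.56, lower_is_better=True)

# hackbench, 1-groups: a small regression shows up as a negative value.
hackbench_1 = pct_change(22.15, 22.46, lower_is_better=True)

# tbench, 128 clients: throughput, higher is better.
tbench_128 = pct_change(27275.73, 25176.80, lower_is_better=False)

print(f"hackbench 16-groups: {hackbench_16:.3f} pct")
print(f"hackbench  1-groups: {hackbench_1:.3f} pct")
print(f"tbench 128 clients:  {tbench_128:.3f} pct")
```

The results agree with the reported 30.65, -1.39 and -7.69 to within about 0.01; the last digit can differ depending on whether the tool rounds or truncates.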
> 
> ==================================================================
> Test          : tbench
> Units         : Throughput
> Interpretation: Higher is better
> Statistic     : AMean
> ==================================================================
> Clients:     6.5.0-rc1 (base)        base + mathieu-queued-wakeup    base + aaron-tg-load-avg    base + queued-wakeup + tg-load-avg
>       1      261.49 (0.00 pct)       261.18 (-0.11 pct)              262.29 (0.30 pct)           257.80 (-1.41 pct)
>       2      514.08 (0.00 pct)       521.30 (1.40 pct)               517.66 (0.69 pct)           510.96 (-0.60 pct)
>       4      1002.51 (0.00 pct)      988.81 (-1.36 pct)              995.04 (-0.74 pct)          987.74 (-1.47 pct)
>       8      1978.74 (0.00 pct)      1966.60 (-0.61 pct)             1991.85 (0.66 pct)          1941.39 (-1.88 pct)
>      16      3864.14 (0.00 pct)      3952.03 (2.27 pct)              3914.80 (1.31 pct)          3873.88 (0.25 pct)
>      32      7473.19 (0.00 pct)      7602.38 (1.72 pct)              7585.94 (1.50 pct)          7423.44 (-0.66 pct)
>      64      14335.10 (0.00 pct)     14313.17 (-0.15 pct)            14474.67 (0.97 pct)         14030.63 (-2.12 pct)
>     128      27275.73 (0.00 pct)     25176.80 (-7.69 pct)            28066.53 (2.89 pct)         25045.53 (-8.17 pct)
>     256      41688.17 (0.00 pct)     44373.40 (6.44 pct)             43779.37 (5.01 pct)         41427.00 (-0.62 pct)
>     512      137481.33 (0.00 pct)    136466.67 (-0.73 pct)           134824.00 (-1.93 pct)       141280.00 (2.76 pct)
>    1024      140534.00 (0.00 pct)    141916.33 (0.98 pct)            137008.33 (-2.50 pct)       126319.33 (-10.11 pct)
>    2048      145378.00 (0.00 pct)    145479.33 (0.06 pct)            138763.67 (-4.54 pct)       124471.00 (-14.38 pct)
> 
> ==================================================================
> Test          : netperf
> Units         : Throughput
> Interpretation: Higher is better
> Statistic     : AMean
> ==================================================================
>               6.5.0-rc1 (base)        base + mathieu-queued-wakeup    base + aaron-tg-load-avg    base + queued-wakeup + tg-load-avg
>   1-clients:  59642.88 (0.00 pct)     61647.37 (3.36 pct)             61186.24 (2.58 pct)         59099.11 (-0.91 pct)
>   2-clients:  59349.65 (0.00 pct)     60896.01 (2.60 pct)             60582.49 (2.07 pct)         62738.47 (5.70 pct)
>   4-clients:  59197.37 (0.00 pct)     60457.29 (2.12 pct)             63042.52 (6.49 pct)         60879.58 (2.84 pct)
>   8-clients:  61977.66 (0.00 pct)     60389.92 (-2.56 pct)            62078.15 (0.16 pct)         60314.65 (-2.68 pct)
>  16-clients:  61518.83 (0.00 pct)     61143.51 (-0.61 pct)            60946.08 (-0.93 pct)        59388.78 (-3.46 pct)
>  32-clients:  58230.81 (0.00 pct)     58653.20 (0.72 pct)             58594.14 (0.62 pct)         58188.52 (-0.07 pct)
>  64-clients:  58050.92 (0.00 pct)     57834.55 (-0.37 pct)            58183.51 (0.22 pct)         57565.75 (-0.83 pct)
> 128-clients:  54324.55 (0.00 pct)     54385.60 (0.11 pct)             54913.43 (1.08 pct)         53917.11 (-0.75 pct)
> 256-clients:  70155.29 (0.00 pct)     69390.68 (-1.08 pct)            70097.50 (-0.08 pct)        64410.66 (-8.18 pct)
> 512-clients:  61511.77 (0.00 pct)     61480.99 (-0.05 pct)            54493.82 (-11.40 pct)       46227.05 (-24.84 pct)
> 
> ==================================================================
> Test          : stream-10
> Units         : Bandwidth, MB/s
> Interpretation: Higher is better
> Statistic     : HMean
> ==================================================================
> Test:        6.5.0-rc1 (base)        base + mathieu-queued-wakeup    base + aaron-tg-load-avg    base + queued-wakeup + tg-load-avg
>  Copy:       353336.76 (0.00 pct)    352956.36 (-0.10 pct)           349583.67 (-1.06 pct)       351152.80 (-0.61 pct)
> Scale:       353474.88 (0.00 pct)    354582.35 (0.31 pct)            350543.75 (-0.82 pct)       353275.74 (-0.05 pct)
>   Add:       371984.24 (0.00 pct)    372824.87 (0.22 pct)            369173.72 (-0.75 pct)       370483.63 (-0.40 pct)
> Triad:       372625.41 (0.00 pct)    278389.62 (-25.28 pct)          369504.06 (-0.83 pct)       369070.11 (-0.95 pct)
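The stream tables report a harmonic mean (HMean), while hackbench, tbench and netperf report an arithmetic mean (AMean). As a small illustration of the difference, the sketch below uses made-up bandwidth samples (not measurements from this thread) and Python's standard `statistics` module. The harmonic mean is never larger than the arithmetic mean and is pulled more strongly towards a single low sample.

```python
from statistics import fmean, harmonic_mean

# Made-up bandwidth samples in MB/s, purely illustrative: two normal runs
# and one slow run.
samples_mb_s = [350000.0, 355000.0, 280000.0]

amean = fmean(samples_mb_s)          # arithmetic mean
hmean = harmonic_mean(samples_mb_s)  # harmonic mean, as used for stream

print(f"AMean: {amean:.2f} MB/s")
print(f"HMean: {hmean:.2f} MB/s")
print(f"HMean is lower by {amean - hmean:.2f} MB/s")
```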
> 
> ==================================================================
> Test          : stream-100
> Units         : Bandwidth, MB/s
> Interpretation: Higher is better
> Statistic     : HMean
> ==================================================================
> Test:        6.5.0-rc1 (base)        base + mathieu-queued-wakeup    base + aaron-tg-load-avg    base + queued-wakeup + tg-load-avg
>  Copy:       353476.35 (0.00 pct)    354954.50 (0.41 pct)            354614.56 (0.32 pct)        353512.71 (0.01 pct)
> Scale:       353214.73 (0.00 pct)    354884.12 (0.47 pct)            355841.17 (0.74 pct)        353220.53 (0.00 pct)
>   Add:       370755.48 (0.00 pct)    372292.72 (0.41 pct)            375307.35 (1.22 pct)        369917.77 (-0.22 pct)
> Triad:       370652.02 (0.00 pct)    372732.11 (0.56 pct)            375718.85 (1.36 pct)        369926.26 (-0.19 pct)
> 
> ==================================================================
> Test          : schbench (old)
> Units         : 99th percentile latency in us
> Interpretation: Lower is better
> Statistic     : Median
> ==================================================================
> #workers:    6.5.0-rc1 (base)        base + mathieu-queued-wakeup    base + aaron-tg-load-avg    base + queued-wakeup + tg-load-avg
>   1:         56.00 (0.00 pct)        58.00 (-3.57 pct)               60.00 (-7.14 pct)           60.00 (-7.14 pct)
>   2:         61.00 (0.00 pct)        56.00 (8.19 pct)                59.00 (3.27 pct)            60.00 (1.63 pct)
>   4:         64.00 (0.00 pct)        62.00 (3.12 pct)                66.00 (-3.12 pct)           64.00 (0.00 pct)
>   8:         96.00 (0.00 pct)        78.00 (18.75 pct)               76.00 (20.83 pct)           93.00 (3.12 pct)
>  16:         98.00 (0.00 pct)        95.00 (3.06 pct)                98.00 (0.00 pct)            95.00 (3.06 pct)
>  32:         137.00 (0.00 pct)       144.00 (-5.10 pct)              133.00 (2.91 pct)           130.00 (5.10 pct)
>  64:         206.00 (0.00 pct)       210.00 (-1.94 pct)              200.00 (2.91 pct)           217.00 (-5.33 pct)
> 128:         348.00 (0.00 pct)       347.00 (0.28 pct)               413.00 (-18.67 pct)         366.00 (-5.17 pct)
> 256:         679.00 (0.00 pct)       669.00 (1.47 pct)               669.00 (1.47 pct)           675.00 (0.58 pct)
> 512:         1366.00 (0.00 pct)      1366.00 (0.00 pct)              1442.00 (-5.56 pct)         1430.00 (-4.68 pct)
> 
> 
> ==================================================================
> Test          : schbench (new)
> Units         : 99th percentile latency in us
> Interpretation: Lower is better
> Statistic     : Median
> ==================================================================
> Metric: wakeup_lat_summary
> #workers:    6.5.0-rc1 (base)        base + mathieu-queued-wakeup    base + aaron-tg-load-avg    base + queued-wakeup + tg-load-avg
>   1:         15.00 (0.00 pct)        15.00 (0.00 pct)                16.00 (-6.66 pct)           17.00 (-13.33 pct)
>   2:         16.00 (0.00 pct)        16.00 (0.00 pct)                17.00 (-6.25 pct)           17.00 (-6.25 pct)
>   4:         17.00 (0.00 pct)        17.00 (0.00 pct)                15.00 (11.76 pct)           17.00 (0.00 pct)
>   8:         11.00 (0.00 pct)        13.00 (-18.18 pct)              11.00 (0.00 pct)            11.00 (0.00 pct)
>  16:         11.00 (0.00 pct)        11.00 (0.00 pct)                10.00 (9.09 pct)            9.00 (18.18 pct)
>  32:         11.00 (0.00 pct)        11.00 (0.00 pct)                11.00 (0.00 pct)            11.00 (0.00 pct)
>  64:         10.00 (0.00 pct)        11.00 (-10.00 pct)              10.00 (0.00 pct)            10.00 (0.00 pct)
> 128:         11.00 (0.00 pct)        12.00 (-9.09 pct)               12.00 (-9.09 pct)           11.00 (0.00 pct)
> 256:         117.00 (0.00 pct)       162.00 (-38.46 pct)             90.00 (23.07 pct)           103.00 (11.96 pct)
> 512:         22496.00 (0.00 pct)     21664.00 (3.69 pct)             22368.00 (0.56 pct)         21408.00 (4.83 pct)
> 
> Metric: request_lat_summary
> #workers:    6.5.0-rc1 (base)        base + mathieu-queued-wakeup    base + aaron-tg-load-avg    base + queued-wakeup + tg-load-avg
>   1:         6872.00 (0.00 pct)      6872.00 (0.00 pct)              6792.00 (1.16 pct)          6856.00 (0.23 pct)
>   2:         6824.00 (0.00 pct)      6824.00 (0.00 pct)              6872.00 (-0.70 pct)         6856.00 (-0.46 pct)
>   4:         6824.00 (0.00 pct)      6808.00 (0.23 pct)              6872.00 (-0.70 pct)         6824.00 (0.00 pct)
>   8:         6824.00 (0.00 pct)      6824.00 (0.00 pct)              6872.00 (-0.70 pct)         6824.00 (0.00 pct)
>  16:         6824.00 (0.00 pct)      6840.00 (-0.23 pct)             6872.00 (-0.70 pct)         6840.00 (-0.23 pct)
>  32:         6840.00 (0.00 pct)      6840.00 (0.00 pct)              6888.00 (-0.70 pct)         6856.00 (-0.23 pct)
>  64:         6840.00 (0.00 pct)      6872.00 (-0.46 pct)             6888.00 (-0.70 pct)         6872.00 (-0.46 pct)
> 128:         12272.00 (0.00 pct)     12784.00 (-4.17 pct)            13200.00 (-7.56 pct)        12016.00 (2.08 pct)
> 256:         13328.00 (0.00 pct)     13392.00 (-0.48 pct)            13712.00 (-2.88 pct)        13552.00 (-1.68 pct)
> 512:         88832.00 (0.00 pct)     86400.00 (2.73 pct)             88192.00 (0.72 pct)         85632.00 (3.60 pct)
> 
> Metric: rps_summary
> #workers:    6.5.0-rc1 (base)        base + mathieu-queued-wakeup    base + aaron-tg-load-avg    base + queued-wakeup + tg-load-avg
>   1:         297.00 (0.00 pct)       297.00 (0.00 pct)               297.00 (0.00 pct)           299.00 (-0.67 pct)
>   2:         601.00 (0.00 pct)       603.00 (-0.33 pct)              595.00 (0.99 pct)           601.00 (0.00 pct)
>   4:         1206.00 (0.00 pct)      1206.00 (0.00 pct)              1190.00 (1.32 pct)          1206.00 (0.00 pct)
>   8:         2412.00 (0.00 pct)      2412.00 (0.00 pct)              2396.00 (0.66 pct)          2420.00 (-0.33 pct)
>  16:         4840.00 (0.00 pct)      4824.00 (0.33 pct)              4792.00 (0.99 pct)          4840.00 (0.00 pct)
>  32:         9648.00 (0.00 pct)      9648.00 (0.00 pct)              9584.00 (0.66 pct)          9680.00 (-0.33 pct)
>  64:         19360.00 (0.00 pct)     19296.00 (0.33 pct)             19168.00 (0.99 pct)         19296.00 (0.33 pct)
> 128:         37952.00 (0.00 pct)     35264.00 (7.08 pct)             36672.00 (3.37 pct)         38080.00 (-0.33 pct)
> 256:         41408.00 (0.00 pct)     41536.00 (-0.30 pct)            39744.00 (4.01 pct)         40896.00 (1.23 pct)
> 512:         36288.00 (0.00 pct)     36800.00 (-1.41 pct)            35264.00 (2.82 pct)         35776.00 (1.41 pct)
> 
> Tested-by: Swapnil Sapkal <Swapnil.Sapkal@....com>
> 
>>
>> Thanks,
>>
>> Mathieu
>>
>> Mathieu Desnoyers (3):
>>    sched: Rename cpus_share_cache to cpus_share_llc
>>    sched: Introduce cpus_share_l2c (v3)
>>    sched: ttwu_queue_cond: skip queued wakeups across different l2 caches
>>
>> Cc: Ingo Molnar <mingo@...hat.com>
>> Cc: Peter Zijlstra <peterz@...radead.org>
>> Cc: Valentin Schneider <vschneid@...hat.com>
>> Cc: Steven Rostedt <rostedt@...dmis.org>
>> Cc: Ben Segall <bsegall@...gle.com>
>> Cc: Mel Gorman <mgorman@...e.de>
>> Cc: Daniel Bristot de Oliveira <bristot@...hat.com>
>> Cc: Vincent Guittot <vincent.guittot@...aro.org>
>> Cc: Juri Lelli <juri.lelli@...hat.com>
>> Cc: Swapnil Sapkal <Swapnil.Sapkal@....com>
>> Cc: Aaron Lu <aaron.lu@...el.com>
>> Cc: Julien Desfossez <jdesfossez@...italocean.com>
>> Cc: x86@...nel.org
>>
>>   block/blk-mq.c                 |  2 +-
>>   include/linux/sched/topology.h | 10 ++++++++--
>>   kernel/sched/core.c            | 14 +++++++++++---
>>   kernel/sched/fair.c            |  8 ++++----
>>   kernel/sched/sched.h           |  2 ++
>>   kernel/sched/topology.c        | 32 +++++++++++++++++++++++++++++---
>>   6 files changed, 55 insertions(+), 13 deletions(-)
>>
> -- 
> Thanks and Regards,
> Swapnil

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
