Message-ID: <23e9a0f2-be96-4eb6-0242-2865180c1d6c@linux.ibm.com>
Date:   Sun, 26 Nov 2023 14:14:20 +0530
From:   Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
To:     Chen Yu <yu.c.chen@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Ingo Molnar <mingo@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        cover.1700548379.git.yu.c.chen@...el.com
Cc:     Tim Chen <tim.c.chen@...el.com>, Aaron Lu <aaron.lu@...el.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mel Gorman <mgorman@...e.de>,
        K Prateek Nayak <kprateek.nayak@....com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Chen Yu <yu.chen.surf@...il.com>, linux-kernel@...r.kernel.org,
        Madadi Vineeth Reddy <vineethr@...ux.ibm.com>
Subject: Re: [PATCH v2 0/3] Introduce SIS_CACHE to choose previous CPU during
 task wakeup

Hi Chen Yu,

On 21/11/23 13:09, Chen Yu wrote:
> v1  -> v2:
> - Move the task sleep duration from sched_entity to task_struct. (Aaron Lu)
> - Refine the task sleep duration calculation based on task's previous running
>   CPU. (Aaron Lu)
> - Limit the cache-hot idle CPU scan depth to reduce the time spent on
>   searching, to fix the regression. (K Prateek Nayak)
> - Add test results of real-life workloads per request from Ingo:
>     Daytrader on a Power system. (Madadi Vineeth Reddy)
>     OLTP workload on Xeon Sapphire Rapids.
> - Refine the commit log, add Reviewed-by tag to PATCH 1/3
>   (Mathieu Desnoyers).
> 
> RFC -> v1:
> - drop RFC
> - Only record the short sleeping time for each task, to better honor
>   burst-sleeping tasks. (Mathieu Desnoyers)
> - Keep the forward movement monotonic for runqueue's cache-hot timeout value.
>   (Mathieu Desnoyers, Aaron Lu)
> - Introduce a new helper function cache_hot_cpu() that considers
>   rq->cache_hot_timeout. (Aaron Lu)
> - Add analysis of why inhibiting task migration could bring better throughput
>   for some benchmarks. (Gautham R. Shenoy)
> - Choose the first cache-hot CPU if all idle CPUs are cache-hot in
>   select_idle_cpu(), to avoid possible task stacking on the waker's CPU.
>   (K Prateek Nayak)
> 
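If I read the dequeue/wakeup bookkeeping above correctly, it amounts to something
like the standalone model below: only short sleeps are recorded per task, and the
previous CPU's cache-hot deadline is only ever pushed forward. The struct/field
names, the averaging and the cutoff value are my own placeholders for illustration,
not the actual patch (only rq->cache_hot_timeout is named in the changelog).

#include <stdint.h>

#define SHORT_SLEEP_NS	1000000ULL	/* assumed cutoff for a "short" sleep */

struct task_model {
	uint64_t short_sleep_ns;	/* typical short-sleep duration */
	uint64_t dequeue_ts;		/* when the task last went to sleep */
};

struct rq_model {
	uint64_t cache_hot_timeout;	/* CPU treated as cache-hot until then */
};

/* At wakeup: fold the sleep that just ended into the history, but only if short. */
static void record_sleep(struct task_model *p, uint64_t now_ns)
{
	uint64_t slept = now_ns - p->dequeue_ts;

	if (slept <= SHORT_SLEEP_NS)
		p->short_sleep_ns = (p->short_sleep_ns + slept) / 2;
}

/* At dequeue: expect the task back soon, so reserve its CPU as cache-hot. */
static void mark_prev_cpu_hot(struct task_model *p, struct rq_model *rq,
			      uint64_t now_ns)
{
	uint64_t deadline = now_ns + p->short_sleep_ns;

	p->dequeue_ts = now_ns;

	/* Keep the forward movement of the timeout monotonic. */
	if (deadline > rq->cache_hot_timeout)
		rq->cache_hot_timeout = deadline;
}
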
> Thanks for the comments and tests!
> 
> ----------------------------------------------------------------------
> 
> This series aims to continue the discussion of how to make it easier
> for the wakee to choose its previous CPU.
> 
> When task p is woken up, the scheduler leverages select_idle_sibling()
> to find an idle CPU for it. p's previous CPU is usually preferred
> because it can improve cache locality. However, in many cases the
> previous CPU has already been taken by other wakees, so p has to
> find another idle CPU.
> 
> Inhibiting task migration could benefit many workloads. Inspired by
> Mathieu's proposal to limit the task migration ratio[1], introduce
> SIS_CACHE. It considers the sleep time of the task for better
> task placement. Based on the task's short sleeping history, tag p's
> previous CPU as cache-hot. Later, when p is woken up, it can choose
> its previous CPU in select_idle_sibling(). When another task is
> woken up, skip this cache-hot idle CPU and try the next idle CPU
> when possible. The idea of SIS_CACHE is to optimize the idle CPU
> scan sequence. The extra scan time is minimized by restricting the
> scan depth of cache-hot CPUs to 50% of the scan depth of SIS_UTIL.
> 
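For my own understanding, the scan-side behaviour described above maps to roughly
the standalone model below. Only the 50% depth limit and the "pick the first
cache-hot CPU if every idle CPU is cache-hot" fallback are from the cover letter
and changelog; the structure and names are my assumptions, not the actual
select_idle_cpu() change.

#include <stdint.h>

struct cpu_model {
	int idle;
	uint64_t cache_hot_timeout;	/* absolute deadline set at dequeue time */
};

/* The waking task's own previous CPU is assumed to have been tried already. */
static int scan_for_idle_cpu(const struct cpu_model *cpus, int nr_cpus,
			     uint64_t now_ns, int scan_depth)
{
	int hot_budget = scan_depth / 2;	/* 50% of the SIS_UTIL scan depth */
	int first_cache_hot = -1;
	int cpu;

	for (cpu = 0; cpu < nr_cpus && scan_depth > 0; cpu++, scan_depth--) {
		const struct cpu_model *c = &cpus[cpu];

		if (!c->idle)
			continue;

		/* Idle but still cache-hot for another task: skip it, within budget. */
		if (c->cache_hot_timeout > now_ns && hot_budget > 0) {
			hot_budget--;
			if (first_cache_hot < 0)
				first_cache_hot = cpu;
			continue;
		}

		return cpu;
	}

	/*
	 * Every idle CPU seen was cache-hot: take the first of them rather
	 * than stacking the wakee on the waker's CPU.
	 */
	return first_cache_hot;
}
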
> This test is based on tip/sched/core, on top of
> Commit ada87d23b734
> ("x86: Fix CPUIDLE_FLAG_IRQ_ENABLE leaking timer reprogram")
> 
> This patch set has shown 15% ~ 70% improvements for client/server
> workloads like netperf and tbench. It shows a 0.7% improvement in
> OLTP with 0.2% run-to-run variation on a 240-CPU Xeon system.
> There is a 2% improvement in another real-life workload, Daytrader,
> per Madadi's test on a Power system with 96 CPUs. Prateek
> has helped verify that there is no obvious microbenchmark regression
> with v2 on a 3rd Generation EPYC system with 128 CPUs.
> 

Tested the patch on a Power system with 46 cores (368 CPUs in total).
The system has 8 NUMA nodes.

Below are some of the benchmark results.

schbench (new) 99.0th percentile latency (lower is better)
========
case            load            baseline[pct imp](std%)       SIS_CACHE[pct imp]( std%)
normal          1-mthreads      1.00 [ 0.00]( 4.34)            1.02 [ -2.00]( 5.98)
normal          2-mthreads      1.00 [ 0.00]( 13.95)           1.08 [ -8.00]( 10.39)
normal          4-mthreads      1.00 [ 0.00]( 6.20)            0.94 [ +6.00]( 10.90)
normal          6-mthreads      1.00 [ 0.00]( 12.76)           1.03 [ -3.00]( 9.33)

It seems schbench is not much impacted by this patch (the pct imp is within the std%).
I expected some regression in wakeup latency from searching for an idle CPU that is not
cache-hot, but I guess limiting the search depth has helped.


producer_consumer avg time/access (lower is better)
========
loads per consumer iteration   baseline[pct imp](std%)        SIS_CACHE[pct imp]( std%)
5                              1.00 [ 0.00]( 0.00)            0.93 [ +7.00]( 4.77)
10                             1.00 [ 0.00]( 0.00)            1.00 [  0.00]( 0.00)
20                             1.00 [ 0.00]( 0.00)            1.00 [  0.00]( 0.00)

The patch's main goal of improving cache locality is reflected in this workload: SIS_CACHE
improves it only when the number of loads per consumer iteration is low.


hackbench normalized time in seconds (lower is better)
========
case            load        baseline[pct imp](std%)         SIS_CACHE[pct imp]( std%)
process-sockets 1-groups     1.00 [ 0.00]( 4.78)            0.99 [ +1.00]( 6.45)
process-sockets 2-groups     1.00 [ 0.00]( 0.97)            1.02 [ -2.00]( 1.87)
process-sockets 4-groups     1.00 [ 0.00]( 3.63)            1.01 [ -1.00]( 2.96)
process-sockets 8-groups     1.00 [ 0.00]( 0.43)            1.00 [  0.00]( 0.27)
process-pipe    1-groups     1.00 [ 0.00](23.77)            0.88 [+12.00](22.77)
process-pipe    2-groups     1.00 [ 0.00]( 3.44)            1.03 [ -3.00]( 4.00)
process-pipe    4-groups     1.00 [ 0.00]( 2.41)            0.98 [ +2.00]( 3.88)
process-pipe    8-groups     1.00 [ 0.00]( 7.09)            1.07 [ -7.00]( 4.25)
threads-pipe    1-groups     1.00 [ 0.00](18.47)            1.11 [-11.00](24.21)
threads-pipe    2-groups     1.00 [ 0.00]( 6.45)            0.97 [ +3.00]( 5.58)
threads-pipe    4-groups     1.00 [ 0.00]( 5.63)            0.96 [ +2.00]( 5.90)
threads-pipe    8-groups     1.00 [ 0.00]( 1.65)            1.03 [ -3.00]( 3.97)
threads-sockets 1-groups     1.00 [ 0.00]( 2.00)            1.00 [  0.00]( 0.65)
threads-sockets 2-groups     1.00 [ 0.00]( 1.69)            1.02 [ -2.00]( 1.48)
threads-sockets 4-groups     1.00 [ 0.00]( 5.66)            1.01 [ -1.00]( 3.56)
threads-sockets 8-groups     1.00 [ 0.00]( 0.26)            0.99 [ +1.00]( 0.36)

hackbench is not impacted.


Daytrader throughput (higher is better)
========
instances,users                baseline[pct imp](std%)        SIS_CACHE[pct imp]( std%)
3,30                           1.00 [ 0.00]( 2.30)            1.02 [ +2.00]( 1.64)
3,60                           1.00 [ 0.00]( 0.55)            1.01 [ +1.00]( 1.41)
3,90                           1.00 [ 0.00]( 1.20)            1.02 [ +2.00]( 1.04)
3,120                          1.00 [ 0.00]( 0.84)            1.02 [ +2.00]( 1.02)

A real-life workload like Daytrader benefits slightly from this patch.


Tested-by: Madadi Vineeth Reddy <vineethr@...ux.ibm.com>

Thanks and Regards
Madadi Vineeth Reddy
