Message-Id: <cef0717a-e1da-c4a3-9fd0-ddb0914e3850@linux.ibm.com>
Date: Thu, 5 Sep 2019 11:25:38 +0530
From: Parth Shah <parth@...ux.ibm.com>
To: subhra mazumdar <subhra.mazumdar@...cle.com>,
linux-kernel@...r.kernel.org
Cc: peterz@...radead.org, mingo@...hat.com, tglx@...utronix.de,
steven.sistare@...cle.com, dhaval.giani@...cle.com,
daniel.lezcano@...aro.org, vincent.guittot@...aro.org,
viresh.kumar@...aro.org, tim.c.chen@...ux.intel.com,
mgorman@...hsingularity.net, patrick.bellasi@....com
Subject: Re: [RFC PATCH 0/9] Task latency-nice
Hi Subhra,
On 8/30/19 11:19 PM, subhra mazumdar wrote:
> Introduce new per task property latency-nice for controlling scalability
> in scheduler idle CPU search path. Valid latency-nice values are from 1 to
> 100 indicating 1% to 100% search of the LLC domain in select_idle_cpu. New
> CPU cgroup file cpu.latency-nice is added as an interface to set and get.
> All tasks in the same cgroup share the same latency-nice value. Using a
> lower latency-nice value can help latency intolerant tasks e.g very short
> running OLTP threads where full LLC search cost can be significant compared
> to run time of the threads. The default latency-nice value is 5.
>
> In addition to latency-nice, it also adds a new sched feature SIS_CORE to
> be able to disable idle core search altogether which is costly and hurts
> more than it helps in short running workloads.
>
> Finally it also introduces a new per-cpu variable next_cpu to track
> the limit of search so that every time search starts from where it ended.
> This rotating search window over cpus in LLC domain ensures that idle
> cpus are eventually found in case of high load.
>
> Uperf pingpong on 2 socket, 44 core and 88 threads Intel x86 machine with
> message size = 8k (higher is better):
> threads  baseline  latency-nice=5,SIS_CORE  latency-nice=5,NO_SIS_CORE
> 8        64.66     64.38 (-0.43%)           64.79 (0.2%)
> 16       123.34    122.88 (-0.37%)          125.87 (2.05%)
> 32       215.18    215.55 (0.17%)           247.77 (15.15%)
> 48       278.56    321.6 (15.45%)           321.2 (15.3%)
> 64       259.99    319.45 (22.87%)          333.95 (28.44%)
> 128      431.1     437.69 (1.53%)           431.09 (0%)
>
The results look appealing on your experimental setup.
BTW, do you have any plans to do load balancing based on the latency niceness
of the tasks as well? It seems a more interesting case is when we pack the less
latency-sensitive tasks onto fewer CPUs.
Also, do you see any workloads showing a performance regression with NO_SIS_CORE?
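
For readers following the thread, the bounded, rotating search described in
the quoted cover letter can be sketched in userspace C as below. This is only
an illustration under stated assumptions, not code from the patch series:
search_limit(), find_idle_cpu() and next_cpu_hint are hypothetical names, the
single global hint stands in for the per-cpu next_cpu variable, and rounding
the limit up to at least one CPU is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-cpu next_cpu variable from the cover
 * letter; a single global is used here only to keep the sketch small. */
static int next_cpu_hint = 0;

/* Number of CPUs to scan for a given latency-nice value, where 1..100 is
 * read as 1%..100% of the LLC domain. Clamping to at least one CPU is an
 * assumption of this sketch. */
static int search_limit(int latency_nice, int llc_size)
{
	int n = (llc_size * latency_nice) / 100;

	return n < 1 ? 1 : n;
}

/* Scan at most search_limit() CPUs, starting where the previous search
 * ended and wrapping around the LLC domain. Returns the first idle CPU
 * found, or -1 if none is found within the window. The hint is always
 * advanced so that the next search starts where this one stopped. */
static int find_idle_cpu(const bool *idle, int llc_size, int latency_nice)
{
	int limit = search_limit(latency_nice, llc_size);
	int cpu = next_cpu_hint % llc_size;
	int found = -1;

	for (int i = 0; i < limit; i++) {
		if (idle[cpu])
			found = cpu;
		cpu = (cpu + 1) % llc_size;
		if (found >= 0)
			break;
	}

	next_cpu_hint = cpu;
	return found;
}
```

With 8 CPUs in the domain and latency-nice=25, each call scans only 2 CPUs,
but because the window rotates, an idle CPU anywhere in the domain is reached
after a few wakeups rather than never being examined.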
Thanks,
Parth