Message-ID: <20110908151433.GB6587@linux.vnet.ibm.com>
Date: Thu, 8 Sep 2011 20:45:07 +0530
From: Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>
To: Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc: Paul Turner <pjt@...gle.com>,
Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
Vladimir Davydov <vdavydov@...allels.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Bharata B Rao <bharata@...ux.vnet.ibm.com>,
Dhaval Giani <dhaval.giani@...il.com>,
Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...e.hu>,
Pavel Emelianov <xemul@...allels.com>
Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs
unpinned
* Peter Zijlstra <a.p.zijlstra@...llo.nl> [2011-09-07 21:22:22]:
> On Wed, 2011-09-07 at 20:50 +0530, Srivatsa Vaddagiri wrote:
> >
> > Fix excessive idle time reported when cgroups are capped.
>
> Where from? The whole idea of bandwidth caps is to introduce idle time,
> so what's excessive and where does it come from?
We have set up cgroups and their hard limits so that, in theory, they should
consume the entire capacity available on the machine, leading to 0% idle time.
That's not what we see. A more detailed description of the setup and the problem
is here:
https://lkml.org/lkml/2011/6/7/352
but to quickly summarize it, the machine and the test case are as below:
Machine : 16 CPUs (2 quad-core w/ HT enabled)
Cgroups : 5 in number (C1-C5), each having {2, 2, 4, 8, 16} tasks respectively.
          Further, each task is placed in its own (sub-)cgroup with
          a capped usage of 50% CPU.
/C1/C1_1/Task1 -> capped at 50% cpu usage
/C1/C1_2/Task2 -> capped at 50% cpu usage
/C2/C2_1/Task3 -> capped at 50% cpu usage
/C2/C2_2/Task4 -> capped at 50% cpu usage
/C3/C3_1/Task5 -> capped at 50% cpu usage
/C3/C3_2/Task6 -> capped at 50% cpu usage
/C3/C3_3/Task7 -> capped at 50% cpu usage
/C3/C3_4/Task8 -> capped at 50% cpu usage
...
/C5/C5_16/Task32 -> capped at 50% cpu usage
So we have 32 tasks, each capped at 50% CPU usage, running on a 16-CPU
system. One would expect 0% idle time in this scenario, which was found
not to be the case. With early versions of cfs hardlimits, up to ~20%
idle time was seen, though with the current version in tip, we see up to
~10% idle time (when cfs.period = 100ms), which goes down to ~5% when
cfs.period is set to 500ms.
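
The 0% expectation is plain capacity arithmetic, which a short Python
sketch can spell out. The helper names below are illustrative only and
are not taken from the patch set; the quota helper just assumes a 50%
cap means a quota equal to half the period.

```python
def expected_idle_fraction(num_tasks, cap_fraction, num_cpus):
    """Idle fraction if demand were spread perfectly across all CPUs."""
    demand_cpus = num_tasks * cap_fraction   # total demand, in CPUs
    used_cpus = min(demand_cpus, num_cpus)   # cannot use more than exists
    return 1.0 - used_cpus / num_cpus


def quota_us(period_us, cap_fraction):
    """Per-period runtime (in microseconds) for a given cap (assumed relation)."""
    return int(period_us * cap_fraction)


# The test case above: 32 tasks, each capped at 50%, on 16 CPUs.
ideal_idle = expected_idle_fraction(32, 0.5, 16)

# Quota implied by a 50% cap for the two periods that were tried.
quota_100ms = quota_us(100_000, 0.5)
quota_500ms = quota_us(500_000, 0.5)
```

Total demand is 32 x 0.5 = 16 CPUs, exactly the machine's capacity, so
any idle time observed is a placement effect and not a lack of demand.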
From what I could find out, the "excess" idle time crops up because
the load balancer is not perfect. For example, there are instances when a
CPU has just 1 task on its runqueue (rather than the ideal number of 2
tasks/cpu). When that lone task exceeds its 50% limit, the cpu is forced to
become idle.
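
A rough static model of this effect, in Python. It assumes each task
always wants to run, each is capped at 50%, and the placement stays
fixed; the real load balancer keeps moving tasks, so this only
illustrates the mechanism and does not reproduce the measured numbers.
The `skewed` placement is a made-up example.

```python
def idle_fraction(tasks_per_cpu, cap_fraction=0.5):
    """System idle fraction for a fixed placement of capped tasks.

    A CPU is busy for min(1, n * cap_fraction) of the time, where n is
    the number of capped tasks sitting on its runqueue.
    """
    busy = [min(1.0, n * cap_fraction) for n in tasks_per_cpu]
    return 1.0 - sum(busy) / len(tasks_per_cpu)


# Ideal placement: 2 tasks on each of the 16 CPUs.
balanced = [2] * 16

# Hypothetical imbalance: two CPUs hold 1 task, two hold 3, the rest hold 2.
skewed = [1, 1, 3, 3] + [2] * 12

balanced_idle = idle_fraction(balanced)
skewed_idle = idle_fraction(skewed)
```

In this model the two CPUs with a lone task each sit idle half the time,
while the CPUs with three tasks cannot make up for it, so the system as
a whole shows idle time even though total demand equals capacity.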
> > The patch introduces the notion of "steal"
>
> The virt folks already claimed steal-time and have it mean something
> entirely different. You get to pick a new name.
grace time?
> > (or "grace") time which is the surplus
> > time/bandwidth each cgroup is allowed to consume, subject to a maximum
> > steal time (sched_cfs_max_steal_time_us). Cgroups are allowed this "steal"
> > or "grace" time when the lone task running on a cpu is about to be throttled.
>
> Ok, so this is a solution to an unstated problem. Why is it a good
> solution?
I am not sure if there are any "good" solutions to this problem! One
possibility is to make the idle load balancer more aggressive in
pulling tasks across sched-domain boundaries, i.e., when a CPU becomes idle
(after a task got throttled) and invokes the idle load balancer, it
should try "harder" at pulling a task from far-off cpus (across
package/node boundaries)?
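
For comparison, the "grace" time idea quoted above can be put into a toy
Python model. This is only a sketch of the concept, not the patch's
actual accounting, which is not shown in this message; `max_grace_us` is
a stand-in for the sched_cfs_max_steal_time_us tunable and the function
name is invented.

```python
def lone_task_busy_fraction(period_us, quota_us, max_grace_us):
    """Busy fraction of a CPU whose only task is capped at quota_us per period.

    Toy assumption: a lone task may run past its quota by up to
    max_grace_us before being throttled, and never beyond the period.
    """
    allowed_us = min(period_us, quota_us + max_grace_us)
    return allowed_us / period_us


# 100ms period with a 50% cap, for three hypothetical grace settings.
no_grace = lone_task_busy_fraction(100_000, 50_000, 0)
some_grace = lone_task_busy_fraction(100_000, 50_000, 20_000)
full_grace = lone_task_busy_fraction(100_000, 50_000, 50_000)
```

The model shows the trade-off: more grace hides the idle time caused by
imbalance, but it also lets a cgroup exceed its configured cap.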
> Also, another tunable, yay!
- vatsa