Message-ID: <4E77CB30.3030509@google.com>
Date: Mon, 19 Sep 2011 16:07:28 -0700
From: Paul Turner <pjt@...gle.com>
To: Peter Zijlstra <a.p.zijlstra@...llo.nl>
CC: Andi Kleen <andi@...stfloor.org>,
Glauber Costa <glommer@...allels.com>,
linux-kernel@...r.kernel.org, xemul@...allels.com,
paul@...lmenage.org, lizf@...fujitsu.com, daniel.lezcano@...e.fr,
mingo@...e.hu, jbottomley@...allels.com
Subject: Re: [PATCH 0/9] Per-cgroup /proc/stat
On 09/15/11 01:56, Peter Zijlstra wrote:
> On Wed, 2011-09-14 at 13:23 -0700, Andi Kleen wrote:
>> Peter Zijlstra<a.p.zijlstra@...llo.nl> writes:
>>>
>>> Guys we should seriously trim back a lot of that code, not grow ever
>>> more and more. The sad fact is that if you build a kernel with
>>> cpu-cgroup support the context switch cost is more than double that of a
>>> kernel without, and then you haven't even started creating cgroups yet.
>>
>> That sounds indeed quite bad. Is it known why it is so costly?
>
> Mostly because all data structures grow and all code paths grow, some by
> quite a bit, its spread all over the place, lots of little cuts etc..
>
> pjt and I tried trimming some of the code paths with static_branch() but
> didn't really get anywhere.. need to get back to looking at this stuff
> sometime soon.

When I get some time I think I'm just going to post a patch[*] that
merges the useful _fields_ (usage, usage_percpu) from cpuacct into cpu.
We are *already* doing the accounting at the entity level, so this
addition is free.
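
A minimal sketch of why the merge is essentially free, assuming a toy
model in which each group keeps one runtime counter per CPU and the
accounting path already walks up the hierarchy. Group, account_runtime,
NR_CPUS and the two read helpers are hypothetical names; this is Python
for brevity, not kernel code:

```python
from dataclasses import dataclass, field
from typing import List, Optional

NR_CPUS = 2  # illustrative CPU count (assumption)


@dataclass
class Group:
    """Hypothetical stand-in for a cpu cgroup with one entity per CPU."""
    name: str
    parent: Optional["Group"] = None
    # Runtime (ns) already tracked per CPU for the group's entity.
    sum_exec_runtime: List[int] = field(default_factory=lambda: [0] * NR_CPUS)


def account_runtime(group: Group, cpu: int, delta_ns: int) -> None:
    """Charge delta_ns to the group and to every ancestor."""
    g: Optional[Group] = group
    while g is not None:
        g.sum_exec_runtime[cpu] += delta_ns
        g = g.parent


def usage_percpu(group: Group) -> List[int]:
    """Per-CPU usage: a plain read of counters that already exist."""
    return list(group.sum_exec_runtime)


def usage(group: Group) -> int:
    """Total usage: the sum of the per-CPU counters."""
    return sum(group.sum_exec_runtime)
```

In this model both values are pure reads of state the accounting walk
maintains anyway, so exporting them adds nothing to the hot path.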

At that point we could default to !CONFIG_CGROUP_CPUACCT and deprecate
the beast without breaking ABI for those who really need it (either
because their applications have hard-coded paths or because they really
like cgroup user/sys time -- which we COULD duplicate into cpu, but I'm
inclined not to).

[*]: the only real caveat is how loudly people scream about the code
duplication; I think it's worth it if it lets us kill cpuacct in the
long run.

Another unrelated optimization on this path, which I have sitting around
in patches/ to push at some point, is keeping the left-most entity out of
the tree. The worst case is that an entity with a lower vruntime comes
along and we insert the previous left-most; the best case is that we get
to pick it without futzing with the rb-tree. I think this was good for
a percent or two when I hacked it together before.
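
One possible reading of the left-most idea, as a rough sketch rather
than the actual patch: keep the smallest-vruntime entity in a separate
slot and only touch the ordered structure when something has to go
behind it. RunQueue, tree_ops and the (vruntime, name) tuples are
hypothetical, and heapq merely stands in for the rb-tree:

```python
import heapq
from typing import List, Optional, Tuple

Entity = Tuple[int, str]  # (vruntime, name); hypothetical minimal entity


class RunQueue:
    def __init__(self) -> None:
        self.leftmost: Optional[Entity] = None  # kept outside the "tree"
        self.tree: List[Entity] = []            # heapq stands in for the rb-tree
        self.tree_ops = 0                       # inserts/removals done on the tree

    def enqueue(self, vruntime: int, name: str) -> None:
        new = (vruntime, name)
        if self.leftmost is None:
            if not self.tree or new <= self.tree[0]:
                # Best case: the new entity becomes left-most, tree untouched.
                self.leftmost = new
                return
        elif new < self.leftmost:
            # Worst case: a lower vruntime arrives, so the previous
            # left-most is the one that gets inserted into the tree.
            new, self.leftmost = self.leftmost, new
        heapq.heappush(self.tree, new)
        self.tree_ops += 1

    def pick_next(self) -> Optional[Entity]:
        if self.leftmost is not None:
            # Best case: pick the cached entity without touching the tree.
            picked, self.leftmost = self.leftmost, None
            return picked
        if self.tree:
            self.tree_ops += 1
            return heapq.heappop(self.tree)
        return None
```

With a ping-pong workload (one entity woken and picked at a time) this
model performs no tree operations at all, which is the case the
optimization is aimed at.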

Another idea I have kicking around for this path is the introduction of
a link_entity which bridges over nr_running=1 chains (breaking it
opportunistically when an element in the chain goes to nr_running=2).
This one requires some pretty careful accounting around the breaking of
a chain though, so I'm not touching it until I get the new load tracking
code out. (Incidentally, when I benchmarked it before LPC I had it
working out to be a little more efficient than the current math, good for
~2-3% on pipe_test.)
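
To make the link_entity idea concrete, here is a heavily simplified
sketch. It assumes a toy hierarchy in which a node's children are the
runnable entities on its queue; Node, chain_end and the link field are
hypothetical, and the invalidation shown is the crudest possible form
of the "careful accounting" mentioned above:

```python
from typing import List, Optional, Tuple


class Node:
    """Hypothetical group node; children are the entities on its queue."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Node"] = []
        self.link: Optional["Node"] = None  # cached end of an nr_running == 1 chain
        if parent is not None:
            parent.add_child(self)

    @property
    def nr_running(self) -> int:
        return len(self.children)

    def add_child(self, child: "Node") -> None:
        self.children.append(child)
        self._break_links()

    def remove_child(self, child: "Node") -> None:
        self.children.remove(child)
        self._break_links()

    def _break_links(self) -> None:
        # Any cached link at this node or above may now be stale, so drop them.
        n: Optional[Node] = self
        while n is not None:
            n.link = None
            n = n.parent


def chain_end(node: Node) -> Tuple[Node, int]:
    """Return the first node at or below `node` with nr_running != 1,
    plus the number of hops it took to get there."""
    if node.link is not None:
        return node.link, 1  # bridged: one hop over the whole chain
    cur, hops = node, 0
    while cur.nr_running == 1:
        cur = cur.children[0]
        hops += 1
    if cur is not node:
        node.link = cur  # remember the bridge for the next walk
    return cur, hops
```

The second walk over an unchanged chain takes a single hop; as soon as
any element of the chain gains a second runnable child the bridge is
dropped and the walk stops at that element again.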

- Paul