linux-kernel - Re: [PATCH 0/9] Per-cgroup /proc/stat

Message-ID: <4E77CB30.3030509@google.com>
Date:	Mon, 19 Sep 2011 16:07:28 -0700
From:	Paul Turner <pjt@...gle.com>
To:	linux-kernel@...r.kernel.org
Cc:	Andi Kleen <andi@...stfloor.org>,
	Glauber Costa <glommer@...allels.com>,
	linux-kernel@...r.kernel.org, xemul@...allels.com,
	paul@...lmenage.org, lizf@...fujitsu.com, daniel.lezcano@...e.fr,
	mingo@...e.hu, jbottomley@...allels.com
Subject: Re: [PATCH 0/9] Per-cgroup /proc/stat

On 09/15/11 01:56, Peter Zijlstra wrote:
> On Wed, 2011-09-14 at 13:23 -0700, Andi Kleen wrote:
>> Peter Zijlstra<a.p.zijlstra@...llo.nl>  writes:
>>>
>>> Guys we should seriously trim back a lot of that code, not grow ever
>>> more and more. The sad fact is that if you build a kernel with
>>> cpu-cgroup support the context switch cost is more than double that of a
>>> kernel without, and then you haven't even started creating cgroups yet.
>>
>> That sounds indeed quite bad. Is it known why it is so costly?
>
> Mostly because all data structures grow and all code paths grow, some by
> quite a bit, its spread all over the place, lots of little cuts etc..
>
> pjt and I tried trimming some of the code paths with static_branch() but
> didn't really get anywhere.. need to get back to looking at this stuff
> sometime soon.

When I get some time I think I'm just going to post a patch[*] that
merges the useful _fields_ (usage, usage_percpu) from cpuacct into cpu.
We are *already* doing the accounting at the entity level, so this
addition is free.
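
To illustrate why folding usage/usage_percpu into the existing per-entity
accounting is cheap, here is a minimal userspace sketch. It is not kernel
code: struct group, charge_usage(), group_usage() and SKETCH_NR_CPUS are
hypothetical names invented for the example, and it assumes the scheduler
already walks the group hierarchy with the runtime delta in hand, so the
only extra work is one add per level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 4  /* hypothetical fixed CPU count for this sketch */

/* Hypothetical stand-in for a cpu cgroup; not the kernel's task_group. */
struct group {
    struct group *parent;                  /* NULL for the root group */
    uint64_t usage_percpu[SKETCH_NR_CPUS]; /* ns of CPU time, per CPU */
};

/* Charge delta_ns of runtime on 'cpu' to 'g' and to every ancestor. */
static void charge_usage(struct group *g, int cpu, uint64_t delta_ns)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    for (; g != NULL; g = g->parent)
        g->usage_percpu[cpu] += delta_ns;
}

/* "usage" is simply the sum of the per-CPU counters. */
static uint64_t group_usage(const struct group *g)
{
    uint64_t sum = 0;

    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        sum += g->usage_percpu[cpu];
    return sum;
}
```

Time charged to a child group shows up in its parent as well, which is the
hierarchical behaviour one would expect from a usage counter.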

At that point we could default to !CONFIG_CGROUP_CPUACCT and deprecate
the beast without breaking ABI for those who really need it (either
because their applications have hard-coded paths or because they really
like cgroup user/sys time -- which we COULD duplicate into cpu but I'm
inclined not to).

[*]: the only real caveat is how loudly people scream about the code
duplication; I think it's worth it if it lets us kill cpuacct in the
long run.

Another unrelated optimization on this path, which I have sitting around
in patches/ to push at some point, is keeping the left-most entity out of
the tree. The worst case is that an entity with a lower vruntime comes
along and we have to insert the previous left-most; the best case is that
we get to pick it without futzing with the rb-tree.  I think this was
good for a percent or two when I hacked it together before.
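
A rough userspace sketch of the left-most-out-of-tree idea follows. It is
an illustration under stated assumptions, not the patch itself: a sorted
singly linked list stands in for the rb-tree, and struct entity,
struct runqueue, enqueue(), pick_next() and the tree_ops counter are all
hypothetical names. The invariant is that the cached left-most entity has
the smallest vruntime and is empty only when the tree is empty.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct entity {
    uint64_t vruntime;
    struct entity *next;      /* link in the sorted list standing in for the rb-tree */
};

struct runqueue {
    struct entity *leftmost;  /* smallest vruntime, kept OUT of the tree */
    struct entity *tree;      /* everything else, ascending vruntime */
    unsigned long tree_ops;   /* counts tree insertions and removals */
};

static void tree_insert(struct runqueue *rq, struct entity *se)
{
    struct entity **p = &rq->tree;

    while (*p && (*p)->vruntime <= se->vruntime)
        p = &(*p)->next;
    se->next = *p;
    *p = se;
    rq->tree_ops++;
}

static struct entity *tree_pop_min(struct runqueue *rq)
{
    struct entity *se = rq->tree;

    if (se) {
        rq->tree = se->next;
        se->next = NULL;
        rq->tree_ops++;
    }
    return se;
}

static void enqueue(struct runqueue *rq, struct entity *se)
{
    se->next = NULL;
    if (!rq->leftmost) {
        assert(rq->tree == NULL);      /* empty cache implies empty tree */
        rq->leftmost = se;             /* best case: no tree work at all */
    } else if (se->vruntime < rq->leftmost->vruntime) {
        tree_insert(rq, rq->leftmost); /* worst case: demote old left-most */
        rq->leftmost = se;
    } else {
        tree_insert(rq, se);           /* ordinary insert */
    }
}

static struct entity *pick_next(struct runqueue *rq)
{
    struct entity *se = rq->leftmost;

    if (se)
        rq->leftmost = tree_pop_min(rq); /* refill cache; no-op if tree is empty */
    return se;
}
```

With a single runnable entity, repeated enqueue()/pick_next() cycles never
touch the tree in this sketch, which is the case the optimization targets.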

Another idea I have kicking around for this path is the introduction of
a link_entity that bridges over nr_running=1 chains (breaking it
opportunistically when an element in the chain goes to nr_running=2).
This one requires some pretty careful accounting around the breaking of
a chain, though, so I'm not touching it until I get the new load tracking
code out.  (Incidentally, when I benchmarked it before LPC I had it
working out to be a little more efficient than the current math, good for
~2-3% on pipe_test.)

- Paul
