linux-kernel - Re: [PATCH 00/10] cgroups: Task counter subsystem v6

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4EB3E5E6.2060002@parallels.com>
Date:	Fri, 4 Nov 2011 11:17:26 -0200
From:	Glauber Costa <glommer@...allels.com>
To:	Paul Menage <paul@...lmenage.org>
CC:	Frederic Weisbecker <fweisbec@...il.com>,
	Glauber Costa <glommer@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Tim Hockin <thockin@...kin.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Li Zefan <lizf@...fujitsu.com>,
	Johannes Weiner <hannes@...xchg.org>,
	Aditya Kali <adityakali@...gle.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Kay Sievers <kay.sievers@...y.org>, Tejun Heo <tj@...nel.org>,
	"Kirill A. Shutemov" <kirill@...temov.name>,
	Containers <containers@...ts.linux-foundation.org>,
	Paul Turner <pjt@...gle.com>, <luksow@...il.com>,
	<cgroups@...r.kernel.org>
Subject: Re: [PATCH 00/10] cgroups: Task counter subsystem v6

On 11/03/2011 03:56 PM, Paul Menage wrote:
> On Thu, Nov 3, 2011 at 10:35 AM, Glauber Costa<glommer@...allels.com>  wrote:
>>
>>> If multiple subsystems on the same hierarchy each need to
>>> walk up the pointer chain on the same event, then after the first
>>> subsystem has done so the chain will be in cache for any subsequent
>>> walks from other subsystems.
>>
>> No, it won't. Precisely because different subsystems have completely
>> independent pointer chains.
>
> Because they're following res_counter parent pointers, etc, rather
> than using the single cgroups parent pointer chain?

No. Because:

/sys/fs/cgroup/my_subsys/
/sys/fs/cgroup/my_subsys/foo1
/sys/fs/cgroup/my_subsys/foo2
/sys/fs/cgroup/my_subsys/foo1/bar1

and:

/sys/fs/cgroup/my_subsys2/
/sys/fs/cgroup/my_subsys2/foo1
/sys/fs/cgroup/my_subsys2/foo1/bar1
/sys/fs/cgroup/my_subsys2/foo1/bar2

Are completely independent pointer chains. the only thing they share is 
the pointer to the root. And that's irrelevant in the pointer dance.
Also note that I used cpu and cpuacct as an example, and they don't use 
res_counters.

> So if that's the problem, rather than artificially constrain
> flexibility in order to improve micro-benchmarks, why not come up with
> approaches that keep both the flexibility and the performance?

Well, I am not opposed to that even if you happen to agree on what I 
said above. But in the end of the day, with many cgroups appearing, it
may not be about just micro benchmarks.

It is hard to draw the line, but I believe that avoiding creating new 
cgroups subsystems when possible plays in our favor.

Specifically for this one, my arguments are:

* cgroups are a task-grouping entity
* therefore, all cgroups already do some task manipulation in attach/dettach
* all cgroups subsystem already can register a fork handler

Adding a fork limit as a cgroup property seems a logical step to me 
based on that.

If, however, we are really creating this, I think we'd be better of 
referring to this as a "Task Controller" rather than a "Task Counter".

Then at least in the near future when people start trying to limit other 
task-related resources, this can serve as a natural placeholder for 
this. (See the syscall limiting that Lukasz is trying to achieve)

>
> - make res_counter hierarchies be explicitly defined via the cgroup
> parent pointers, rather than an parent pointer hidden inside the
> res_counter. So the cgroup parent chain traversal would all be along
> the common parent pointers (and res_counter would be one pointer
> smaller).
 >
>
> - allow subsystems to specify that they need a small amount of data
> that can be accessed efficiently up the cgroup chain. (Many subsystems
> wouldn't need this, and those that do would likely only need it for a
> subset of their per-cgroup data). Pack this data into as few
> cachelines as possible, allocated as a single lump of memory per
> cgroup. Each subsystem would know where in that allocation its private
> data lay (it would be the same offset for every cgroup, although
> dynamically determined at runtime based on the number of subsystems
> mounted on that hierarchy)
I thought about this second one myself.
I am not yet convinced this would be a win, but I believe there are chances.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/