Message-ID: <4EB3E5E6.2060002@parallels.com>
Date: Fri, 4 Nov 2011 11:17:26 -0200
From: Glauber Costa <glommer@...allels.com>
To: Paul Menage <paul@...lmenage.org>
CC: Frederic Weisbecker <fweisbec@...il.com>,
Glauber Costa <glommer@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Tim Hockin <thockin@...kin.org>,
LKML <linux-kernel@...r.kernel.org>,
Li Zefan <lizf@...fujitsu.com>,
Johannes Weiner <hannes@...xchg.org>,
Aditya Kali <adityakali@...gle.com>,
Oleg Nesterov <oleg@...hat.com>,
Kay Sievers <kay.sievers@...y.org>, Tejun Heo <tj@...nel.org>,
"Kirill A. Shutemov" <kirill@...temov.name>,
Containers <containers@...ts.linux-foundation.org>,
Paul Turner <pjt@...gle.com>, <luksow@...il.com>,
<cgroups@...r.kernel.org>
Subject: Re: [PATCH 00/10] cgroups: Task counter subsystem v6
On 11/03/2011 03:56 PM, Paul Menage wrote:
> On Thu, Nov 3, 2011 at 10:35 AM, Glauber Costa <glommer@...allels.com> wrote:
>>
>>> If multiple subsystems on the same hierarchy each need to
>>> walk up the pointer chain on the same event, then after the first
>>> subsystem has done so the chain will be in cache for any subsequent
>>> walks from other subsystems.
>>
>> No, it won't. Precisely because different subsystems have completely
>> independent pointer chains.
>
> Because they're following res_counter parent pointers, etc, rather
> than using the single cgroups parent pointer chain?
No. Because:
/sys/fs/cgroup/my_subsys/
/sys/fs/cgroup/my_subsys/foo1
/sys/fs/cgroup/my_subsys/foo2
/sys/fs/cgroup/my_subsys/foo1/bar1
and:
/sys/fs/cgroup/my_subsys2/
/sys/fs/cgroup/my_subsys2/foo1
/sys/fs/cgroup/my_subsys2/foo1/bar1
/sys/fs/cgroup/my_subsys2/foo1/bar2
are completely independent pointer chains. The only thing they share is
the pointer to the root. And that's irrelevant in the pointer dance.
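To make the cache argument concrete, here is a minimal userspace sketch in C. The struct node type and the chain_length()/shared_nodes() helpers are hypothetical, not the kernel's struct cgroup. The model assumes each hierarchy has its own root node, so a leaf-to-root walk in my_subsys touches none of the nodes that a walk in my_subsys2 touches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cgroup node (not the kernel's struct cgroup).
 * Each mounted hierarchy builds its own tree of these. */
struct node {
	const char *name;
	struct node *parent;    /* NULL at the root of a hierarchy */
};

/* Number of nodes touched when walking from n up to its root. */
static size_t chain_length(const struct node *n)
{
	size_t len = 0;

	for (; n != NULL; n = n->parent)
		len++;
	return len;
}

/* Number of nodes that lie on both leaf-to-root walks. A walk can only
 * leave useful cache lines behind for another walk through these nodes. */
static size_t shared_nodes(const struct node *a, const struct node *b)
{
	size_t shared = 0;

	assert(a != NULL && b != NULL);
	for (const struct node *x = a; x != NULL; x = x->parent)
		for (const struct node *y = b; y != NULL; y = y->parent)
			if (x == y)
				shared++;
	return shared;
}
```

Built from the two trees above, shared_nodes() on my_subsys/foo1/bar1 and my_subsys2/foo1/bar1 returns 0, while my_subsys2/foo1/bar1 and my_subsys2/foo1/bar2 share two nodes (foo1 and the root). Only walks within one hierarchy can warm the cache for each other.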
Also note that I used cpu and cpuacct as an example, and they don't use
res_counters.
> So if that's the problem, rather than artificially constrain
> flexibility in order to improve micro-benchmarks, why not come up with
> approaches that keep both the flexibility and the performance?
Well, I am not opposed to that, even if you happen to agree with what I
said above. But at the end of the day, with more and more cgroup
subsystems appearing, it may not be just about micro-benchmarks.
It is hard to draw the line, but I believe that avoiding new cgroup
subsystems where possible works in our favor.
Specifically for this one, my arguments are:
* cgroups are a task-grouping entity
* therefore, all cgroups already do some task manipulation in attach/detach
* all cgroup subsystems can already register a fork handler
Adding a fork limit as a cgroup property seems a logical step to me
based on that.
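A rough sketch of what a fork limit charged up the hierarchy could look like, as a simplified single-threaded userspace model. struct task_cg, task_cg_try_charge() and task_cg_uncharge() are hypothetical names, not the API of the posted patches; a real fork handler would also need locking or atomic counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup task counter; not the kernel's API. */
struct task_cg {
	struct task_cg *parent;   /* NULL at the hierarchy root */
	long count;               /* tasks currently charged here */
	long limit;               /* max tasks allowed; < 0 means unlimited */
};

/* Charge one new task to cg and all of its ancestors. If any level
 * would exceed its limit, undo the charges already made and refuse
 * the fork. Single-threaded model: no locking. */
static bool task_cg_try_charge(struct task_cg *cg)
{
	struct task_cg *c;

	for (c = cg; c != NULL; c = c->parent) {
		if (c->limit >= 0 && c->count + 1 > c->limit)
			goto rollback;
		c->count++;
	}
	return true;

rollback:
	/* Undo the levels below the one that refused. */
	for (struct task_cg *u = cg; u != c; u = u->parent)
		u->count--;
	return false;
}

/* Release one task from cg and its ancestors (e.g. on exit). */
static void task_cg_uncharge(struct task_cg *cg)
{
	for (; cg != NULL; cg = cg->parent) {
		assert(cg->count > 0);
		cg->count--;
	}
}
```

For example, with a limit of 2 on an intermediate cgroup, the third charge from a leaf below it fails and leaves every level's count at 2.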
If, however, we do end up creating this, I think we'd be better off
calling it a "Task Controller" rather than a "Task Counter".
Then, at least in the near future, when people start trying to limit other
task-related resources, it can serve as a natural home for those
limits. (See the syscall limiting that Lukasz is trying to achieve.)
>
> - make res_counter hierarchies be explicitly defined via the cgroup
> parent pointers, rather than a parent pointer hidden inside the
> res_counter. So the cgroup parent chain traversal would all be along
> the common parent pointers (and res_counter would be one pointer
> smaller).
>
>
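A minimal sketch of the first idea, under assumptions: the counter carries no parent pointer, each cgroup holds an array of per-subsystem counters indexed by a hypothetical subsystem id, and charging walks only the common cgroup parent chain. It is single-threaded (check, then commit, no locking), and struct cg, struct counter, NR_SUBSYS and charge() are illustrative names, not the res_counter API.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SUBSYS 2   /* hypothetical number of subsystems on the hierarchy */

/* Counter with no parent pointer of its own. */
struct counter {
	unsigned long usage;
	unsigned long limit;
};

/* Hypothetical cgroup: the one parent chain all subsystems follow. */
struct cg {
	struct cg *parent;                   /* NULL at the root */
	struct counter *counters[NR_SUBSYS]; /* per-subsystem state */
};

/* Charge `amount` to subsystem `ss` at cg and all its ancestors.
 * Returns 0 on success, -1 (nothing charged) if any limit would be
 * exceeded. */
static int charge(struct cg *cg, int ss, unsigned long amount)
{
	assert(ss >= 0 && ss < NR_SUBSYS);

	/* Pass 1: check every level along the common parent chain. */
	for (const struct cg *c = cg; c != NULL; c = c->parent) {
		const struct counter *cnt = c->counters[ss];

		if (cnt->usage + amount > cnt->limit)
			return -1;
	}
	/* Pass 2: commit. Same nodes as pass 1, so likely still cached. */
	for (struct cg *c = cg; c != NULL; c = c->parent)
		c->counters[ss]->usage += amount;
	return 0;
}
```

A second subsystem charging the same cgroup (ss = 1) walks exactly the same struct cg nodes, which is where the cache sharing would come from.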
> - allow subsystems to specify that they need a small amount of data
> that can be accessed efficiently up the cgroup chain. (Many subsystems
> wouldn't need this, and those that do would likely only need it for a
> subset of their per-cgroup data). Pack this data into as few
> cachelines as possible, allocated as a single lump of memory per
> cgroup. Each subsystem would know where in that allocation its private
> data lay (it would be the same offset for every cgroup, although
> dynamically determined at runtime based on the number of subsystems
> mounted on that hierarchy)
I thought about this second one myself.
I am not yet convinced this would be a win, but I believe there are chances.
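For reference, a minimal sketch of the second idea as I read it: at mount time, compute one byte offset per subsystem for its small "hot" data, then give each cgroup a single allocation of that size. struct hot_layout, struct pcg, the 8-byte alignment and the 64-byte cacheline size are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HOT_ALIGN      8   /* assumed per-subsystem alignment */
#define CACHELINE      64  /* assumed cacheline size in bytes */
#define MAX_HOT_SUBSYS 8   /* assumed bound on subsystems per hierarchy */

/* Layout shared by every cgroup of one hierarchy, computed at mount time. */
struct hot_layout {
	size_t nr;                      /* subsystems on this hierarchy */
	size_t offset[MAX_HOT_SUBSYS];  /* byte offset of each one's data */
	size_t total;                   /* bytes in the per-cgroup lump */
};

/* Hypothetical cgroup with a single lump of hot per-subsystem data. */
struct pcg {
	struct pcg *parent;
	unsigned char *hot;
};

static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);  /* a must be a power of two */
}

/* Assign offsets from each subsystem's requested size. */
static int layout_init(struct hot_layout *l, const size_t *sizes, size_t nr)
{
	size_t off = 0;

	if (nr > MAX_HOT_SUBSYS)
		return -1;
	for (size_t i = 0; i < nr; i++) {
		off = align_up(off, HOT_ALIGN);
		l->offset[i] = off;
		off += sizes[i];
	}
	l->nr = nr;
	l->total = off;
	return 0;
}

/* Cachelines the lump spans, ignoring where the allocator places it. */
static size_t hot_lines(const struct hot_layout *l)
{
	return (l->total + CACHELINE - 1) / CACHELINE;
}

/* One zeroed allocation per cgroup, same layout for all of them. */
static int pcg_init(struct pcg *cg, struct pcg *parent,
		    const struct hot_layout *l)
{
	cg->parent = parent;
	cg->hot = calloc(1, l->total ? l->total : 1);
	return cg->hot ? 0 : -1;
}

/* Subsystem ss's hot data: the same offset in every cgroup. */
static void *hot_data(const struct pcg *cg, const struct hot_layout *l,
		      size_t ss)
{
	assert(ss < l->nr);
	return cg->hot + l->offset[ss];
}
```

With three subsystems asking for 4, 8 and 2 bytes, layout_init() yields offsets 0, 8 and 16 and an 18-byte lump, which fits in one 64-byte line; walking child->parent and calling hot_data() at each level then ideally touches one line per cgroup, regardless of how many subsystems are mounted.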