Message-ID: <4e5e476b0912170341h7ba632akddb921c996a36f73@mail.gmail.com>
Date: Thu, 17 Dec 2009 12:41:32 +0100
From: Corrado Zoccolo <czoccolo@...il.com>
To: Vivek Goyal <vgoyal@...hat.com>
Cc: linux-kernel@...r.kernel.org, jens.axboe@...cle.com,
nauman@...gle.com, lizf@...fujitsu.com, ryov@...inux.co.jp,
fernando@....ntt.co.jp, taka@...inux.co.jp,
guijianfeng@...fujitsu.com, jmoyer@...hat.com,
m-ikeda@...jp.nec.com, Alan.Brunelle@...com
Subject: Re: [RFC] CFQ group scheduling structure organization
Hi,
On Wed, Dec 16, 2009 at 11:52 PM, Vivek Goyal <vgoyal@...hat.com> wrote:
> Hi All,
>
> With some basic group scheduling support in CFQ, there are a few questions
> regarding what the group structure in CFQ should look like.
>
> Currently, grouping looks as follows. A and B are two cgroups created by
> the user.
>
> [snip]
>
> Proposal 4:
> ==========
> Treat tasks and groups at the same level. Currently groups are at the top
> level and tasks are at the second level. View the whole hierarchy as follows.
>
>
>            service-tree
>          /    |    \    \
>        T1    T2    G1    G2
>
> Here T1 and T2 are two tasks in the root group and G1 and G2 are two
> cgroups created under root.
>
> In this kind of scheme, any RT task in the root group will still be
> system-wide RT even if we create groups G1 and G2.
>
> So what are the issues?
>
> - I talked to a few folks and everybody found this scheme not so intuitive.
> Their argument was that once I create a cgroup, say A, under root,
> bandwidth should be divided between "root" and "A" in proportion to
> their weights.
>
> It is not very intuitive that a group competes with all the tasks
> running in the root group. And the disk share of a newly created group
> will change if more tasks fork in the root group. So the share is highly
> dynamic rather than static, hence unintuitive.
>
> To emulate the behavior of the previous proposals, root will have to
> create a new group and move all root tasks there. But the admin will still
> have to keep RT tasks in the root group so that they remain system-wide.
>
>            service-tree
>          /    |     \    \
>        T1   root    G1    G2
>               |
>              T2
>
> Now the admin has explicitly created a group "root" alongside G1 and G2
> and moved T2 under it. T1 is still left in the top-level group as it might
> be an RT task and we want it to remain an RT task system-wide.
>
> So to some people this scheme is unintuitive and requires more work in
> user space to achieve the desired behavior. I am kind of 50:50 between the
> two kinds of arrangements.
>
This is the one I prefer: it is the most natural one if you see that
groups are scheduling entities like any other task.
I think it becomes intuitive through an analogy with the qemu (e.g. kvm)
virtual machine model. If you think of a group as a virtual machine, it
is clear that for the normal system, the whole virtual machine is a
single scheduling entity, and that it has to compete with other
virtual machines (as other single entities) and with every process in
the real system (those are inherently more important, since without the
real system the VMs simply cannot exist).
Having a designated root group, instead, resembles the xen VM model,
where you have a separate domain for each VM and for the real system.
I think the implementation of this approach can make the code simpler
and modular (CFQ could be abstracted to deal with scheduling entities,
and each scheduling entity could be defined in a separate file).
Within each group, you would then have the choice of how to schedule its
queues. This means that you could possibly have different I/O
schedulers within each group, and even have sub-groups within groups.
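To make the "groups are scheduling entities like any other task" view concrete, here is a toy model. It is a sketch under stated assumptions, not CFQ code: the Entity class, the shares() function and the weights of 100 are hypothetical, every entity is assumed to be continuously backlogged, and task names are assumed unique. Tasks and groups on one service tree compete in proportion to their weights, and a group's slice is split again among its own children. It also reproduces Vivek's objection: forking more tasks in the root group shrinks G1's share.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Entity:
    # Hypothetical toy model, not CFQ's real queue/group structures.
    # A task is a leaf; a group holds child entities that compete among
    # themselves for whatever share the group itself receives.
    name: str
    weight: int
    is_group: bool = False
    children: list = field(default_factory=list)


def shares(parent, budget=Fraction(1)):
    """Return {task name: fraction of disk time} for every task below parent.

    All children of one service tree compete in proportion to their
    weights, whether they are tasks or groups (the flat view of
    Proposal 4). Assumes all entities are continuously backlogged.
    """
    total = sum(child.weight for child in parent.children)
    result = {}
    for child in parent.children:
        part = budget * Fraction(child.weight, total)
        if child.is_group:
            # A group's slice is divided again among its own children.
            result.update(shares(child, part))
        else:
            result[child.name] = part
    return result


# Proposal 4 layout: T1, T2, G1, G2 all on the same service tree.
root = Entity("root", 0, is_group=True, children=[
    Entity("T1", 100), Entity("T2", 100),
    Entity("G1", 100, is_group=True, children=[Entity("T3", 100)]),
    Entity("G2", 100, is_group=True, children=[Entity("T4", 100)]),
])
print(shares(root)["T3"])  # prints 1/4: G1 is one of four equal entities

# Two more tasks fork in the root group; G1 (and so T3) gets less.
root.children += [Entity("T5", 100), Entity("T6", 100)]
print(shares(root)["T3"])  # prints 1/6
```

Because a group is just another entry with a weight, nesting sub-groups or giving a group its own policy only changes how shares() descends into it, which is the kind of modularity argued for above.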
>
> I am looking for some feedback on what makes most sense.
I think that regardless of our preference, we should stay consistent with
how the CPU scheduler works, since I think users will be more
surprised to see cgroups behaving differently w.r.t. CPU and disk than
if the RT task behaviour changes when cgroups are introduced.
Thanks,
Corrado
>
> For the time being, I am a little inclined towards proposal 2 and I have
> implemented a proof-of-concept version on top of the for-2.6.33 branch in
> the block tree. These patches are only compile- and boot-tested; I have
> yet to do further testing.
>
> Thanks
> Vivek
>