Message-Id: <20100901180243.fe82cb61.kamezawa.hiroyu@jp.fujitsu.com>
Date: Wed, 1 Sep 2010 18:02:43 +0900
From: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To: Gui Jianfeng <guijianfeng@...fujitsu.com>
Cc: Vivek Goyal <vgoyal@...hat.com>, Jens Axboe <axboe@...nel.dk>,
Jeff Moyer <jmoyer@...hat.com>,
Divyesh Shah <dpshah@...gle.com>,
Corrado Zoccolo <czoccolo@...il.com>,
Nauman Rafique <nauman@...gle.com>,
linux kernel mailing list <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] [PATCH] cfq-iosched: add cfq group hierarchical
scheduling support
On Wed, 01 Sep 2010 16:48:25 +0800
Gui Jianfeng <guijianfeng@...fujitsu.com> wrote:
> Add Kamezawa
>
> Vivek Goyal wrote:
> > On Tue, Aug 31, 2010 at 08:29:20AM +0800, Gui Jianfeng wrote:
> >> Vivek Goyal wrote:
> >>> On Mon, Aug 30, 2010 at 02:50:40PM +0800, Gui Jianfeng wrote:
> >>>> Hi All,
> >>>>
> >>>> This patch enables cfq group hierarchical scheduling.
> >>>>
> >>>> With this patch, you can create a cgroup directory deeper than level 1.
> >>>> Now, I/O bandwidth is distributed in a hierarchical way. For example:
> >>>> We create cgroup directories as follows (the number represents the weight):
> >>>>
> >>>>               Root grp
> >>>>              /        \
> >>>>       grp_1(100)    grp_2(400)
> >>>>        /       \
> >>>>  grp_3(200)  grp_4(300)
> >>>>
> >>>> If grp_2, grp_3 and grp_4 are contending for I/O bandwidth,
> >>>> grp_2 will get 80% of the total bandwidth.
> >>>> For the sub-groups, grp_3 gets 8% (20% * 40%) and grp_4 gets 12% (20% * 60%).
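A quick way to sanity-check the arithmetic above is a tiny standalone C
program (everything in it is illustrative, not part of the patch): each
group's effective share is its weight over the sum of its siblings' weights,
scaled by its parent's share.

#include <stdio.h>

/* Weights as in the example tree above: grp_1 and grp_2 under root,
 * grp_3 and grp_4 under grp_1. Illustrative only. */
int main(void)
{
	double grp_1 = 100, grp_2 = 400;
	double grp_3 = 200, grp_4 = 300;

	double share_1 = grp_1 / (grp_1 + grp_2);		/* 20% of root */
	double share_2 = grp_2 / (grp_1 + grp_2);		/* 80% of root */
	double share_3 = share_1 * grp_3 / (grp_3 + grp_4);	/* 20% * 40%   */
	double share_4 = share_1 * grp_4 / (grp_3 + grp_4);	/* 20% * 60%   */

	printf("grp_2: %.0f%%  grp_3: %.0f%%  grp_4: %.0f%%\n",
	       share_2 * 100, share_3 * 100, share_4 * 100);
	return 0;
}

Running it prints grp_2: 80%, grp_3: 8%, grp_4: 12%, matching the figures above.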
> >>>>
> >>>> Design:
> >>>> o Each cfq group has its own group service tree.
> >>>> o Each cfq group contains a "group schedule entity" (gse) that
> >>>>   schedules on its parent cfq group's service tree.
> >>>> o Each cfq group contains a "queue schedule entity" (qse), which
> >>>>   represents all cfqqs located in this cfq group. It schedules
> >>>>   on this group's service tree. For the time being, the root
> >>>>   group's qse weight is 1000, and a subgroup's qse weight is 500.
> >>>> o All gses and the qse which belong to the same cfq group schedule
> >>>>   on the same group service tree.
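To make the design bullets above concrete, here is a minimal userspace C
sketch of the layout they describe (type and field names are made up for
illustration; this is not the patch itself):

/* Toy model of the hierarchy described above. Illustrative only. */
struct io_sched_entity {
	unsigned int	weight;		/* e.g. 1000 for the root qse,
					 * 500 for a subgroup qse */
	int		is_group;	/* 1: gse, 0: qse */
};

struct cfq_group_model {
	struct io_sched_entity	gse;	/* enqueued on the parent group's
					 * service tree */
	struct io_sched_entity	qse;	/* stands in for all of this group's
					 * cfqqs on its own service tree */
	/* Per-group service tree: holds this group's qse plus the gses of
	 * its child groups (an rbtree in CFQ; a flat array in this toy). */
	struct io_sched_entity	*service_tree[16];
	int			nr_entities;
};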
> >>> Hi Gui,
> >>>
> >>> Thanks for the patch. I have few questions.
> >>>
> >>> - So what does the hierarchy look like w.r.t. the root group? Something
> >>>   as follows?
> >>>
> >>>
> >>>          root
> >>>         /  |  \
> >>>       q1   q2   G1
> >>>
> >>> Assume there are two processes doing IO in the root group, q1 and q2 are
> >>> the cfq queues for those processes, and G1 is a cgroup created by the user.
> >>>
> >>> If yes, then what algorithm do you use to schedule between q1, q2
> >>> and G1? IOW, currently we have two algorithms operating in CFQ: one for
> >>> cfqqs and the other for groups. The group algorithm does not use the
> >>> cfq_slice_offset() logic.
> >> Hi Vivek,
> >>
> >> This patch doesn't break the original scheduling logic, that is cfqg => st => cfqq.
> >> If q1 and q2 are in the root group, I treat the q1/q2 bundle as a queue sched
> >> entity, and it schedules on the root group's service tree together with G1,
> >> as follows:
> >>
> >>          root group
> >>         /          \
> >>    qse(q1,q2)    gse(G1)
> >>
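A rough, self-contained C sketch of how selection could walk such a hierarchy
(function and type names are invented here, and this is not the patch): pick
the smallest-vdisktime entity on the current group's tree, descend when it is
a gse, and stop at a qse, which is exactly where the existing
cfqg => st => cfqq logic would take over unchanged.

#include <stddef.h>

/* Toy model of hierarchical group selection; illustrative only. */
struct entity {
	int			is_group;	/* 1: gse (child group), 0: qse */
	unsigned long long	vdisktime;	/* virtual time key, as on the
						 * group service tree */
	struct group		*child;		/* valid only when is_group == 1 */
};

struct group {
	struct entity	*tree[8];	/* this group's service tree (toy array) */
	int		nr;
};

/* Smallest vdisktime wins, mirroring the group service tree ordering. */
static struct entity *pick_best(struct group *g)
{
	struct entity *best = NULL;
	int i;

	for (i = 0; i < g->nr; i++)
		if (!best || g->tree[i]->vdisktime < best->vdisktime)
			best = g->tree[i];
	return best;
}

/* Walk down from the root group until the winning entity is a qse; at that
 * point the unchanged cfqg => st => cfqq logic picks an actual cfqq. */
static struct group *select_group(struct group *root)
{
	struct group *g = root;

	while (g) {
		struct entity *best = pick_best(g);

		if (!best)
			return NULL;
		if (!best->is_group)
			return g;	/* qse won: service this group's cfqqs */
		g = best->child;	/* gse won: descend into the child group */
	}
	return NULL;
}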
> >
> > Ok. That's interesting. That raises another question of what the hierarchy
> > should look like. IOW, how queues and groups should be treated in the
> > hierarchy.
> >
> > The CFS cpu scheduler treats queues and groups at the same level. That is,
> > as follows:
> >
> >          root
> >         /  |  \
> >       q1   q2   G1
> >
> > In the past I raised this question, and Jens and Corrado liked treating
> > queues and groups at the same level.
> >
> > Logically, q1, q2 and G1 are all children of root, so it makes sense to
> > treat them at the same level rather than bundling q1 and q2 into a single
> > entity and group.
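For comparison, here is a toy C sketch of that CFS-style arrangement (loosely
modeled on CFS's sched_entity/my_q pairing; names are illustrative): a single
entity type is used for both queues and groups, so q1, q2 and G1 can sit side
by side on root's tree.

#include <stddef.h>

/* Toy model of the CFS-style layout. Illustrative only. */
struct flat_rq;

struct flat_entity {
	unsigned int	weight;
	struct flat_rq	*my_q;	/* NULL: this entity is a plain queue;
				 * non-NULL: it is a group, and my_q holds
				 * that group's own children */
};

struct flat_rq {
	struct flat_entity	*children[8];
	int			nr;
};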
>
> Hi Vivek,
>
> IMO, this new approach keeps the original scheduling logic and keeps things
> simple. And this approach works for me, so I chose it.
>
> >
> > One possible way forward could be this.
> >
> > - Treat queues and groups at the same level (like CFS).
> >
> > - Get rid of the cfq_slice_offset() logic. That means that without idling
> >   enabled, there will be no ioprio difference between cfq queues. I think
> >   that as of today that logic helps in so few situations that I would not
> >   mind getting rid of it. Just that Jens should agree to it.
> >
> > - This new scheme will break the existing semantics of the root group
> >   being at the same level as child groups. To avoid that, we can probably
> >   implement two modes (flat and hierarchical), similar to what the memory
> >   cgroup controller has done: maybe one tunable, "use_hierarchy", in the
> >   root cgroup of blkio. By default everything will be in flat mode, and
> >   if the user wants hierarchical control, he needs to set use_hierarchy
> >   in the root group.
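If blkio grew such a root-only use_hierarchy knob (it does not exist at the
time of this mail; everything below is a hypothetical sketch), the
scheduler-side change could be as small as choosing a group's parent at
creation time:

#include <stdbool.h>

struct cfq_group;			/* opaque for this sketch */

/* Hypothetical root-only tunable; blkio has no such knob today. */
extern bool blkio_root_use_hierarchy;

static struct cfq_group *pick_parent(struct cfq_group *root_group,
				     struct cfq_group *cgroup_parent)
{
	/* Flat mode keeps today's semantics: every group becomes a child of
	 * the root group, so existing setups behave exactly as before. */
	if (!blkio_root_use_hierarchy)
		return root_group;
	/* Hierarchical mode follows the cgroup directory tree. */
	return cgroup_parent;
}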
> >
> > I think the memory controller provides a "use_hierarchy" tunable in each
> > cgroup. I am not sure why we need it in each cgroup and not just in the
> > root cgroup.
>
> I think Kamezawa-san should be able to answer this question. :)
>
First of all, please be aware that hierarchical accounting is _very_ slow.
Please measure how slow hierarchical accounting (of 4-6 levels) is ;)
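The cost comes from every charge having to visit every ancestor; a toy C model
of that walk (illustrative only, not the memcg code) shows why depth matters:

#include <stddef.h>

/* Toy model: a 6-level-deep cgroup pays 6 limit checks and 6 counter updates
 * per charge instead of 1. Illustrative only. */
struct counter {
	unsigned long	usage;
	unsigned long	limit;
	struct counter	*parent;	/* NULL at the root */
};

static int charge(struct counter *c, unsigned long nr)
{
	struct counter *p;

	/* First pass: make sure no ancestor would go over its limit. */
	for (p = c; p; p = p->parent)
		if (p->usage + nr > p->limit)
			return -1;
	/* Second pass: commit the charge at every level up to the root. */
	for (p = c; p; p = p->parent)
		p->usage += nr;
	return 0;
}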
Then, there are 2 use cases.
1) root/to/some/directory/A
                         /B
                         /C
                          ....
   A, B, C, ... are all flat cgroups with no relationship to each other; they
   do not share a limit. In this case, hierarchy should not be enabled.
2) root/to/some/directory/Gold/A,B,C...
                          Silver/D,E,F
   A, B, C, ... are all limited by "Gold" or "Silver".
   Gold and Silver have no relationship to each other; they each have their
   own limitation. But A, B, C and D, E, F share the limit under Gold or
   Silver respectively.
   In this case, hierarchy for "root/to/some/directory" should be disabled,
   while Gold/ and Silver/ should have use_hierarchy=1.
   (Think of Gold and Silver as containers, where the user of a container
   divides memory into A, B, C, ...)
For example, libvirt creates a very long "root/to/some/directory" ...
I never want to update all the counters up the hierarchy even if
we'd like to use some fantastic hierarchical accounting under a container.
I don't like an "all or nothing" option (such as making use_hierarchy a mount
option or a parameter only on the root cgroup, etc.). That's why mixing is
allowed.
Thanks,
-Kame