Message-ID: <20100901171027.GA22149@redhat.com>
Date:	Wed, 1 Sep 2010 13:10:27 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Nauman Rafique <nauman@...gle.com>
Cc:	Gui Jianfeng <guijianfeng@...fujitsu.com>,
	Jens Axboe <axboe@...nel.dk>, Jeff Moyer <jmoyer@...hat.com>,
	Divyesh Shah <dpshah@...gle.com>,
	Corrado Zoccolo <czoccolo@...il.com>,
	linux kernel mailing list <linux-kernel@...r.kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [RFC] [PATCH] cfq-iosched: add cfq group hierarchical scheduling
 support

On Wed, Sep 01, 2010 at 08:49:26AM -0700, Nauman Rafique wrote:
> On Wed, Sep 1, 2010 at 1:50 AM, Gui Jianfeng <guijianfeng@...fujitsu.com> wrote:
> > Vivek Goyal wrote:
> >> On Tue, Aug 31, 2010 at 08:40:19AM -0700, Nauman Rafique wrote:
> >>> On Tue, Aug 31, 2010 at 5:57 AM, Vivek Goyal <vgoyal@...hat.com> wrote:
> >>>> On Tue, Aug 31, 2010 at 08:29:20AM +0800, Gui Jianfeng wrote:
> >>>>> Vivek Goyal wrote:
> >>>>>> On Mon, Aug 30, 2010 at 02:50:40PM +0800, Gui Jianfeng wrote:
> >>>>>>> Hi All,
> >>>>>>>
> >>>>>>> This patch enables cfq group hierarchical scheduling.
> >>>>>>>
> >>>>>>> With this patch, you can create cgroup directories deeper than level 1.
> >>>>>>> Now, I/O bandwidth is distributed in a hierarchical way. For example,
> >>>>>>> we create cgroup directories as follows (the number represents weight):
> >>>>>>>
> >>>>>>>             Root grp
> >>>>>>>            /       \
> >>>>>>>        grp_1(100) grp_2(400)
> >>>>>>>        /    \
> >>>>>>>   grp_3(200) grp_4(300)
> >>>>>>>
> >>>>>>> If grp_2, grp_3 and grp_4 are contending for I/O bandwidth,
> >>>>>>> grp_2 will get 80% of the total bandwidth.
> >>>>>>> For the subgroups, grp_3 gets 8% (20% * 40%) and grp_4 gets 12% (20% * 60%).
> >>>>>>>
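Not from the patch, just a minimal user-space sketch that reproduces the
share arithmetic in the example above:

	/* Minimal sketch: reproduces the hierarchical share arithmetic
	 * from the example above (weights are the ones in the diagram). */
	#include <stdio.h>

	int main(void)
	{
		int grp_1 = 100, grp_2 = 400, grp_3 = 200, grp_4 = 300;

		/* Top level: grp_1's subtree and grp_2 compete under root. */
		double grp_2_share = (double)grp_2 / (grp_1 + grp_2);	/* 0.80 */
		double grp_1_share = (double)grp_1 / (grp_1 + grp_2);	/* 0.20 */

		/* Second level: grp_3 and grp_4 split grp_1's 20%. */
		double grp_3_share = grp_1_share * grp_3 / (grp_3 + grp_4); /* 0.08 */
		double grp_4_share = grp_1_share * grp_4 / (grp_3 + grp_4); /* 0.12 */

		printf("grp_2 %.0f%%, grp_3 %.0f%%, grp_4 %.0f%%\n",
		       100 * grp_2_share, 100 * grp_3_share, 100 * grp_4_share);
		return 0;
	}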
> >>>>>>> Design:
> >>>>>>>   o Each cfq group has its own group service tree.
> >>>>>>>   o Each cfq group contains a "group schedule entity" (gse) that
> >>>>>>>     schedules on the parent cfq group's service tree.
> >>>>>>>   o Each cfq group contains a "queue schedule entity" (qse); it
> >>>>>>>     represents all cfqqs located in this cfq group. It schedules
> >>>>>>>     on this group's service tree. For the time being, the root
> >>>>>>>     group qse's weight is 1000, and a subgroup qse's weight is 500.
> >>>>>>>   o All gses and the qse that belong to the same cfq group are
> >>>>>>>     scheduled on the same group service tree.
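Purely illustrative (struct and field names here are assumptions, not the
patch's actual code), the entities described above might hang together
roughly like this:

	/* Rough sketch of the design above; names and layout are assumptions. */
	#include <linux/rbtree.h>
	#include <linux/types.h>

	struct io_sched_entity {
		struct rb_node	rb_node;	/* position on a service tree */
		unsigned int	weight;
		bool		is_group;	/* gse or qse? */
	};

	struct cfq_group {
		struct io_sched_entity	gse;	/* schedules on the PARENT
						 * group's service tree */
		struct io_sched_entity	qse;	/* bundles this group's cfqqs;
						 * schedules on THIS group's
						 * tree (1000 for root group,
						 * 500 otherwise) */
		struct rb_root		service_tree;	/* child gses + own qse */
		struct cfq_group	*parent;
	};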
> >>>>>> Hi Gui,
> >>>>>>
> >>>>>> Thanks for the patch. I have few questions.
> >>>>>>
> >>>>>> - So what does the hierarchy look like w.r.t. the root group? Something
> >>>>>>   as follows?
> >>>>>>
> >>>>>>
> >>>>>>                     root
> >>>>>>                    / | \
> >>>>>>                  q1  q2 G1
> >>>>>>
> >>>>>> Assume there are two processes doing IO in the root group, q1 and q2
> >>>>>> are the cfqq queues for those processes, and G1 is a cgroup created by
> >>>>>> the user.
> >>>>>>
> >>>>>> If yes, then what algorithm do you use to do scheduling between q1, q2
> >>>>>> and G1? IOW, currently we have two algorithms operating in CFQ: one for
> >>>>>> cfqqs and the other for groups. The group algorithm does not use the
> >>>>>> logic of cfq_slice_offset().
> >>>>> Hi Vivek,
> >>>>>
> >>>>> This patch doesn't break the original scheduling logic, that is,
> >>>>> cfqg => st => cfqq. If q1 and q2 are in the root group, I treat the
> >>>>> q1 and q2 bundle as a queue sched entity, and it will be scheduled on
> >>>>> the root group's service tree along with G1, as follows:
> >>>>>
> >>>>>                          root group
> >>>>>                         /         \
> >>>>>                     qse(q1,q2)    gse(G1)
> >>>>>
> >>>> Ok, that's interesting. That raises another question of how the
> >>>> hierarchy should look, IOW, how queues and groups should be treated
> >>>> in the hierarchy.
> >>>>
> >>>> The CFS cpu scheduler treats queues and groups at the same level, as
> >>>> follows.
> >>>>
> >>>>                        root
> >>>>                        / | \
> >>>>                       q1 q2 G1
> >>>>
> >>>> In the past I raised this question, and Jens and Corrado liked treating
> >>>> queues and groups at the same level.
> >>>>
> >>>> Logically, q1, q2 and G1 are all children of root, so it makes sense to
> >>>> treat them at the same level and not lump q1 and q2 into a single
> >>>> entity and group.
> >>>>
> >>>> One possible way forward could be this.
> >>>>
> >>>> - Treat queues and groups at the same level (like CFS).
> >>>>
> >>>> - Get rid of the cfq_slice_offset() logic. That means that with idling
> >>>>  off, there will be no ioprio difference between cfq queues. Anyway, I
> >>>>  think that as of today that logic helps in so few situations that I
> >>>>  would not mind getting rid of it; Jens just has to agree to it.
> >>>>
> >>>> - This new scheme will break the existing semantics of the root group
> >>>>  being at the same level as child groups. To avoid that, we can
> >>>>  probably implement two modes (flat and hierarchical), similar to what
> >>>>  the memory cgroup controller has done: maybe one tunable in the root
> >>>>  cgroup of blkio, "use_hierarchy". By default everything will be in
> >>>>  flat mode, and if the user wants hierarchical control, he needs to
> >>>>  set use_hierarchy in the root group.
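If such a tunable were added, flipping it from user space might look like
the sketch below ("use_hierarchy" is just the name floated above, mirroring
memcg's memory.use_hierarchy; the /cgroup/blkio mount point is an
assumption about how blkio is mounted):

	/* Sketch only: the tunable name comes from the proposal above and
	 * does not exist yet; the mount point is an assumption. */
	#include <stdio.h>

	int main(void)
	{
		FILE *f = fopen("/cgroup/blkio/blkio.use_hierarchy", "w");

		if (!f) {
			perror("blkio.use_hierarchy");
			return 1;
		}
		fputs("1\n", f);	/* default (0) would keep flat mode */
		fclose(f);
		return 0;
	}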
> >>> Vivek, maybe I am reading you wrong here, but you are first suggesting
> >>> adding more complexity to treat queues and groups at the same level,
> >>> and then suggesting adding even more complexity to fix the problems
> >>> caused by that approach.
> >>>
> >>> Why do we need to treat queues and groups at the same level? "CFS does
> >>> it" is not a good argument.
> >>
> >> Sure, it is not a very good argument, but at the same time one would need
> >> a very good argument for why we should do things differently.
> >>
> >> - If a user has mounted the cpu and blkio controllers together and the
> >>   two controllers are viewing the same hierarchy differently, then it is
> >>   odd. We need a good reason why a different arrangement makes sense.
> >
> > Hi Vivek,
> >
> > Even if we mount cpu and blkio together, to me it's ok for cpu and blkio
> > to have their own logic, since they are totally different cgroup subsystems.
> >
> >>
> >> - To me, both groups and cfq queues are children of the root group, and
> >>   it makes sense to treat them as independent children instead of
> >>   putting all the queues in one logical group which inherits the weight
> >>   of the parent.
> >>
> >> - With this new scheme, I am finding it hard to visualize the hierarchy.
> >>   How do you assign the weights to the queue entities of a group? It is
> >>   more like an invisible group within a group. We would have to create a
> >>   new tunable which can specify the weight for this hidden group.
> >
> > For the time being, the root "qse" weight is 1000 and the others' is 500;
> > they don't inherit the weight of the parent. I was thinking that maybe we
> > can determine the qse weight in terms of the number of queues and the
> > weights in this group and its subgroups.
> >
> > Thanks,
> > Gui
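One way to read Gui's idea above, purely as a sketch (the formula and the
function name are assumptions, not something the patch implements):

	/* Sketch: derive a group's qse weight from how many of the queues
	 * under it belong to the group itself, instead of a hardcoded
	 * 1000/500. Assumption only, not the patch's behavior. */
	static unsigned int cfqg_qse_weight(unsigned int nr_own_queues,
					    unsigned int nr_subgroup_queues,
					    unsigned int group_weight)
	{
		unsigned int total = nr_own_queues + nr_subgroup_queues;

		if (!total)
			return 0;
		/* Slice of the group's weight proportional to the share of
		 * queues the qse stands in for. */
		return group_weight * nr_own_queues / total;
	}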
> >
> >>
> >>
> >> So in summary, I am liking the "queues at the same level as groups"
> >> scheme for three reasons.
> >>
> >> - It is more intuitive to visualize and implement. It follows the true
> >>   hierarchy as seen by the cgroup file system.
> >>
> >> - CFS has already implemented this scheme, so we need a strong argument
> >>   to justify why we should not follow the same thing, especially for
> >>   the case where the user has co-mounted the cpu and blkio controllers.
> >>
> >> - It can achieve the same goal as the "hidden group" proposal just by
> >>   creating a cgroup explicitly and moving all threads into that group.
> >>
> >> Why do you think that the "hidden group" proposal is better than
> >> "treating queues at the same level as groups"?
> 
> There are multiple reasons why the "hidden group" proposal is a better
> approach.
> 
> - "Hidden group" would allow us to keep scheduling queues using the CFQ
> queue scheduling logic, and does not require any major changes in CFQ.
> Aren't we already using that approach to deal with queues in the root
> group?

Currently we are operating in flat mode, where all the groups are at the
same level (irrespective of their position in the cgroup hierarchy).

> 
> - If queues and groups are treated at the same level, queues can end
> up in the root cgroup, and we cannot put an upper bound on the number of
> those queues. Those queues can consume system resources in proportion
> to their number, causing the performance of groups to suffer. If we
> have a "hidden group", we can configure it with a small weight, and that
> would limit the impact these queues in the root group can have.

To limit the impact of other queues in the root cgroup, one can use
libcgroup to automatically place new threads or tasks into a subgroup.
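For illustration, what libcgroup (or its rules daemon) automates is
essentially the following raw-cgroupfs sequence; the mount point, group
name and weight here are assumptions about a particular setup:

	/* Sketch of the manual equivalent of libcgroup's placement: make a
	 * low-weight subgroup and move the current task into it. */
	#include <stdio.h>
	#include <sys/stat.h>
	#include <sys/types.h>
	#include <unistd.h>

	int main(void)
	{
		FILE *f;

		/* A subgroup to catch "everything else". */
		mkdir("/cgroup/blkio/default_grp", 0755);

		/* Small weight (100 is CFQ's minimum) limits its impact. */
		f = fopen("/cgroup/blkio/default_grp/blkio.weight", "w");
		if (f) {
			fputs("100\n", f);
			fclose(f);
		}

		/* Move the current task out of the root group. */
		f = fopen("/cgroup/blkio/default_grp/tasks", "w");
		if (f) {
			fprintf(f, "%d\n", (int)getpid());
			fclose(f);
		}
		return 0;
	}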

I understand that the kernel doing it by default would help, though; it is
less work in terms of configuration. But I am not sure that's a good
argument for designing kernel functionality. Kernel functionality should
be pretty generic.

Anyway, how would you assign the weight to the hidden group? What's the
interface for that? A new cgroup file inside each cgroup? Personally,
I think that's a little odd as an interface: every group has one hidden
group where all the queues in that group go, and the weight of that group
can be specified by a cgroup file.

But anyway, I am not tied to either approach. I am just trying to make
sure that we have put enough thought into it, as changing it later will
be hard.

Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
