Date:	Thu, 02 Sep 2010 08:30:37 +0800
From:	Gui Jianfeng <guijianfeng@...fujitsu.com>
To:	Nauman Rafique <nauman@...gle.com>
CC:	Vivek Goyal <vgoyal@...hat.com>, Jens Axboe <axboe@...nel.dk>,
	Jeff Moyer <jmoyer@...hat.com>,
	Divyesh Shah <dpshah@...gle.com>,
	Corrado Zoccolo <czoccolo@...il.com>,
	linux kernel mailing list <linux-kernel@...r.kernel.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [RFC] [PATCH] cfq-iosched: add cfq group hierarchical scheduling
 support

Nauman Rafique wrote:
> On Wed, Sep 1, 2010 at 10:10 AM, Vivek Goyal <vgoyal@...hat.com> wrote:
>> On Wed, Sep 01, 2010 at 08:49:26AM -0700, Nauman Rafique wrote:
>>> On Wed, Sep 1, 2010 at 1:50 AM, Gui Jianfeng <guijianfeng@...fujitsu.com> wrote:
>>>> Vivek Goyal wrote:
>>>>> On Tue, Aug 31, 2010 at 08:40:19AM -0700, Nauman Rafique wrote:
>>>>>> On Tue, Aug 31, 2010 at 5:57 AM, Vivek Goyal <vgoyal@...hat.com> wrote:
>>>>>>> On Tue, Aug 31, 2010 at 08:29:20AM +0800, Gui Jianfeng wrote:
>>>>>>>> Vivek Goyal wrote:
>>>>>>>>> On Mon, Aug 30, 2010 at 02:50:40PM +0800, Gui Jianfeng wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> This patch enables cfq group hierarchical scheduling.
>>>>>>>>>>
>>>>>>>>>> With this patch, you can create cgroup directories deeper than level 1.
>>>>>>>>>> I/O bandwidth is now distributed hierarchically. For example, we create
>>>>>>>>>> cgroup directories as follows (the numbers represent weights):
>>>>>>>>>>
>>>>>>>>>>             Root grp
>>>>>>>>>>            /       \
>>>>>>>>>>        grp_1(100) grp_2(400)
>>>>>>>>>>        /    \
>>>>>>>>>>   grp_3(200) grp_4(300)
>>>>>>>>>>
>>>>>>>>>> If grp_2, grp_3 and grp_4 are contending for I/O bandwidth,
>>>>>>>>>> grp_2 will get 80% of the total bandwidth (400 out of 500 at the top level).
>>>>>>>>>> Of the subgroups, grp_3 gets 8% (20% * 40%) and grp_4 gets 12% (20% * 60%).
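>>>>>>>>>>
>>>>>>>>>> The arithmetic above can be checked with a tiny userspace sketch
>>>>>>>>>> (illustrative only, not part of the patch; all names are made up):
>>>>>>>>>>
>>>>>>>>>> #include <stdio.h>
>>>>>>>>>>
>>>>>>>>>> /* Toy model: a group's share is its weight over the total weight
>>>>>>>>>>  * of its contending siblings, times its parent's share. */
>>>>>>>>>> static double share(double parent_share, int weight, int sibling_total)
>>>>>>>>>> {
>>>>>>>>>>         return parent_share * weight / sibling_total;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> int main(void)
>>>>>>>>>> {
>>>>>>>>>>         /* grp_1(100) and grp_2(400) contend at the top level. */
>>>>>>>>>>         double grp_1 = share(1.0, 100, 100 + 400);   /* 0.20 */
>>>>>>>>>>         double grp_2 = share(1.0, 400, 100 + 400);   /* 0.80 */
>>>>>>>>>>         /* grp_3(200) and grp_4(300) split grp_1's 20%. */
>>>>>>>>>>         double grp_3 = share(grp_1, 200, 200 + 300); /* 0.08 */
>>>>>>>>>>         double grp_4 = share(grp_1, 300, 200 + 300); /* 0.12 */
>>>>>>>>>>
>>>>>>>>>>         printf("grp_2=%.0f%% grp_3=%.0f%% grp_4=%.0f%%\n",
>>>>>>>>>>                grp_2 * 100, grp_3 * 100, grp_4 * 100);
>>>>>>>>>>         return 0;
>>>>>>>>>> }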
>>>>>>>>>>
>>>>>>>>>> Design:
>>>>>>>>>>   o Each cfq group has its own group service tree.
>>>>>>>>>>   o Each cfq group contains a "group schedule entity" (gse) that
>>>>>>>>>>     schedules on the parent cfq group's service tree.
>>>>>>>>>>   o Each cfq group contains a "queue schedule entity" (qse), which
>>>>>>>>>>     represents all cfqqs located in this cfq group. It schedules
>>>>>>>>>>     on this group's service tree. For the time being, the root group
>>>>>>>>>>     qse's weight is 1000, and a subgroup qse's weight is 500.
>>>>>>>>>>   o All gses and the qse belonging to the same cfq group are scheduled
>>>>>>>>>>     on the same group service tree.
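>>>>>>>>>>
>>>>>>>>>> A rough data-structure sketch of the above (field and type names are
>>>>>>>>>> illustrative, not the patch's actual definitions):
>>>>>>>>>>
>>>>>>>>>> struct io_sched_entity {
>>>>>>>>>>         struct rb_node rb_node;  /* position on the parent's service tree */
>>>>>>>>>>         unsigned int weight;     /* 1000 for the root qse, 500 otherwise */
>>>>>>>>>>         bool on_st;              /* queued on a service tree? */
>>>>>>>>>> };
>>>>>>>>>>
>>>>>>>>>> struct cfq_group {
>>>>>>>>>>         struct cfq_rb_root service_tree; /* this group's own service tree */
>>>>>>>>>>         struct io_sched_entity gse;      /* schedules on the parent's tree */
>>>>>>>>>>         struct io_sched_entity qse;      /* stands in for all local cfqqs */
>>>>>>>>>> };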
>>>>>>>>> Hi Gui,
>>>>>>>>>
>>>>>>>>> Thanks for the patch. I have a few questions.
>>>>>>>>>
>>>>>>>>> - So what does the hierarchy look like w.r.t. the root group? Something
>>>>>>>>>   as follows?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                     root
>>>>>>>>>                    / | \
>>>>>>>>>                  q1  q2 G1
>>>>>>>>>
>>>>>>>>> Assume there are two processes doing IO in the root group, q1 and q2 are
>>>>>>>>> the cfq queues for those processes, and G1 is a cgroup created by the user.
>>>>>>>>>
>>>>>>>>> If yes, then what algorithm do you use to schedule between q1, q2
>>>>>>>>> and G1? IOW, currently we have two algorithms operating in CFQ: one for
>>>>>>>>> cfqqs and the other for groups. The group algorithm does not use the
>>>>>>>>> cfq_slice_offset() logic.
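>>>>>>>>>
>>>>>>>>> (For readers following along, the gist of cfq_slice_offset(), paraphrased
>>>>>>>>> as a toy rather than quoted verbatim from cfq-iosched.c: a queue's key on
>>>>>>>>> the service tree is pushed back in proportion to the number of queues and
>>>>>>>>> to how far its priority slice falls short of the best priority's slice,
>>>>>>>>> so lower-priority queues land later in the round.)
>>>>>>>>>
>>>>>>>>> static long long toy_slice_offset(int nr_queues, int best_prio_slice,
>>>>>>>>>                                   int my_prio_slice)
>>>>>>>>> {
>>>>>>>>>         /* more queues and a weaker priority => a larger offset */
>>>>>>>>>         return (long long)(nr_queues - 1) *
>>>>>>>>>                (best_prio_slice - my_prio_slice);
>>>>>>>>> }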
>>>>>>>> Hi Vivek,
>>>>>>>>
>>>>>>>> This patch doesn't break the original scheduling logic, that is,
>>>>>>>> cfqg => st => cfqq. If q1 and q2 are in the root group, I treat the
>>>>>>>> q1 and q2 bundle as a queue schedule entity, and it will schedule on
>>>>>>>> the root group service tree with G1, as follows:
>>>>>>>>
>>>>>>>>                          root group
>>>>>>>>                         /         \
>>>>>>>>                     qse(q1,q2)    gse(G1)
>>>>>>>>
>>>>>>> Ok. That's interesting. That raises another question of what the hierarchy
>>>>>>> should look like. IOW, how queues and groups should be treated in the
>>>>>>> hierarchy.
>>>>>>>
>>>>>>> The CFS cpu scheduler treats queues and groups at the same level, as
>>>>>>> follows:
>>>>>>>
>>>>>>>                        root
>>>>>>>                        / | \
>>>>>>>                       q1 q2 G1
>>>>>>>
>>>>>>> In the past I had raised this question, and Jens and Corrado liked
>>>>>>> treating queues and groups at the same level.
>>>>>>>
>>>>>>> Logically, q1, q2 and G1 are all children of root, so it makes sense to
>>>>>>> treat them at the same level rather than lumping q1 and q2 into a single
>>>>>>> hidden entity and group.
>>>>>>>
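>>>>>>> (For reference, CFS expresses this with a single entity type; trimmed
>>>>>>> down for illustration, not a verbatim kernel excerpt: every task and
>>>>>>> every group is a sched_entity enqueued on some runqueue, and a group
>>>>>>> entity additionally owns a runqueue of its own.)
>>>>>>>
>>>>>>> struct sched_entity {
>>>>>>>         struct rb_node run_node; /* position on the parent's cfs_rq */
>>>>>>>         struct cfs_rq *cfs_rq;   /* runqueue this entity is queued on */
>>>>>>>         struct cfs_rq *my_q;     /* non-NULL only for group entities */
>>>>>>> };
>>>>>>>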
>>>>>>> One possible way forward could be this.
>>>>>>>
>>>>>>> - Treat queues and groups at the same level (like CFS).
>>>>>>>
>>>>>>> - Get rid of the cfq_slice_offset() logic. That means that when idling
>>>>>>>  is not enabled, there will be no ioprio differentiation between cfq
>>>>>>>  queues. Anyway, as of today that logic helps in so few situations that
>>>>>>>  I would not mind getting rid of it. Just that Jens should agree to it.
>>>>>>>
>>>>>>> - This new scheme will break the existing semantics of the root group
>>>>>>>  being at the same level as child groups. To avoid that, we can probably
>>>>>>>  implement two modes (flat and hierarchical), something similar to what
>>>>>>>  the memory cgroup controller has done. Maybe one tunable in the root
>>>>>>>  cgroup of blkio, "use_hierarchy". By default everything will be in flat
>>>>>>>  mode, and if the user wants hierarchical control, he needs to set
>>>>>>>  use_hierarchy in the root group, as sketched below.
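>>>>>>>
>>>>>>>  Entirely hypothetically (no such file exists in blkio), the plumbing
>>>>>>>  could mirror memory.use_hierarchy, something like:
>>>>>>>
>>>>>>>  /* Hypothetical blkio.use_hierarchy handlers, modeled on the memory
>>>>>>>   * controller; the use_hierarchy field does not exist today. */
>>>>>>>  static u64 blkio_use_hierarchy_read(struct cgroup *cgrp,
>>>>>>>                                      struct cftype *cft)
>>>>>>>  {
>>>>>>>          return cgroup_to_blkio_cgroup(cgrp)->use_hierarchy;
>>>>>>>  }
>>>>>>>
>>>>>>>  static int blkio_use_hierarchy_write(struct cgroup *cgrp,
>>>>>>>                                       struct cftype *cft, u64 val)
>>>>>>>  {
>>>>>>>          if (val > 1)
>>>>>>>                  return -EINVAL;
>>>>>>>          /* like memory cgroup, allow flipping only in the root group */
>>>>>>>          if (cgrp->parent)
>>>>>>>                  return -EPERM;
>>>>>>>          cgroup_to_blkio_cgroup(cgrp)->use_hierarchy = val;
>>>>>>>          return 0;
>>>>>>>  }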
>>>>>> Vivek, maybe I am reading you wrong here. But you are first suggesting
>>>>>> adding more complexity to treat queues and groups at the same level. Then
>>>>>> you are suggesting adding even more complexity to fix the problems caused
>>>>>> by that approach.
>>>>>>
>>>>>> Why do we need to treat queues and groups at the same level? "CFS does
>>>>>> it" is not a good argument.
>>>>> Sure, it is not a very good argument, but at the same time one would need
>>>>> a very good argument for why we should do things differently.
>>>>>
>>>>> - If a user has mounted the cpu and blkio controllers together and the two
>>>>>   controllers view the same hierarchy differently, that is odd. We need a
>>>>>   good reason why a different arrangement makes sense.
>>>> Hi Vivek,
>>>>
>>>> Even if we mount cpu and blkio together, to me it's ok for cpu and blkio
>>>> to have their own logic, since they are totally different cgroup subsystems.
>>>>
>>>>> - To me, both groups and cfq queues are children of the root group, and it
>>>>>   makes sense to treat them as independent children instead of putting
>>>>>   all the queues in one logical group which inherits the weight of the
>>>>>   parent.
>>>>>
>>>>> - With this new scheme, I am finding it hard to visualize the hierarchy.
>>>>>   How do you assign the weights to the queue entities of a group? It is
>>>>>   more like an invisible group within a group. We would have to create a
>>>>>   new tunable which can specify the weight for this hidden group.
>>>> For the time being, the root "qse" weight is 1000 and the others are 500;
>>>> they don't inherit the weight of the parent. I was thinking that maybe we
>>>> can determine the qse weight in terms of the number of queues and the
>>>> weights in this group and its subgroups, along the lines of the sketch below.
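>>>>
>>>> One hypothetical shape for such a heuristic (purely illustrative; the
>>>> patch hardcodes 1000/500, and these names are made up):
>>>>
>>>> /* Weigh the queue bundle by how many cfqqs it stands in for, but cap
>>>>  * it so a pile of queues cannot dwarf the subgroups' combined weight. */
>>>> static unsigned int qse_weight(unsigned int nr_queues,
>>>>                                unsigned int subgroup_weight_sum)
>>>> {
>>>>         unsigned int w = nr_queues * 500;
>>>>
>>>>         if (subgroup_weight_sum && w > subgroup_weight_sum)
>>>>                 w = subgroup_weight_sum;
>>>>         return w ? w : 500;
>>>> }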
>>>>
>>>> Thanks,
>>>> Gui
>>>>
>>>>>
>>>>> So in summary I am liking the "queue at same level as group" scheme for
>>>>> the following reasons.
>>>>>
>>>>> - It is more intuitive to visualize and implement. It follows the true
>>>>>   hierarchy as seen by the cgroup file system.
>>>>>
>>>>> - CFS has already implemented this scheme, so we need a strong argument
>>>>>   to justify why we should not follow the same thing, especially for
>>>>>   the case where the user has co-mounted the cpu and blkio controllers.
>>>>>
>>>>> - It can achieve the same goal as the "hidden group" proposal just by
>>>>>   creating a cgroup explicitly and moving all threads into that group.
>>>>>
>>>>> Why do you think the "hidden group" proposal is better than "treating
>>>>> queues at the same level as groups"?
>>> There are multiple reasons why the "hidden group" proposal is a better
>>> approach.
>>>
>>> - "Hidden group" would allow us to keep scheduling queues using the
>>> CFQ queue scheduling logic. And does not require any major changes in
>>> CFQ. Aren't we already using that approach to deal with queues at the
>>> root group?
>> Currently we are operating in flat mode, where all the groups are at the
>> same level (irrespective of their position in the cgroup hierarchy).
>>
>>> - If queues and groups are treated at the same level, queues can end
>>> up in the root cgroup, and we cannot put an upper bound on the number of
>>> those queues. Those queues consume system resources in proportion
>>> to their number, causing the performance of groups to suffer. If we
>>> have a "hidden group", we can configure it with a small weight, and that
>>> would limit the impact these queues in the root group can have.
>> To limit the impact of other queues in the root cgroup, one can use libcgroup
>> to automatically place new threads or tasks into a subgroup.
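>>
>> For instance, with libcgroup's cgrulesengd, a rule along these lines could
>> do it (user and group names purely illustrative):
>>
>>   # /etc/cgrules.conf: move every task started by user "batch" out of
>>   # the root group into a blkio subgroup named "background"
>>   batch         blkio         background/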
>>
>> I understand that the kernel doing it by default would help, though; it is
>> less work in terms of configuration. But I am not sure that's a good
>> argument for designing kernel functionality. Kernel functionality should be
>> pretty generic.
>>
>> Anyway, how would you assign the weight to the hidden group? What's the
>> interface for that? A new cgroup file inside each cgroup? Personally
>> I think that's a little odd as an interface: every group has one hidden
>> group where all the queues in that group go, and the weight of that hidden
>> group is specified by a cgroup file.
> 
> I think picking a reasonable default weight at compile time is not
> that bad an option, given that threads showing up in the "hidden
> group" is an uncommon case.

Hi Nauman,

Later, I think we might adjust the weight of the "hidden group" automatically
according to the number of queues and subgroups and their weights.
But for the time being, I'd choose a fixed value for the sake of simplicity.

Gui

> 
>> But anyway, I am not tied to any particular approach. I am just trying to
>> make sure that we have put enough thought into it, as changing it later
>> will be hard.
>>
>> Vivek
>>
> 
> 
