Message-ID: <49FF2373.7040905@ds.jp.nec.com>
Date: Mon, 04 May 2009 13:18:43 -0400
From: "IKEDA, Munehiro" <m-ikeda@...jp.nec.com>
To: Nauman Rafique <nauman@...gle.com>
CC: Vivek Goyal <vgoyal@...hat.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>, oz-kernel@...hat.com,
paolo.valente@...more.it, linux-kernel@...r.kernel.org,
dhaval@...ux.vnet.ibm.com, containers@...ts.linux-foundation.org,
menage@...gle.com, jmoyer@...hat.com, fchecconi@...il.com,
arozansk@...hat.com, jens.axboe@...cle.com,
akpm@...ux-foundation.org, fernando@...ellilink.co.jp,
Andrea Righi <righi.andrea@...il.com>,
Ryo Tsuruta <ryov@...inux.co.jp>,
Divyesh Shah <dpshah@...gle.com>,
Gui Jianfeng <guijianfeng@...fujitsu.com>
Subject: Re: IO Controller per cgroup request descriptors (Re: [PATCH 01/10]
Documentation)
Nauman Rafique wrote:
> On Fri, May 1, 2009 at 3:45 PM, Vivek Goyal <vgoyal@...hat.com> wrote:
>> On Fri, May 01, 2009 at 06:04:39PM -0400, IKEDA, Munehiro wrote:
>>> Vivek Goyal wrote:
>>>>>> +TODO
>>>>>> +====
>>>>>> +- Lots of cleanups, testing, bug fixing, optimizations, benchmarking etc...
>>>>>> +- Convert cgroup ioprio to notion of weight.
>>>>>> +- Anticipatory code will need more work. It is not working properly currently
>>>>>> + and needs more thought.
>>>>> What are the problems with the code?
>>>> Have not got a chance to look into the issues in detail yet. A crude run
>>>> just showed a drop in performance. Will debug it later, the moment I have
>>>> async writes handled...
>>>>
>>>>>> +- Use of bio-cgroup patches.
>>>>> I saw these posted as well
>>>>>
>>>>>> +- Use of Nauman's per cgroup request descriptor patches.
>>>>>> +
>>>>> More details would be nice, I am not sure I understand
>>>> Currently the number of request descriptors which can be allocated per
>>>> device/request queue is fixed by a sysfs tunable (q->nr_requests). So
>>>> if there is a lot of IO going on from one cgroup, it can consume all
>>>> the available request descriptors, and other cgroups might starve and
>>>> not get their fair share.
>>>>
>>>> Hence we also need to introduce the notion of a per-cgroup request
>>>> descriptor limit, so that if the request descriptors of one group are
>>>> exhausted, it does not impact the IO of other cgroups.
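To make the starvation mechanism concrete, here is a rough sketch of
the single shared pool check in get_request(); this is simplified from
memory, not the exact kernel code:

	/* Simplified sketch of the per-queue request pool check. */
	static int may_allocate(struct request_queue *q, int rw)
	{
		struct request_list *rl = &q->rq;

		/*
		 * The limit is per queue, not per cgroup: whichever
		 * task submits IO first can pin all q->nr_requests
		 * descriptors, and later submitters from other cgroups
		 * must sleep until some requests complete and are freed.
		 */
		if (rl->count[rw] >= q->nr_requests)
			return 0;	/* caller has to wait */
		return 1;
	}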
>>> Unfortunately I couldn't find Nauman's patches and have never seen them.
>>> So I tried to make a patch below against this todo item. The reason I'm
>>> posting it even though it is just a quick and ugly hack (and might be a
>>> reinvention of the wheel) is that I would like to discuss how we should
>>> define the limit on requests per cgroup.
>>> This patch should be applied on top of Vivek's I/O controller patches
>>> posted on Mar 11.
>> Hi IKEDA,
>>
>> Sorry for the confusion here. Actually Nauman had sent a patch to a select
>> group of people who were initially copied on the mail thread.
>
> I am sorry about that. Since I dropped my whole patch set in favor of
> Vivek's stuff, this stuff fell through the cracks.
No problem at all, guys. I'm glad to see the patch Vivek sent, thanks.
>>> This patch temporarily distributes q->nr_requests to each cgroup.
>>> I think the number should be weighted like BFQ's budget. But in
>>> that case, if the cgroup hierarchy is deep, leaf cgroups are
>>> allowed to allocate only a very small number of requests. I don't
>>> think this is reasonable...but I don't have a specific idea for
>>> solving this problem. Does anyone have a good idea?
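To put a number on the deep-hierarchy concern: with q->nr_requests = 128
and the count split evenly at each level, a leaf cgroup three levels deep
in a tree with fanout 4 would get only 128 / 4^3 = 2 request descriptors
(the fanout is made up, but it shows how quickly the share shrinks).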
>>>
>> Thanks for the patch. Yes, ideally one would expect the request descriptors
>> to be allocated in proportion to the weight as well, but I guess that would
>> become very complicated.
>>
>> In terms of simpler things, two thoughts come to mind.
>>
>> - First approach is to make q->nr_requests per group. So every group is
>> entitled to q->nr_requests as set by the user. This is what your patch
>> seems to have done.
>>
>> I had some concerns with this approach. First of all, it does not seem to
>> put an upper bound on the number of request descriptors allocated per
>> queue, because as a user creates more cgroups, the total number of request
>> descriptors increases.
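For example, with the default q->nr_requests = 128, a user who creates
100 cgroups could drive the queue up to 100 * 128 = 12800 allocated
descriptors.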
>>
>> - Second approach can be that we retain the meaning of q->nr_requests,
>> which defines the total number of request descriptors on the queue (with
>> the exception of 50% more descriptors for batching processes), and we
>> define a new per-group limit q->nr_group_requests which sets how many
>> requests each group can be assigned. So q->nr_requests defines the total
>> pool size on the queue and q->nr_group_requests defines how many requests
>> each group can allocate out of that pool.
>>
>> Here the issue is that a user will have to balance q->nr_group_requests
>> and q->nr_requests properly.
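A minimal sketch of what such a two-level check could look like in the
allocation path; the per-group counter and the helper name below are
invented for illustration and are not taken from Vivek's actual patch:

	/* Hypothetical two-level limit check; names invented. */
	static int may_allocate_rq(struct request_queue *q,
				   struct io_group *iog)
	{
		/* Global pool, bounded by q->nr_requests as before. */
		if (q->rq.count[READ] + q->rq.count[WRITE] >= q->nr_requests)
			return 0;

		/* Per-group share out of that pool. */
		if (iog->nr_allocated >= q->nr_group_requests)
			return 0;

		return 1;
	}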
>>
>> To experiment, I have implemented the second approach. I am attaching the
>> patch which is in my current tree. It probably will not apply to my March
>> 11 posting, since the patches have changed since then. But I am posting it
>> here so that it at least gives an idea of the thought process.
>>
>> Ideas are welcome...
>
> I had started with the first option, but the second option sounds good
> too. But one problem that comes to mind is how we deal with
> hierarchies. The sysadmin can limit the root-level cgroups to a
> specific number of request descriptors, but if applications running in
> a cgroup are allowed to create their own cgroups, then the total
> request descriptors of all child cgroups should be capped by the
> number assigned to the parent cgroup.
I think the second option cannot coexist with hierarchical support for
the per-cgroup request descriptor limit. I guess the fundamental idea
of the second approach is to make the logic simpler by giving up
hierarchy; is that correct?
IIUC, for hierarchy support we need some good idea for solving the
issue that a cgroup deep in the hierarchy can get only a few requests,
as I mentioned.
Anyway, I'll keep my eyes on this issue. I'm looking forward to the
next version of Vivek's patches.
(snip)
>> +#ifdef CONFIG_GROUP_IOSCHED
>> +/* sysfs show/store for the per-group request descriptor limit */
>> +static ssize_t queue_group_requests_show(struct request_queue *q, char *page)
>> +{
>> +	return queue_var_show(q->nr_group_requests, (page));
>> +}
>> +
>> +static ssize_t
>> +queue_group_requests_store(struct request_queue *q, const char *page,
>> +			   size_t count)
>> +{
>> +	unsigned long nr;
>> +	int ret = queue_var_store(&nr, page, count);
>> +
>> +	/* never allow a group fewer than the global minimum */
>> +	if (nr < BLKDEV_MIN_RQ)
>> +		nr = BLKDEV_MIN_RQ;
>> +
>> +	spin_lock_irq(q->queue_lock);
>> +	q->nr_group_requests = nr;
>> +	spin_unlock_irq(q->queue_lock);
>> +	return ret;
>> +}
>> +#endif
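For testing, I assume the new knob would show up next to nr_requests,
i.e. as /sys/block/<dev>/queue/nr_group_requests, assuming the
attribute is registered under that name (the registration hunk is not
quoted here).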
(snip)
Is leaving the io_context "batching" status unchanged on purpose?
Thanks,
Muuhh
--
IKEDA, Munehiro
NEC Corporation of America
m-ikeda@...jp.nec.com