Message-ID: <4C48DA16.4010403@cn.fujitsu.com>
Date: Fri, 23 Jul 2010 07:53:58 +0800
From: Gui Jianfeng <guijianfeng@...fujitsu.com>
To: Vivek Goyal <vgoyal@...hat.com>
CC: linux-kernel@...r.kernel.org, axboe@...nel.dk, nauman@...gle.com,
dpshah@...gle.com, jmoyer@...hat.com, czoccolo@...il.com
Subject: Re: [RFC PATCH] cfq-iosced: Implement IOPS mode and group_idle tunable V3

Vivek Goyal wrote:
> On Thu, Jul 22, 2010 at 03:08:00PM +0800, Gui Jianfeng wrote:
>> Vivek Goyal wrote:
>>> Hi,
>>>
>>> This is V3 of the group_idle and CFQ IOPS mode implementation patchset. Since V2
>>> I have cleaned up the code a bit to clarify in which cases we charge time slice
>>> and in which cases we charge the number of requests.
>>>
>>> What's the problem
>>> ------------------
>>> On high end storage (I tested on an HP EVA storage array with 12 SATA disks in
>>> RAID 5), CFQ's model of dispatching requests from a single queue at a
>>> time (sequential readers/sync writers etc.) becomes a bottleneck.
>>> Often we don't drive enough request queue depth to keep all the disks busy
>>> and suffer a lot in terms of overall throughput.
>>>
>>> All these problems primarily originate from two things: idling on a
>>> per-cfq-queue basis, and the quantum (dispatching only a limited number of
>>> requests from a single queue while not allowing dispatch from other queues).
>>> Once you set slice_idle=0 and raise the quantum, most of CFQ's problems on
>>> higher end storage disappear.
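[For reference, both knobs live in CFQ's iosched sysfs directory. A rough
sketch of the tuning being described, where <dev> is a placeholder for the
block device and 16 is only an example of a higher quantum value:]

  # echo 0 > /sys/block/<dev>/queue/iosched/slice_idle
  # echo 16 > /sys/block/<dev>/queue/iosched/quantum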
>>>
>>> This problem also becomes visible with the IO controller, where one creates
>>> multiple groups and gets fairness but overall throughput is lower. In
>>> the following table, I am running an increasing number of sequential readers
>>> (1, 2, 4, 8) in 8 groups with weights 100 to 800.
>>>
>>> Kernel=2.6.35-rc5-iops+
>>> GROUPMODE=1 NRGRP=8
>>> DIR=/mnt/iostestmnt/fio DEV=/dev/dm-4
>>> Workload=bsr iosched=cfq Filesz=512M bs=4K
>>> group_isolation=1 slice_idle=8 group_idle=8 quantum=8
>>> =========================================================================
>>> AVERAGE[bsr] [bw in KB/s]
>>> -------
>>> job Set NR cgrp1 cgrp2 cgrp3 cgrp4 cgrp5 cgrp6 cgrp7 cgrp8 total
>>> --- --- -- ---------------------------------------------------------------
>>> bsr 3 1 6186 12752 16568 23068 28608 35785 42322 48409 213701
>>> bsr 3 2 5396 10902 16959 23471 25099 30643 37168 42820 192461
>>> bsr 3 4 4655 9463 14042 20537 24074 28499 34679 37895 173847
>>> bsr 3 8 4418 8783 12625 19015 21933 26354 29830 36290 159249
>>>
>>> Notice that overall throughput is just around 160MB/s with 8 sequential readers
>>> in each group.
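[A rough sketch of the kind of cgroup setup behind these numbers, assuming the
blkio controller is mounted at /cgroup/blkio; the mount point, group names and
the <dev> placeholder are only illustrative, and the actual runs are driven by
the iostest script mentioned further down:]

  # mount -t cgroup -o blkio none /cgroup/blkio
  # echo 1 > /sys/block/<dev>/queue/iosched/group_isolation
  # for i in `seq 1 8`; do mkdir /cgroup/blkio/grp$i; \
        echo $((i * 100)) > /cgroup/blkio/grp$i/blkio.weight; done   # weights 100..800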
>>>
>>> With this patch set, I have set slice_idle=0 and re-ran the same test.
>>>
>>> Kernel=2.6.35-rc5-iops+
>>> GROUPMODE=1 NRGRP=8
>>> DIR=/mnt/iostestmnt/fio DEV=/dev/dm-4
>>> Workload=bsr iosched=cfq Filesz=512M bs=4K
>>> group_isolation=1 slice_idle=0 group_idle=8 quantum=8
>>> =========================================================================
>>> AVERAGE[bsr] [bw in KB/s]
>>> -------
>>> job Set NR cgrp1 cgrp2 cgrp3 cgrp4 cgrp5 cgrp6 cgrp7 cgrp8 total
>>> --- --- -- ---------------------------------------------------------------
>>> bsr 3 1 6523 12399 18116 24752 30481 36144 42185 48894 219496
>>> bsr 3 2 10072 20078 29614 38378 46354 52513 58315 64833 320159
>>> bsr 3 4 11045 22340 33013 44330 52663 58254 63883 70990 356520
>>> bsr 3 8 12362 25860 37920 47486 61415 47292 45581 70828 348747
>>>
>>> Notice how overall throughput has shot up to 348MB/s while retaining the ability
>>> to do IO control.
>>>
>>> So this is not the default mode. The new tunable, group_idle, allows one to
>>> set slice_idle=0 to disable some of CFQ's features and primarily use the
>>> group service differentiation feature.
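[In sysfs terms, the mode being described would look roughly like this,
assuming the new group_idle tunable is exposed next to the existing CFQ
tunables; <dev> is a placeholder and 8 matches the group_idle value used in
the runs above:]

  # echo 0 > /sys/block/<dev>/queue/iosched/slice_idle
  # echo 8 > /sys/block/<dev>/queue/iosched/group_idle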
>>>
>>> If you have thoughts on other ways of solving the problem, I am all ears.
>> Hi Vivek
>>
>> Would you attach your fio job config file?
>>
>
> Hi Gui,
>
> I have written a fio-based test script, "iostest", to be able to
> do cgroup and other IO scheduler testing more smoothly, and I am using
> that. I am attaching the compressed script to this mail. Try using it,
> and if it works for you and you find it useful, I can think of hosting a
> git tree somewhere.
>
> I used the following command lines for the tests above:
>
> # iostest <block-device> -G -w bsr -m 8 -c --nrgrp 8 --total
>
> And with slice idle disabled:
>
> # iostest <block-device> -G -w bsr -m 8 -c --nrgrp 8 --total -I 0
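[The jobs themselves are generated by the script, but a hand-written fio job
approximating the bsr case (buffered sequential readers, 4K blocks, 512M
files) might look roughly like the following; the group path reuses the
illustrative /cgroup/blkio mount from the sketch above, and the file and
path names are only examples:]

  # cat bsr-grp1.fio
  [global]
  directory=/mnt/iostestmnt/fio
  rw=read
  bs=4k
  size=512M
  [bsr]
  numjobs=8

  # echo $$ > /cgroup/blkio/grp1/tasks      # run the job from within one group
  # fio bsr-grp1.fio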
That's cool! Very helpful, I'll try it.
Thanks,
Gui
>
> Thanks
> Vivek
--
Regards
Gui Jianfeng