linux-kernel - Re: [RFC PATCH] cfq-iosced: Implement IOPS mode and group

Message-ID: <20100722144931.GD28684@redhat.com>
Date:	Thu, 22 Jul 2010 10:49:31 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Gui Jianfeng <guijianfeng@...fujitsu.com>
Cc:	linux-kernel@...r.kernel.org, axboe@...nel.dk, nauman@...gle.com,
	dpshah@...gle.com, jmoyer@...hat.com, czoccolo@...il.com
Subject: Re: [RFC PATCH] cfq-iosced: Implement IOPS mode and group_idle
 tunable V3

On Thu, Jul 22, 2010 at 03:08:00PM +0800, Gui Jianfeng wrote:
> Vivek Goyal wrote:
> > Hi,
> > 
> > This is V3 of the group_idle and CFQ IOPS mode implementation patchset. Since V2
> > I have cleaned up the code a bit to clear up the lingering confusion about the
> > cases in which we charge time slice and the cases in which we charge number of
> > requests.
> > 
> > What's the problem
> > ------------------
> > On high end storage (I have an HP EVA storage array with 12 SATA disks in
> > RAID 5), CFQ's model of dispatching requests from a single queue at a
> > time (sequential readers, sync writers etc.) becomes a bottleneck.
> > Often we don't drive enough request queue depth to keep all the disks busy
> > and suffer a lot in terms of overall throughput.
> > 
> > All these problems primarily originate from two things: idling on each
> > cfq queue, and quantum (dispatching a limited number of requests from a
> > single queue) while not allowing dispatch from other queues in the
> > meantime. Once you set slice_idle=0 and quantum to a higher value, most
> > of CFQ's problems on higher end storage disappear.
> > 
> > This problem also becomes visible in the IO controller, where one creates
> > multiple groups and gets fairness but lower overall throughput. In
> > the following table, I am running an increasing number of sequential
> > readers (1,2,4,8) in each of 8 groups with weights 100 to 800.
> > 
> > Kernel=2.6.35-rc5-iops+
> > GROUPMODE=1          NRGRP=8
> > DIR=/mnt/iostestmnt/fio        DEV=/dev/dm-4
> > Workload=bsr      iosched=cfq     Filesz=512M bs=4K
> > group_isolation=1 slice_idle=8    group_idle=8    quantum=8
> > =========================================================================
> > AVERAGE[bsr]    [bw in KB/s]
> > -------
> > job     Set NR  cgrp1  cgrp2  cgrp3  cgrp4  cgrp5  cgrp6  cgrp7  cgrp8  total
> > ---     --- --  ---------------------------------------------------------------
> > bsr     3   1   6186   12752  16568  23068  28608  35785  42322  48409  213701
> > bsr     3   2   5396   10902  16959  23471  25099  30643  37168  42820  192461
> > bsr     3   4   4655   9463   14042  20537  24074  28499  34679  37895  173847
> > bsr     3   8   4418   8783   12625  19015  21933  26354  29830  36290  159249
> > 
> > Notice that overall throughput is just around 160MB/s with 8 sequential readers
> > in each group.
> > 
> > With this patch set, I have set slice_idle=0 and re-ran the same test.
> > 
> > Kernel=2.6.35-rc5-iops+
> > GROUPMODE=1          NRGRP=8
> > DIR=/mnt/iostestmnt/fio        DEV=/dev/dm-4
> > Workload=bsr      iosched=cfq     Filesz=512M bs=4K
> > group_isolation=1 slice_idle=0    group_idle=8    quantum=8
> > =========================================================================
> > AVERAGE[bsr]    [bw in KB/s]
> > -------
> > job     Set NR  cgrp1  cgrp2  cgrp3  cgrp4  cgrp5  cgrp6  cgrp7  cgrp8  total
> > ---     --- --  ---------------------------------------------------------------
> > bsr     3   1   6523   12399  18116  24752  30481  36144  42185  48894  219496
> > bsr     3   2   10072  20078  29614  38378  46354  52513  58315  64833  320159
> > bsr     3   4   11045  22340  33013  44330  52663  58254  63883  70990  356520
> > bsr     3   8   12362  25860  37920  47486  61415  47292  45581  70828  348747
> > 
> > Notice how overall throughput has shot up to 348MB/s while retaining the ability
> > to do the IO control.
> > 
> > Note that this is not the default mode. The new tunable, group_idle, allows one
> > to set slice_idle=0 to disable some of the CFQ features and primarily use the
> > group service differentiation feature.
> > 
> > If you have thoughts on other ways of solving the problem, I am all
> > ears.
> 
> Hi Vivek
> 
> Would you attach your fio job config file?
> 

Hi Gui,

I have written a fio-based test script, "iostest", to make cgroup and
other IO scheduler testing smoother, and that is what I am using. I am
attaching the compressed script to this mail. Try using it, and if it
works for you and you find it useful, I can think of hosting a git tree
somewhere.

I used the following command lines for the tests above.

# iostest <block-device> -G -w bsr -m 8 -c --nrgrp 8 --total

With slice idle disabled.

# iostest <block-device> -G -w bsr -m 8 -c --nrgrp 8 --total -I 0
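
For anyone who wants to reproduce the tunable settings from the two tables
without iostest, a minimal sketch follows. It assumes the CFQ tunables are
exposed under /sys/block/<dev>/queue/iosched/ and that group_idle is present
only on a kernel carrying this patchset. The helper name set_cfq_tunables and
the device name sdX are hypothetical, and the assumption that "-I 0"
corresponds to slice_idle=0 is inferred from the table headers, not from the
script itself.

```shell
#!/bin/sh
# Sketch only: write CFQ tunables for one device by hand.
# Assumption: <iosched-dir> is /sys/block/<dev>/queue/iosched and the
# device is currently using the cfq scheduler.

# usage: set_cfq_tunables <iosched-dir> <slice_idle> <group_idle> <quantum>
set_cfq_tunables() {
    dir=$1
    # Each write fails early if the tunable file cannot be written.
    echo "$2" > "$dir/slice_idle" || return 1
    echo "$3" > "$dir/group_idle" || return 1
    echo "$4" > "$dir/quantum"    || return 1
}

# Example (as root), matching the second table
# (slice_idle=0 group_idle=8 quantum=8); sdX is a placeholder:
# set_cfq_tunables /sys/block/sdX/queue/iosched 0 8 8
```

Passing the directory as an argument keeps the helper usable against a
scratch directory for a dry run before touching sysfs.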

Thanks
Vivek

Download attachment "iostest.tar.gz" of type "application/x-gzip" (20858 bytes)