Message-Id: <1281566667-7821-1-git-send-email-vgoyal@redhat.com>
Date:	Wed, 11 Aug 2010 18:44:22 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	linux-kernel@...r.kernel.org, jaxboe@...ionio.com
Cc:	vgoyal@...hat.com
Subject: [PATCH] cfq-iosched: cfq-iosched: Implement group idling and IOPS accounting for groups V4 


Hi,

This is V4 of the patches implementing group_idle and CFQ group charge
accounting in terms of IOPS. Not much has changed since V3: just more testing
and a rebase on top of the for-2.6.36 branch of the block tree.

What's the problem
------------------
On high end storage (I tested on an HP EVA storage array with 12 SATA disks
in RAID 5), CFQ's model of dispatching requests from a single queue at a
time (sequential readers, sync writers etc.) becomes a bottleneck.
Often we don't drive enough request queue depth to keep all the disks busy,
and overall throughput suffers a lot.

All these problems primarily originate from two things: idling on each
cfq queue, and quantum (dispatching a limited number of requests from a
single queue while not allowing dispatch from other queues until then). Once
you set slice_idle=0 and raise quantum to a higher value, most of CFQ's
problems on higher end storage disappear.

This problem also becomes visible in the IO controller, where one creates
multiple groups and gets fairness but lower overall throughput. In
the following table, I am running an increasing number of sequential readers
(1,2,4,8) in each of 8 groups with weights 100 to 800.

Kernel=2.6.35-blktree-group_idle+
GROUPMODE=1          NRGRP=8      DEV=/dev/dm-3                 
Workload=bsr      iosched=cfq     Filesz=512M bs=4K   
gi=1  slice_idle=8    group_idle=8    quantum=8
=========================================================================
AVERAGE[bsr]    [bw in KB/s]    
------- 
job     Set NR  cgrp1  cgrp2  cgrp3  cgrp4  cgrp5  cgrp6  cgrp7  cgrp8  total  
---     --- --  ---------------------------------------------------------------
bsr     1   1   6519   12742  16801  23109  28694  35988  43175  49272  216300 
bsr     1   2   5522   10922  17174  22554  24151  30488  36572  42021  189404 
bsr     1   4   4593   9620   13120  21405  25827  28097  33029  37335  173026 
bsr     1   8   3622   8277   12557  18296  21775  26022  30760  35713  157022 


Notice that overall throughput is just around 160MB/s with 8 sequential
readers in each group.
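
To make the fairness side of this table easier to read, the per-group
bandwidth can be compared against the weight-proportional ideal. This is
only a rough sketch: it assumes the eight weights step evenly as 100, 200,
..., 800 (the text only says "100 to 800"), and the function names are made
up for illustration. The bandwidth figures are the NR=8 row above.

```python
# Assumed weights: 100, 200, ..., 800 (even steps are an assumption).
WEIGHTS = [100 * i for i in range(1, 9)]

# NR=8 row of the slice_idle=8 table above, in KB/s (cgrp1..cgrp8).
BASELINE_NR8 = [3622, 8277, 12557, 18296, 21775, 26022, 30760, 35713]


def ideal_shares(weights):
    """Fraction of total bandwidth each group would get if service were
    exactly proportional to weight."""
    total = sum(weights)
    return [w / total for w in weights]


def measured_shares(bandwidths):
    """Fraction of the measured total that each group actually received."""
    total = sum(bandwidths)
    return [b / total for b in bandwidths]


def share_table(weights, bandwidths):
    """Return (weight, ideal share, measured share) tuples, one per group."""
    return list(zip(weights, ideal_shares(weights), measured_shares(bandwidths)))


for weight, ideal, measured in share_table(WEIGHTS, BASELINE_NR8):
    print(f"weight={weight:4d}  ideal={ideal:6.1%}  measured={measured:6.1%}")
```

The measured shares increase with weight as the ideal ones do, so the
ordering between groups is preserved even though the total is low.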

With this patch set, I set slice_idle=0 and re-ran the same test.

Kernel=2.6.35-blktree-group_idle+
GROUPMODE=1          NRGRP=8         DEV=/dev/dm-3                 
Workload=bsr      iosched=cfq     Filesz=512M bs=4K   
gi=1  slice_idle=0    group_idle=8    quantum=8
=========================================================================
AVERAGE[bsr]    [bw in KB/s]    
------- 
job     Set NR  cgrp1  cgrp2  cgrp3  cgrp4  cgrp5  cgrp6  cgrp7  cgrp8  total  
---     --- --  ---------------------------------------------------------------
bsr     1   1   6652   12341  17335  23856  28740  36059  42833  48487  216303 
bsr     1   2   10168  20292  29827  38363  45746  52842  60071  63957  321266 
bsr     1   4   11176  21763  32713  42970  53222  58613  63598  69296  353351 
bsr     1   8   11750  23718  34102  47144  56975  63613  69000  69666  375968 

Notice how overall throughput has shot up to 350-370MB/s while retaining the
ability to do IO control.
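
For a quick feel of the gain, the "total" columns of the two tables can be
divided row by row. A minimal sketch; the totals are copied from the tables
above, and the names used here are just for illustration.

```python
# Number of sequential readers per group (the NR column).
READERS_PER_GROUP = [1, 2, 4, 8]

# "total" column in KB/s from the two tables above.
TOTAL_SLICE_IDLE_8 = [216300, 189404, 173026, 157022]
TOTAL_SLICE_IDLE_0 = [216303, 321266, 353351, 375968]


def throughput_gain(before, after):
    """Ratio of aggregate bandwidth after/before for each row."""
    return [a / b for b, a in zip(before, after)]


gains = throughput_gain(TOTAL_SLICE_IDLE_8, TOTAL_SLICE_IDLE_0)
for nr, gain in zip(READERS_PER_GROUP, gains):
    print(f"NR={nr}: {gain:.2f}x")
```

With a single reader per group the totals are essentially identical; the
difference only shows up once there are multiple readers per group.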

This is not the default mode. The new tunable, group_idle, allows one to
set slice_idle=0 to disable some of the CFQ features and primarily use the
group service differentiation feature.
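
As a rough illustration of switching to this mode, the settings from the
second test run (slice_idle=0, group_idle=8, quantum=8) could be applied
through the CFQ tunables under sysfs. This is a sketch under assumptions:
it assumes the usual /sys/block/<dev>/queue/iosched/ layout, that CFQ is the
active scheduler on that device, and that the kernel carries this patch set
so that group_idle exists. The helper name and the device name "sdb" are
hypothetical.

```python
from pathlib import Path

# Values used in the second test run above.
GROUP_MODE_TUNABLES = {"slice_idle": 0, "group_idle": 8, "quantum": 8}


def set_cfq_tunables(device, tunables, sysfs_root="/sys"):
    """Write each tunable to <sysfs_root>/block/<device>/queue/iosched/<name>.

    Returns a dict mapping tunable name to the path that was written.
    Raises OSError if a tunable file cannot be written (for example when
    the kernel does not provide group_idle, or when not running as root).
    """
    iosched_dir = Path(sysfs_root) / "block" / device / "queue" / "iosched"
    written = {}
    for name, value in tunables.items():
        path = iosched_dir / name
        path.write_text(f"{value}\n")
        written[name] = str(path)
    return written


# Example (needs root; "sdb" is a placeholder device name):
#   set_cfq_tunables("sdb", GROUP_MODE_TUNABLES)
```

The sysfs_root parameter exists only so the helper can be pointed at a
scratch directory for a dry run instead of the real /sys.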

By default nothing should change for CFQ and this change should be fairly
low risk.

Thanks
Vivek