linux-kernel - More performance numbers (Was: Re: IO scheduler based IO controller V10)

Message-ID: <20091008044251.GA3490@redhat.com>
Date:	Thu, 8 Oct 2009 00:42:51 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org, jens.axboe@...cle.com,
	containers@...ts.linux-foundation.org, dm-devel@...hat.com,
	nauman@...gle.com, dpshah@...gle.com, lizf@...fujitsu.com,
	mikew@...gle.com, fchecconi@...il.com, paolo.valente@...more.it,
	ryov@...inux.co.jp, fernando@....ntt.co.jp, s-uchida@...jp.nec.com,
	taka@...inux.co.jp, guijianfeng@...fujitsu.com, jmoyer@...hat.com,
	dhaval@...ux.vnet.ibm.com, balbir@...ux.vnet.ibm.com,
	righi.andrea@...il.com, m-ikeda@...jp.nec.com, agk@...hat.com,
	peterz@...radead.org, jmarchan@...hat.com,
	torvalds@...ux-foundation.org, mingo@...e.hu, riel@...hat.com
Subject: More performance numbers (Was: Re: IO scheduler based IO
	controller V10)

On Thu, Sep 24, 2009 at 02:33:15PM -0700, Andrew Morton wrote:
[..]
> > 
> > Testing
> > =======
> > 
> > Environment
> > ==========
> > A 7200 RPM SATA drive with queue depth of 31. Ext3 filesystem.
> 
> That's a bit of a toy.
> 
> Do we have testing results for more enterprisey hardware?  Big storage
> arrays?  SSD?  Infiniband?  iscsi?  nfs? (lol, gotcha)
> 
> 

Hi Andrew,

I got hold of some relatively more enterprisey hardware. It is a storage array
with a few striped disks (I think 4 or 5). So this is not high-end stuff, but
it is better than my single SATA disk. I guess it may be entry-level
enterprisey stuff. Still trying to get hold of a higher-end configuration...

Apart from the IO scheduler controller numbers, I also got a chance to run the
same tests with the dm-ioband controller. I am posting these too. I am also
planning to run similar tests on Andrea's "max bw" controller.
Should be able to post those numbers in 2-3 days.

Software Environment
====================
- 2.6.31 kernel
- V10 of IO scheduler based controller
- version v1.14.0 of dm-ioband patches 

Ran fio jobs for 30 seconds in various configurations. All the IO is
direct IO to eliminate the effects of caches.

I ran three sets of each test. I am blindly reporting the results of set 2
from each test; otherwise it is too much data to report.

Had a LUN of 2500GB capacity. Used 200G partitions with ext3 file system for my
testing. For the IO scheduler based controller patches, created two cgroups of
weight 100 each, doing IO to a single 200G partition.

For dm-ioband, created two partitions of 200G each and created two ioband
devices of weight 100 each with policy "weight-iosize". Ideally I should
have used cgroups on dm-ioband also, but I could not get the cgroup patch going.
Because this is a striped configuration, I am not expecting any major changes
in the results due to that.

Sequential readers vs Random reader
===================================
Launched one random reader in one group and an increasing number of
sequential readers in the other group to see the effect on latency and
bandwidth of the random reader.

[fio1 --rw=read --bs=4K --size=2G --runtime=30 --direct=1 ]
[fio2 --rw=randread --bs=4K --size=1G --runtime=30 --direct=1 --group_reporting]
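
The two job descriptions above can be turned into full fio command lines
roughly as follows. This is only a sketch: the job names and the use of
--numjobs to scale the number of sequential readers are assumptions (the
post does not say how the readers were launched), and the script only
prints the commands; it does not run fio or place jobs into groups.

```python
def fio_argv(name, rw, bs, size, numjobs=1, runtime=30, group_reporting=False):
    """Build an fio command line mirroring the options quoted above."""
    argv = [
        "fio",
        f"--name={name}",        # assumption: job names are not given in the post
        f"--rw={rw}",
        f"--bs={bs}",
        f"--size={size}",
        f"--runtime={runtime}",
        "--direct=1",            # direct IO, as stated in the post
        f"--numjobs={numjobs}",  # assumption: how the reader count was scaled
    ]
    if group_reporting:
        argv.append("--group_reporting")
    return argv


# Reader counts used in the "nr" column of the tables.
seq_counts = [1, 2, 4, 8, 16, 32]

for nr in seq_counts:
    seq = fio_argv("seqread", "read", "4K", "2G", numjobs=nr)
    rnd = fio_argv("randread", "randread", "4K", "1G", group_reporting=True)
    print(" ".join(seq))
    print(" ".join(rnd))
```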

Vanilla CFQ
-----------
[Sequential readers]                          [Random Reader]           
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
1   13806KB/s 13806KB/s 13483KB/s 28672 usec  1   23KB/s    212 msec    
2   6406KB/s  6268KB/s  12378KB/s 128K usec   1   10KB/s    453 msec    
4   3934KB/s  2536KB/s  13103KB/s 321K usec   1   6KB/s     847 msec    
8   1934KB/s  556KB/s   13009KB/s 876K usec   1   13KB/s    1632 msec   
16  958KB/s   280KB/s   13761KB/s 1621K usec  1   10KB/s    3217 msec   
32  512KB/s   126KB/s   13861KB/s 3241K usec  1   6KB/s     3249 msec   

IO scheduler controller + CFQ
-----------------------------
[Sequential readers]                          [Random Reader]           
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
1   5651KB/s  5651KB/s  5519KB/s  126K usec   1   222KB/s   130K usec   
2   3144KB/s  1479KB/s  4515KB/s  347K usec   1   225KB/s   189K usec   
4   1852KB/s  626KB/s   5128KB/s  775K usec   1   224KB/s   159K usec   
8   971KB/s   279KB/s   6464KB/s  1666K usec  1   222KB/s   193K usec   
16  454KB/s   129KB/s   6293KB/s  3356K usec  1   218KB/s   466K usec   
32  239KB/s   42KB/s    5986KB/s  6753K usec  1   214KB/s   503K usec   

Notes: 
- The BW and latency of the random reader are fairly stable in the face of
  an increasing number of sequential readers. There are a couple of spikes
  in latency, which I guess come from the hardware somehow. But I will debug
  more to make sure that I am not delaying the dispatch of requests.
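
The latency columns mix "usec", "K usec" and "msec", which makes the rows
hard to compare by eye. A small sketch for normalizing them, using the
random reader latencies from the two tables above. It assumes "K" means
x1000 (not x1024); the helper name is made up for illustration.

```python
import re

# Multipliers to convert each unit used in the tables into microseconds.
_UNIT_TO_USEC = {"usec": 1, "msec": 1000}


def latency_usec(text):
    """Convert '28672 usec', '128K usec' or '212 msec' to microseconds."""
    m = re.fullmatch(r"\s*(\d+)(K?)\s+(usec|msec)\s*", text)
    if m is None:
        raise ValueError(f"unrecognised latency: {text!r}")
    value, kilo, unit = m.groups()
    # Assumption: a 'K' suffix means x1000.
    return int(value) * (1000 if kilo else 1) * _UNIT_TO_USEC[unit]


nr_seq = [1, 2, 4, 8, 16, 32]
# Random reader Max-latency column, copied from the tables above.
vanilla = ["212 msec", "453 msec", "847 msec",
           "1632 msec", "3217 msec", "3249 msec"]
controller = ["130K usec", "189K usec", "159K usec",
              "193K usec", "466K usec", "503K usec"]

for nr, v, c in zip(nr_seq, vanilla, controller):
    print(f"{nr:>2} seq readers: vanilla {latency_usec(v) / 1000:.0f} ms, "
          f"controller {latency_usec(c) / 1000:.0f} ms")
```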

dm-ioband + CFQ
---------------
[Sequential readers]                          [Random Reader]           
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
1   12466KB/s 12466KB/s 12174KB/s 40078 usec  1   37KB/s    221 msec    
2   6240KB/s  5904KB/s  11859KB/s 134K usec   1   12KB/s    443 msec    
4   3517KB/s  2529KB/s  12368KB/s 357K usec   1   6KB/s     772 msec    
8   1779KB/s  594KB/s   9857KB/s  719K usec   1   60KB/s    852K usec   
16  914KB/s   300KB/s   10934KB/s 1467K usec  1   40KB/s    1285K usec  
32  589KB/s   187KB/s   11537KB/s 3547K usec  1   14KB/s    3228 msec   

Notes:
- Does not look like we provide fairness to the random reader here. Latencies
  are on the rise and BW is on the decline. This is almost like vanilla
  CFQ with reduced overall throughput.

- dm-ioband claims that it does not provide fairness for a slow moving group,
  and I think that is a bad idea. This leads to very weak isolation with
  no benefits, especially if a buffered writer is running in the other group.
  This should be fixed.

Random writers vs Random reader
===============================
[fio1 --rw=randwrite --bs=64K --size=2G --runtime=30 --ioengine=libaio --iodepth=4 --direct=1 ]
[fio2 --rw=randread --bs=4K --size=1G --runtime=30 --direct=1 --group_reporting]

Vanilla CFQ
-----------
[Random Writers]                              [Random Reader]           
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
1   67785KB/s 67785KB/s 66197KB/s 45499 usec  1   170KB/s   94098 usec  
2   35163KB/s 35163KB/s 68678KB/s 218K usec   1   75KB/s    2335 msec   
4   17759KB/s 15308KB/s 64206KB/s 2387K usec  1   85KB/s    2331 msec   
8   8725KB/s  6495KB/s  57120KB/s 3761K usec  1   67KB/s    2488K usec  
16  3912KB/s  3456KB/s  57121KB/s 1273K usec  1   60KB/s    1668K usec  
32  2020KB/s  1503KB/s  56786KB/s 4221K usec  1   39KB/s    1101 msec   

IO scheduler controller + CFQ
-----------------------------
[Random Writers]                              [Random Reader]           
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
1   20919KB/s 20919KB/s 20428KB/s 288K usec   1   213KB/s   580K usec   
2   14765KB/s 14674KB/s 28749KB/s 776K usec   1   203KB/s   112K usec   
4   7177KB/s  7091KB/s  27839KB/s 970K usec   1   197KB/s   132K usec   
8   3027KB/s  2953KB/s  23285KB/s 3145K usec  1   218KB/s   203K usec   
16  1959KB/s  1750KB/s  28919KB/s 1266K usec  1   160KB/s   182K usec   
32  908KB/s   753KB/s   26267KB/s 2091K usec  1   208KB/s   144K usec   

Notes:
- Again disk time has been divided half and half between the random reader
  group and the random writer group. Fairly stable BW and latencies for the
  random reader in the face of an increasing number of random writers.

- The drop in aggregate bw of the random writers is expected, as they now get
  only half of the disk time.
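
As a rough check of the "half of disk time" point, the aggregate writer BW
with the controller can be divided by the vanilla CFQ number for each row.
This is just arithmetic on the two tables above; it assumes that disk time
translates more or less linearly into throughput, which need not hold. From
these numbers the ratio comes out between roughly 0.31 and 0.51.

```python
nr_writers = [1, 2, 4, 8, 16, 32]

# Agg-bandw column of the random writers, in KB/s, from the tables above.
vanilla_agg = [66197, 68678, 64206, 57120, 57121, 56786]
controller_agg = [20428, 28749, 27839, 23285, 28919, 26267]

# Fraction of vanilla aggregate writer bandwidth retained with the controller.
ratios = [c / v for c, v in zip(controller_agg, vanilla_agg)]

for nr, r in zip(nr_writers, ratios):
    print(f"{nr:>2} writers: {r:.2f} of vanilla aggregate BW")
```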

dm-ioband + CFQ
---------------
[Random Writers]                              [Random Reader]           
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
1   63659KB/s 63659KB/s 62167KB/s 89954 usec  1   164KB/s   72 msec     
2   27109KB/s 27096KB/s 52933KB/s 674K usec   1   140KB/s   2204K usec  
4   16553KB/s 16216KB/s 63946KB/s 694K usec   1   56KB/s    1871 msec   
8   3907KB/s  3347KB/s  28752KB/s 2406K usec  1   226KB/s   2407K usec  
16  2841KB/s  2647KB/s  42334KB/s 870K usec   1   52KB/s    3043 msec   
32  738KB/s   657KB/s   21285KB/s 1529K usec  1   21KB/s    4435 msec   

Notes:
- Again no fairness for the random reader. Decreasing BW, increasing latency.
  No isolation in this case.

- I am curious what happened to random writer throughput in the case of "32"
  writers. We did not get higher BW for the random reader, but the random
  writers still suffered in throughput. I can see this in all three sets.

Sequential Readers vs Sequential reader
=======================================
[fio1 --rw=read --bs=4K --size=2G --runtime=30 --direct=1]
[fio2 --rw=read --bs=4K --size=2G --runtime=30 --direct=1]

Vanilla CFQ
-----------
[Sequential Readers]                          [Sequential Reader]       
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
1   6434KB/s  6434KB/s  6283KB/s  107K usec   1   7017KB/s  111K usec   
2   4688KB/s  3284KB/s  7785KB/s  274K usec   1   4541KB/s  218K usec   
4   3365KB/s  1326KB/s  9769KB/s  597K usec   1   3038KB/s  424K usec   
8   1827KB/s  504KB/s   12053KB/s 813K usec   1   1389KB/s  813K usec   
16  1022KB/s  301KB/s   13954KB/s 1618K usec  1   676KB/s   1617K usec  
32  494KB/s   149KB/s   13611KB/s 3216K usec  1   416KB/s   3215K usec  

IO scheduler controller + CFQ
-----------------------------
[Sequential Readers]                          [Sequential Reader]       
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
1   6605KB/s  6605KB/s  6450KB/s  120K usec   1   6527KB/s  120K usec   
2   3706KB/s  1985KB/s  5558KB/s  323K usec   1   6331KB/s  149K usec   
4   2053KB/s  672KB/s   5731KB/s  721K usec   1   6267KB/s  148K usec   
8   1013KB/s  337KB/s   6962KB/s  1525K usec  1   6136KB/s  120K usec   
16  497KB/s   125KB/s   6873KB/s  3226K usec  1   5882KB/s  113K usec   
32  297KB/s   48KB/s    6445KB/s  6394K usec  1   5767KB/s  116K usec   

Notes:
- Stable BW and latencies for the sequential reader in the face of an
  increasing number of readers in the other group.

dm-ioband + CFQ
---------------
[Sequential Readers]                          [Sequential Reader]       
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
1   7140KB/s  7140KB/s  6972KB/s  112K usec   1   6886KB/s  165K usec   
2   3965KB/s  2762KB/s  6569KB/s  479K usec   1   5887KB/s  475K usec   
4   2725KB/s  1483KB/s  7999KB/s  532K usec   1   4774KB/s  500K usec   
8   1610KB/s  621KB/s   9565KB/s  729K usec   1   2910KB/s  677K usec   
16  904KB/s   319KB/s   10809KB/s 1431K usec  1   1970KB/s  1399K usec  
32  553KB/s   8KB/s     11794KB/s 2330K usec  1   1337KB/s  2398K usec  

Notes:
- Decreasing throughput and increasing latencies for the sequential reader.
  Hence no isolation in this case.

- Also note that in the case of "32" readers, the difference between "max-bw"
  and "min-bw" is relatively large, considering that all 32 readers are
  of the same prio. So bw distribution within the group is not very good.
  This is the issue of ioprio within a group that I have pointed out many
  times. Ryo is looking into it now.
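
To put a number on the max-bw vs min-bw point, here is a quick sketch that
computes the max/min ratio for the 32-reader row of each of the three
tables in this section. The values are copied from the tables; the ratio
itself is just one possible way to express the spread.

```python
# (Max-bandw, Min-bandw) in KB/s for the 32-reader row of each table.
rows_32 = {
    "Vanilla CFQ": (494, 149),
    "IO scheduler controller + CFQ": (297, 48),
    "dm-ioband + CFQ": (553, 8),
}

# Ratio between the fastest and the slowest reader within the group.
spread = {name: mx / mn for name, (mx, mn) in rows_32.items()}

for name, s in spread.items():
    print(f"{name}: max/min = {s:.1f}")
```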

Sequential Readers vs Multiple Random Readers
=============================================
OK, because dm-ioband does not provide fairness when there is no heavy IO
activity going on in a group, I decided to run a slightly different
test case where 16 sequential readers are running in one group and I
run an increasing number of random readers in the other group to see when
I start getting fairness, and its effect.

[fio1 --rw=read --bs=4K --size=2G --runtime=30 --direct=1 ]
[fio2 --rw=randread --bs=4K --size=1G --runtime=30 --direct=1 --group_reporting]

Vanilla CFQ
-----------
[Sequential Readers]                          [Multiple Random Readers] 
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
16  961KB/s   280KB/s   13978KB/s 1673K usec  1   10KB/s    3223 msec   
16  903KB/s   260KB/s   12925KB/s 1770K usec  2   28KB/s    3465 msec   
16  832KB/s   231KB/s   11428KB/s 2088K usec  4   57KB/s    3891K usec  
16  765KB/s   187KB/s   9899KB/s  2500K usec  8   99KB/s    3937K usec  
16  512KB/s   144KB/s   6759KB/s  3451K usec  16  148KB/s   5470K usec  

IO scheduler controller + CFQ
-----------------------------
[Sequential Readers]                          [Multiple Random Readers] 
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
16  456KB/s   112KB/s   6380KB/s  3361K usec  1   221KB/s   503K usec   
16  476KB/s   159KB/s   6040KB/s  3432K usec  2   214KB/s   549K usec   
16  606KB/s   178KB/s   6052KB/s  3801K usec  4   177KB/s   1341K usec  
16  589KB/s   83KB/s    6243KB/s  3394K usec  8   154KB/s   3288K usec  
16  547KB/s   122KB/s   6122KB/s  3538K usec  16  145KB/s   5959K usec  

Notes:
- Stable BW and latencies for the sequential reader group in the face of
  an increasing number of random readers in the other group.

- Because the disk is divided half/half in terms of time, the random reader
  group also gets a decent amount of work done. Not sure why its BW dips a
  bit when the number of random readers increases. Too seeky to handle?
 
dm-ioband + CFQ
---------------
[Sequential Readers]                          [Multiple Random Readers] 
nr  Max-bandw Min-bandw Agg-bandw Max-latency nr  Agg-bandw Max-latency 
16  926KB/s   293KB/s   10256KB/s 1634K usec  1   55KB/s    1377K usec  
16  906KB/s   284KB/s   9240KB/s  1825K usec  2   71KB/s    2392K usec  
16  321KB/s   18KB/s    1621KB/s  2037K usec  4   326KB/s   2054K usec  
16  188KB/s   16KB/s    1188KB/s  9757K usec  8   404KB/s   3269K usec  
16  167KB/s   64KB/s    1700KB/s  2859K usec  16  1064KB/s  2920K usec  

Notes:
- Looks like ioband tries to provide fairness once the number of
  random readers reaches 4. Note the sudden increase in BW of the random
  readers and the drastic drop in BW of the sequential readers.

- By the time the number of random readers reaches 16, total array throughput
  drops to around 2.7 MB/s. It got killed because suddenly we are trying to
  provide fairness in terms of size of IO. That's why on seeky media
  fairness in terms of disk time works better.

- There is no isolation between groups. Throughput of the sequential reader
  group continues to drop and latencies rise.

- I think these are serious issues which should be looked into and fixed.
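
The "around 2.7 MB/s" figure is simply the sum of the two Agg-bandw columns
of the dm-ioband table above for the last row. A short sketch that does the
same sum for every row, assuming 1 MB = 1024 KB:

```python
nr_random = [1, 2, 4, 8, 16]

# Agg-bandw columns of the dm-ioband table above, in KB/s.
seq_agg = [10256, 9240, 1621, 1188, 1700]   # 16 sequential readers
rand_agg = [55, 71, 326, 404, 1064]         # nr random readers

# Total array throughput is the sum over both groups.
totals = [s + r for s, r in zip(seq_agg, rand_agg)]

for nr, t in zip(nr_random, totals):
    print(f"{nr:>2} random readers: total {t} KB/s = {t / 1024:.1f} MB/s")
```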

Thanks
Vivek
