Message-ID: <1260295541.6686.37.camel@cail>
Date: Tue, 08 Dec 2009 13:05:41 -0500
From: "Alan D. Brunelle" <Alan.Brunelle@...com>
To: Vivek Goyal <vgoyal@...hat.com>
Cc: Corrado Zoccolo <czoccolo@...il.com>, linux-kernel@...r.kernel.org,
jens.axboe@...cle.com, nauman@...gle.com, dpshah@...gle.com,
lizf@...fujitsu.com, ryov@...inux.co.jp, fernando@....ntt.co.jp,
s-uchida@...jp.nec.com, taka@...inux.co.jp,
guijianfeng@...fujitsu.com, jmoyer@...hat.com,
righi.andrea@...il.com, m-ikeda@...jp.nec.com
Subject: Re: Block IO Controller V4
On Tue, 2009-12-08 at 11:32 -0500, Vivek Goyal wrote:
> On Tue, Dec 08, 2009 at 10:17:48AM -0500, Alan D. Brunelle wrote:
> > Hi Vivek -
> >
> > Sorry, I've been off doing other work and haven't had time to follow up
> > on this (until recently). I have runs based upon Jens' for-2.6.33 tree
> > as of commit 0d99519efef15fd0cf84a849492c7b1deee1e4b7 and your V4 patch
> > sequence (the refresh patch you sent me on 3 December 2009). I _think_
> > things look pretty darn good.
>
> That's good to hear. :-)
>
> > There are three modes compared:
> >
> > (1) base - just Jens' for-2.6.33 tree, not patched.
> > (2) i1,s8 - Your patches applied, slice_idle set to 8 (the default)
> > (3) i1,s0 - Your patches applied, slice_idle set to 0
> >
>
> Thanks Alan. Whenever you run your tests again, it would be better to run
> them against Jens's for-2.6.33 branch, as Jens has now merged the block IO
> controller patches.
Will do another set of runs w/ the straight branch.
>
> > I did both synchronous and asynchronous runs, direct I/Os in both cases,
> > random and sequential, with reads, writes and 80%/20% read/write cases.
> > The results are throughput numbers (as reported by fio). The first table
> > shows overall test results, the other tables show breakdowns per cgroup
> > (disk).
>
> What is asynchronous direct sequential read? Reads done through libaio?
Yep - An asynchronous run would have fio job files like:
[global]
size=8g
overwrite=0
runtime=120
# async engine: keep up to 128 I/Os in flight, submitted in batches
# of 128 and reaped in batches of 32
ioengine=libaio
iodepth=128
iodepth_low=128
iodepth_batch=128
iodepth_batch_complete=32
# 4KB O_DIRECT random reads
direct=1
bs=4k
readwrite=randread
[/mnt/sda/data.0]
filename=/mnt/sda/data.0
The equivalent synchronous run would be:
[global]
size=8g
overwrite=0
runtime=120
# same workload, but blocking I/O: one request at a time per job
ioengine=sync
direct=1
bs=4k
readwrite=randread
[/mnt/sda/data.0]
filename=/mnt/sda/data.0
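(Each run just points fio at the corresponding job file, e.g. "fio
sda-randread.fio" - the job-file name here is illustrative.)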
>
> Few thoughts/questions inline.
>
> >
> > Regards,
> > Alan
> >
>
> I am assuming that the purpose of the following table is to see what the
> overhead of the IO controller patches is. If yes, this looks more or less
> good, except for a slight dip in the as seq rd case.
>
> > ---- ---- - --------- --------- --------- --------- --------- ---------
> > Mode RdWr N as,base as,i1,s8 as,i1,s0 sy,base sy,i1,s8 sy,i1,s0
> > ---- ---- - --------- --------- --------- --------- --------- ---------
> > rnd rd 2 39.7 39.1 43.7 20.5 20.5 20.4
> > rnd rd 4 33.9 33.3 41.2 28.5 28.5 28.5
> > rnd rd 8 23.7 25.0 36.7 34.4 34.5 34.6
> >
>
> slice_idle=0 improves throughput for the "as" case. That's interesting,
> especially with 8 random readers running. That should be a general CFQ
> property and not an effect of group IO control.
>
> I am not sure why you did not capture base with slice_idle=0 as well, so
> that an apples-to-apples comparison could be done.
Could add that...will add that...
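(That's just the per-device CFQ tunable, so it's easy to add. A minimal
sketch of the toggle - device names sda..sdh are an assumption here:)

# slice_idle is in ms: 8 is the CFQ default, 0 disables idling
for d in /sys/block/sd[a-h]/queue/iosched/slice_idle; do
    echo 0 > $d
done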
>
>
> > rnd wr 2 66.1 67.8 68.9 71.8 71.8 71.9
> > rnd wr 4 57.8 62.9 66.1 64.1 64.2 64.3
> > rnd wr 8 39.5 47.4 60.6 54.7 54.6 54.9
> >
> > rnd rdwr 2 50.2 49.1 54.5 31.1 31.1 31.1
> > rnd rdwr 4 41.4 41.3 50.9 38.9 39.1 39.6
> > rnd rdwr 8 28.1 30.5 46.3 42.5 42.6 43.8
> >
> > seq rd 2 612.3 605.7 611.2 509.6 528.3 608.6
> > seq rd 4 614.1 606.9 606.2 493.0 490.6 615.4
> > seq rd 8 613.6 603.8 605.9 453.0 461.8 617.6
> >
>
> Not sure where this 1-2% dip in the as seq rd case comes from.
>
>
> > seq wr 2 694.6 726.1 701.2 685.8 661.8 314.2
> > seq wr 4 687.6 715.3 628.3 702.9 702.3 317.8
> > seq wr 8 695.0 710.0 629.8 704.0 708.3 339.4
> >
> > seq rdwr 2 692.3 664.9 693.8 508.4 504.0 642.8
> > seq rdwr 4 664.5 657.1 639.3 484.5 481.0 694.3
> > seq rdwr 8 659.0 648.0 634.4 458.1 460.4 709.6
> >
> > ===============================================================
> >
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > Test Mode RdWr N test0 test1 test2 test3 test4 test5 test6 test7
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > as,base rnd rd 2 20.0 19.7
> > as,base rnd rd 4 8.8 8.5 8.3 8.3
> > as,base rnd rd 8 3.3 3.1 3.3 3.2 2.7 2.7 2.8 2.6
> >
> > as,base rnd wr 2 33.2 32.9
> > as,base rnd wr 4 15.9 15.2 14.5 12.3
> > as,base rnd wr 8 5.8 3.4 7.8 8.7 3.5 3.4 3.8 3.1
> >
> > as,base rnd rdwr 2 25.0 25.2
> > as,base rnd rdwr 4 10.6 10.4 10.2 10.2
> > as,base rnd rdwr 8 3.7 3.6 4.0 4.1 3.2 3.4 3.3 2.9
> >
> >
> > as,base seq rd 2 305.9 306.4
> > as,base seq rd 4 159.4 160.5 147.3 146.9
> > as,base seq rd 8 79.7 80.0 77.3 78.4 73.0 70.0 77.5 77.7
> >
> > as,base seq wr 2 348.6 346.0
> > as,base seq wr 4 189.9 187.6 154.7 155.3
> > as,base seq wr 8 87.9 88.3 84.7 85.3 84.5 85.1 90.4 88.8
> >
> > as,base seq rdwr 2 347.2 345.1
> > as,base seq rdwr 4 181.6 181.8 150.8 150.2
> > as,base seq rdwr 8 83.6 82.1 82.1 82.7 80.6 82.7 82.2 82.9
> >
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > Test Mode RdWr N test0 test1 test2 test3 test4 test5 test6 test7
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > as,i1,s8 rnd rd 2 12.7 26.3
> > as,i1,s8 rnd rd 4 1.2 3.7 12.2 16.3
> > as,i1,s8 rnd rd 8 0.5 0.8 1.2 1.7 2.1 3.5 6.7 8.4
> >
>
> This looks more or less good, except that the last two groups seem to have
> got a much larger share of the disk. In general it would be nice to also
> capture the disk time, not just the BW.
What specifically are you looking for? Any other fields from the fio
output? I have all that data & could reprocess it easily enough.
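(If it's the per-group disk time that's wanted, the IO controller exposes
it through the blkio cgroup files; a sketch, assuming the groups are
mounted under /cgroup as test0..test7:)

# per-group disk time (ms) and sectors, as accounted by the blkio controller
for g in /cgroup/test[0-7]; do
    echo "$g:"
    cat $g/blkio.time $g/blkio.sectors
done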
>
> > as,i1,s8 rnd wr 2 18.5 49.3
> > as,i1,s8 rnd wr 4 1.0 1.6 20.7 39.6
> > as,i1,s8 rnd wr 8 0.5 0.7 0.9 1.2 1.7 2.5 15.5 24.5
> >
>
> Same as random read: the last two groups got much more BW than their share.
> Can you send me the exact fio command you used to run the async workload? I
> would like to try it out on my system and see what's happening.
>
> > as,i1,s8 rnd rdwr 2 16.2 32.9
> > as,i1,s8 rnd rdwr 4 1.2 4.7 15.6 19.9
> > as,i1,s8 rnd rdwr 8 0.6 0.8 1.1 1.7 2.1 3.4 9.4 11.5
> >
> > as,i1,s8 seq rd 2 202.7 403.0
> > as,i1,s8 seq rd 4 92.1 114.7 182.4 217.6
> > as,i1,s8 seq rd 8 38.7 76.2 74.0 73.9 74.5 74.7 84.7 107.0
> >
> > as,i1,s8 seq wr 2 243.8 482.3
> > as,i1,s8 seq wr 4 107.7 155.5 200.4 251.7
> > as,i1,s8 seq wr 8 52.1 77.2 81.9 80.8 89.6 99.9 109.8 118.7
> >
>
> We do see increasing BW in the async seq rd and seq wr cases, but again it
> is not very proportional to the weights. Again, disk time will help here.
>
> > as,i1,s8 seq rdwr 2 225.8 439.1
> > as,i1,s8 seq rdwr 4 103.2 140.2 186.5 227.2
> > as,i1,s8 seq rdwr 8 50.3 77.4 77.5 78.9 80.5 83.9 94.3 105.2
> >
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > Test Mode RdWr N test0 test1 test2 test3 test4 test5 test6 test7
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > as,i1,s0 rnd rd 2 21.9 21.8
> > as,i1,s0 rnd rd 4 11.4 12.0 9.1 8.7
> > as,i1,s0 rnd rd 8 3.2 3.2 6.7 6.7 4.7 4.0 4.7 3.5
> >
> > as,i1,s0 rnd wr 2 34.5 34.4
> > as,i1,s0 rnd wr 4 21.6 20.5 12.6 11.4
> > as,i1,s0 rnd wr 8 5.1 4.8 18.2 16.9 4.1 4.0 4.0 3.3
> >
> > as,i1,s0 rnd rdwr 2 27.5 27.0
> > as,i1,s0 rnd rdwr 4 16.1 15.4 10.2 9.2
> > as,i1,s0 rnd rdwr 8 5.3 4.6 9.9 9.7 4.6 4.0 4.4 3.8
> >
> > as,i1,s0 seq rd 2 305.5 305.6
> > as,i1,s0 seq rd 4 159.5 157.3 144.1 145.3
> > as,i1,s0 seq rd 8 74.1 74.6 76.7 76.4 74.6 76.7 75.5 77.4
> >
> > as,i1,s0 seq wr 2 350.3 350.9
> > as,i1,s0 seq wr 4 160.3 161.7 153.1 153.2
> > as,i1,s0 seq wr 8 79.5 80.9 78.2 78.7 79.7 78.3 77.8 76.7
> >
> > as,i1,s0 seq rdwr 2 346.8 347.0
> > as,i1,s0 seq rdwr 4 163.3 163.5 156.7 155.8
> > as,i1,s0 seq rdwr 8 79.1 79.4 80.1 80.3 79.1 78.9 79.6 77.8
> >
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > Test Mode RdWr N test0 test1 test2 test3 test4 test5 test6 test7
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > sy,base rnd rd 2 10.2 10.2
> > sy,base rnd rd 4 7.2 7.2 7.1 7.0
> > sy,base rnd rd 8 4.1 4.1 4.5 4.5 4.3 4.3 4.4 4.1
> >
> > sy,base rnd wr 2 36.1 35.7
> > sy,base rnd wr 4 16.7 16.5 15.6 15.3
> > sy,base rnd wr 8 5.7 5.4 9.0 8.6 6.6 6.5 6.8 6.0
> >
> > sy,base rnd rdwr 2 15.5 15.5
> > sy,base rnd rdwr 4 9.9 9.8 9.7 9.6
> > sy,base rnd rdwr 8 4.8 4.9 5.8 5.8 5.4 5.4 5.4 4.9
> >
> > sy,base seq rd 2 254.7 254.8
> > sy,base seq rd 4 124.2 123.6 121.8 123.4
> > sy,base seq rd 8 56.9 56.5 56.1 56.8 56.6 56.7 56.5 56.9
> >
> > sy,base seq wr 2 343.1 342.8
> > sy,base seq wr 4 177.4 177.9 173.1 174.7
> > sy,base seq wr 8 86.2 87.5 87.6 89.5 86.8 89.6 88.0 88.7
> >
> > sy,base seq rdwr 2 254.0 254.4
> > sy,base seq rdwr 4 124.2 124.5 118.0 117.8
> > sy,base seq rdwr 8 57.2 56.8 57.0 58.8 56.8 56.3 57.5 57.8
> >
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > Test Mode RdWr N test0 test1 test2 test3 test4 test5 test6 test7
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > sy,i1,s8 rnd rd 2 10.2 10.2
> > sy,i1,s8 rnd rd 4 7.2 7.2 7.1 7.1
> > sy,i1,s8 rnd rd 8 4.1 4.1 4.5 4.5 4.4 4.4 4.4 4.2
> >
>
> This is consistent. All random/sync-idle IO will be in the root group with
> group_isolation=0, so we will not see service differentiation between
> groups.
>
> > sy,i1,s8 rnd wr 2 36.2 35.5
> > sy,i1,s8 rnd wr 4 16.9 17.0 15.3 15.0
> > sy,i1,s8 rnd wr 8 5.7 5.6 8.5 8.7 6.7 6.5 6.6 6.3
> >
>
> On my system I was seeing service differentiation for random writes also.
> Given the kind of pattern fio was generating, CFQ categorized these as a
> sync-idle workload for most of the run, hence they got fairness even with
> group_isolation=0.
>
> If you run the same test with group_isolation=1, you should see better
> numbers for this case.
I'll work on updating my script to work w/ the new FIO bits (that have
cgroup included).
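(A sketch of what that would look like, assuming the group_isolation sysfs
tunable and fio's new cgroup/cgroup_weight job options:)

echo 1 > /sys/block/sda/queue/iosched/group_isolation

# per-job fio fragment placing the job in a weighted blkio cgroup
[/mnt/sda/data.0]
filename=/mnt/sda/data.0
cgroup=test0
cgroup_weight=100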
>
> > sy,i1,s8 rnd rdwr 2 15.5 15.5
> > sy,i1,s8 rnd rdwr 4 9.8 9.8 9.7 9.6
> > sy,i1,s8 rnd rdwr 8 4.9 4.9 5.9 5.8 5.4 5.4 5.4 5.0
> >
> > sy,i1,s8 seq rd 2 165.9 362.3
> > sy,i1,s8 seq rd 4 54.0 97.2 145.5 193.9
> > sy,i1,s8 seq rd 8 14.9 31.4 41.8 52.8 62.8 73.2 85.9 98.8
> >
> > sy,i1,s8 seq wr 2 220.7 441.1
> > sy,i1,s8 seq wr 4 77.6 141.9 208.6 274.3
> > sy,i1,s8 seq wr 8 24.9 47.3 63.8 79.1 97.8 114.8 132.1 148.6
> >
>
> The above seq rd and seq wr look very good. BW seems to be in proportion
> to weight.
>
> > sy,i1,s8 seq rdwr 2 167.7 336.4
> > sy,i1,s8 seq rdwr 4 54.5 98.2 141.1 187.2
> > sy,i1,s8 seq rdwr 8 16.7 31.8 41.4 52.3 63.1 73.9 84.6 96.7
> >
>
> With slice_idle=0 you will generally not get any service differentiation
> unless the group is continuously backlogged. So if you launch multiple
> processes in the group, you should see service differentiation even with
> slice_idle=0.
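(i.e., the job files above could be bumped to several processes per group
with fio's numjobs option, e.g.:)

[/mnt/sda/data.0]
filename=/mnt/sda/data.0
numjobs=4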
>
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > Test Mode RdWr N test0 test1 test2 test3 test4 test5 test6 test7
> > ----------- ---- ---- - ----- ----- ----- ----- ----- ----- ----- -----
> > sy,i1,s0 rnd rd 2 10.2 10.2
> > sy,i1,s0 rnd rd 4 7.2 7.2 7.1 7.1
> > sy,i1,s0 rnd rd 8 4.1 4.1 4.6 4.6 4.4 4.4 4.4 4.2
> >
> > sy,i1,s0 rnd wr 2 36.3 35.6
> > sy,i1,s0 rnd wr 4 16.9 17.0 15.3 15.2
> > sy,i1,s0 rnd wr 8 6.0 6.0 8.9 8.8 6.5 6.2 6.5 5.9
> >
> > sy,i1,s0 rnd rdwr 2 15.6 15.6
> > sy,i1,s0 rnd rdwr 4 10.0 10.0 9.8 9.8
> > sy,i1,s0 rnd rdwr 8 5.0 5.0 6.0 6.0 5.5 5.5 5.6 5.1
> >
> > sy,i1,s0 seq rd 2 304.2 304.3
> > sy,i1,s0 seq rd 4 154.2 154.2 153.4 153.7
> > sy,i1,s0 seq rd 8 76.9 76.8 77.3 76.9 77.1 77.2 77.4 78.0
> >
> > sy,i1,s0 seq wr 2 156.8 157.4
> > sy,i1,s0 seq wr 4 80.7 79.6 78.5 79.0
> > sy,i1,s0 seq wr 8 43.2 41.7 41.7 42.6 42.1 42.6 42.8 42.7
> >
> > sy,i1,s0 seq rdwr 2 321.1 321.7
> > sy,i1,s0 seq rdwr 4 174.2 174.0 172.6 173.6
> > sy,i1,s0 seq rdwr 8 86.6 86.3 88.6 88.9 90.2 89.8 90.1 89.0
> >
>
> In summary, the async results look a little bit off and need investigation.
> Can you please send me one sample async fio script?
The fio file I included above should help, right? If not, let me know and
I'll send you all the command files...
>
> Thanks
> Vivek