linux-kernel - Re: [RFC] [PATCH v2 0/8] Provide cgroup isolation for buffered writes.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110323012755.GA10325@redhat.com>
Date:	Tue, 22 Mar 2011 21:27:55 -0400
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Justin TerAvest <teravest@...gle.com>
Cc:	jaxboe@...ionio.com, m-ikeda@...jp.nec.com, ryov@...inux.co.jp,
	taka@...inux.co.jp, kamezawa.hiroyu@...fujitsu.com,
	righi.andrea@...il.com, guijianfeng@...fujitsu.com,
	balbir@...ux.vnet.ibm.com, ctalbott@...gle.com,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC] [PATCH v2 0/8] Provide cgroup isolation for buffered
 writes.

On Tue, Mar 22, 2011 at 04:08:47PM -0700, Justin TerAvest wrote:

[..]
> ===================================== Isolation experiment results
> 
> For isolation testing, we run a test that's available at:
>   git://google3-2.osuosl.org/tests/blkcgroup.git
> 
> It creates containers, runs workloads, and checks to see how well we meet
> isolation targets. For the purposes of this patchset, I only ran
> tests among buffered writers.
> 
> Before patches
> ==============
> 10:32:06 INFO experiment 0 achieved DTFs: 666, 333
> 10:32:06 INFO experiment 0 FAILED: max observed error is 167, allowed is 150
> 10:32:51 INFO experiment 1 achieved DTFs: 647, 352
> 10:32:51 INFO experiment 1 FAILED: max observed error is 253, allowed is 150
> 10:33:35 INFO experiment 2 achieved DTFs: 298, 701
> 10:33:35 INFO experiment 2 FAILED: max observed error is 199, allowed is 150
> 10:34:19 INFO experiment 3 achieved DTFs: 445, 277, 277
> 10:34:19 INFO experiment 3 FAILED: max observed error is 155, allowed is 150
> 10:35:05 INFO experiment 4 achieved DTFs: 418, 104, 261, 215
> 10:35:05 INFO experiment 4 FAILED: max observed error is 232, allowed is 150
> 10:35:53 INFO experiment 5 achieved DTFs: 213, 136, 68, 102, 170, 136, 170
> 10:35:53 INFO experiment 5 PASSED: max observed error is 73, allowed is 150
> 10:36:04 INFO -----ran 6 experiments, 1 passed, 5 failed
> 
> After patches
> =============
> 11:05:22 INFO experiment 0 achieved DTFs: 501, 498
> 11:05:22 INFO experiment 0 PASSED: max observed error is 2, allowed is 150
> 11:06:07 INFO experiment 1 achieved DTFs: 874, 125
> 11:06:07 INFO experiment 1 PASSED: max observed error is 26, allowed is 150
> 11:06:53 INFO experiment 2 achieved DTFs: 121, 878
> 11:06:53 INFO experiment 2 PASSED: max observed error is 22, allowed is 150
> 11:07:46 INFO experiment 3 achieved DTFs: 589, 205, 204
> 11:07:46 INFO experiment 3 PASSED: max observed error is 11, allowed is 150
> 11:08:34 INFO experiment 4 achieved DTFs: 616, 109, 109, 163
> 11:08:34 INFO experiment 4 PASSED: max observed error is 34, allowed is 150
> 11:09:29 INFO experiment 5 achieved DTFs: 139, 139, 139, 139, 140, 141, 160
> 11:09:29 INFO experiment 5 PASSED: max observed error is 1, allowed is 150
> 11:09:46 INFO -----ran 6 experiments, 6 passed, 0 failed
> 
> Summary
> =======
> Isolation between buffered writers is clearly better with this patch.

Can you pleae explain what is this test doing. All I am seeing is passed
and failed and really don't understand what the test is doing.

Can you run say simple 4 dd buffered writers in 4 cgroups with weights
100, 200, 300 and 400 and see if you get better isolation.

Secondly can you also please explain that how does it work. Without
making writeback cgroup aware, there are no gurantees that higher
weight cgroup will get more IO done. 

> 
> 
> =============================== Read latency results
> To test read latency, I created two containers:
>   - One called "readers", with weight 900
>   - One called "writers", with weight 100
> 
> I ran this fio workload in "readers":
> [global]
> directory=/mnt/iostestmnt/fio
> runtime=30
> time_based=1
> group_reporting=1
> exec_prerun='echo 3 > /proc/sys/vm/drop_caches'
> cgroup_nodelete=1
> bs=4K
> size=512M
> 
> [iostest-read]
> description="reader"
> numjobs=16
> rw=randread
> new_group=1
> 
> 
> ....and this fio workload in "writers"
> [global]
> directory=/mnt/iostestmnt/fio
> runtime=30
> time_based=1
> group_reporting=1
> exec_prerun='echo 3 > /proc/sys/vm/drop_caches'
> cgroup_nodelete=1
> bs=4K
> size=512M
> 
> [iostest-write]
> description="writer"
> cgroup=writers
> numjobs=3
> rw=write
> new_group=1
> 
> 
> 
> I've pasted the results from the "read" workload inline.
> 
> Before patches
> ==============
> Starting 16 processes
> 
> Jobs: 14 (f=14): [_rrrrrr_rrrrrrrr] [36.2% done] [352K/0K /s] [86 /0  iops] [eta 01m:00s]·············
> iostest-read: (groupid=0, jobs=16): err= 0: pid=20606
>   Description  : ["reader"]
>   read : io=13532KB, bw=455814 B/s, iops=111 , runt= 30400msec
>     clat (usec): min=2190 , max=30399K, avg=30395175.13, stdev= 0.20
>      lat (usec): min=2190 , max=30399K, avg=30395177.07, stdev= 0.20
>     bw (KB/s) : min=    0, max=  260, per=0.00%, avg= 0.00, stdev= 0.00
>   cpu          : usr=0.00%, sys=0.03%, ctx=3691, majf=2, minf=468
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued r/w/d: total=3383/0/0, short=0/0/0
> 
>      lat (msec): 4=0.03%, 10=2.66%, 20=74.84%, 50=21.90%, 100=0.09%
>      lat (msec): 250=0.06%, >=2000=0.41%
> 
> Run status group 0 (all jobs):
>    READ: io=13532KB, aggrb=445KB/s, minb=455KB/s, maxb=455KB/s, mint=30400msec, maxt=30400msec
> 
> Disk stats (read/write):
>   sdb: ios=3744/18, merge=0/16, ticks=542713/1675, in_queue=550714, util=99.15%
> 
> 
> 
> After patches
> =============
> tarting 16 processes
> Jobs: 16 (f=16): [rrrrrrrrrrrrrrrr] [100.0% done] [557K/0K /s] [136 /0  iops] [eta 00m:00s]
> iostest-read: (groupid=0, jobs=16): err= 0: pid=14183
>   Description  : ["reader"]
>   read : io=14940KB, bw=506105 B/s, iops=123 , runt= 30228msec
>     clat (msec): min=2 , max=29866 , avg=463.42, stdev=101.84
>      lat (msec): min=2 , max=29866 , avg=463.42, stdev=101.84
>     bw (KB/s) : min=    0, max=  198, per=31.69%, avg=156.52, stdev=17.83
>   cpu          : usr=0.01%, sys=0.03%, ctx=4274, majf=2, minf=464
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued r/w/d: total=3735/0/0, short=0/0/0
> 
>      lat (msec): 4=0.05%, 10=0.32%, 20=32.99%, 50=64.61%, 100=1.26%
>      lat (msec): 250=0.11%, 500=0.11%, 750=0.16%, 1000=0.05%, >=2000=0.35%
> 
> Run status group 0 (all jobs):
>    READ: io=14940KB, aggrb=494KB/s, minb=506KB/s, maxb=506KB/s, mint=30228msec, maxt=30228msec
> 
> Disk stats (read/write):
>   sdb: ios=4189/0, merge=0/0, ticks=96428/0, in_queue=478798, util=100.00%
> 
> 
> 
> Summary
> =======
> Read latencies are a bit worse, but this overhead is only imposed when users
> ask for this feature by turning on CONFIG_BLKIOTRACK. We expect there to be =
> a something of a latency vs isolation tradeoff.

- What number you are looking at to say READ latencies are worse.
- Who got isolated here? If READS latencies are worse and you are saying
  that's the cost of isolation, that means you are looking for isolation
  for WRITES? This is the first time time I am hearing that READS starved
  WRITES and I want better isolation for WRITES.

Also CONFIG_BLKIOTRACK=n is not the solution. This will most likely be
set and we need to figure out which makes sense.

To me WRITE isolation comes handy only if we want to create speed
difference between multiple WRITE streams. And that can not reliably be
done till we make writeback logic cgroup aware.

If we try to put WRITES in a separate group, most likely WRITES will end
up getting bigger share of disk then what they are getting by default and
I seriously doubt that who is looking for that. So far all the complaints
I have heard is that in presence of WRITES, my READ latencies suffer and
not vice a versa.

Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/