Date:	Wed, 25 Apr 2012 20:05:02 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Jan Kara <jack@...e.cz>
Cc:	Vivek Goyal <vgoyal@...hat.com>, Tejun Heo <tj@...nel.org>,
	Jens Axboe <axboe@...nel.dk>, linux-mm@...ck.org,
	sjayaraman@...e.com, andrea@...terlinux.com, jmoyer@...hat.com,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	kamezawa.hiroyu@...fujitsu.com, lizefan@...wei.com,
	containers@...ts.linux-foundation.org, cgroups@...r.kernel.org,
	ctalbott@...gle.com, rni@...gle.com, lsf@...ts.linux-foundation.org
Subject: Re: [RFC] writeback and cgroup

> > So the cfq behavior is pretty much nondeterministic. I more or less realized
> > this from the experiments. For example, when starting 2+ "dd oflag=direct"
> > tasks in a single cgroup, they _sometimes_ progress at different rates.
> > See the attached graphs for two such examples on XFS. ext4 is fine.
> > 
> > The 2-dd test case is:
> > 
> > mkdir /cgroup/dd
> > echo $$ > /cgroup/dd/tasks
> > 
> > dd if=/dev/zero of=/fs/zero1 bs=1M oflag=direct &
> > dd if=/dev/zero of=/fs/zero2 bs=1M oflag=direct &
> > 
> > The 6-dd test case is similar.
>   Hmm, interesting. I would not expect that. Maybe it's because the files are
> allocated in different areas of the disk. But even then the difference
> should not be *that* big.

Agreed.
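
(For reference, a fleshed-out version of the 2-dd case above, just a
sketch: it assumes the blkio controller is mounted at /cgroup and the
test filesystem at /fs, as in the snippet; the ls loop at the end is
only a crude, illustrative way to eyeball whether the two files grow
at the same rate.)

mkdir -p /cgroup/dd
echo $$ > /cgroup/dd/tasks

dd if=/dev/zero of=/fs/zero1 bs=1M oflag=direct &
dd if=/dev/zero of=/fs/zero2 bs=1M oflag=direct &

# with fair cfq scheduling the two files should grow at about the same rate
for i in 1 2 3 4 5; do
	sleep 10
	ls -l /fs/zero1 /fs/zero2
done

kill %1 %2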

> > > > Look at this graph, the 4 dd tasks are granted the same weight (2 of
> > > > them are buffered writes). I guess the 2 buffered dd tasks managed to
> > > > progress much faster than the 2 direct dd tasks just because the async
> > > > IOs are much more efficient than the bs=64k direct IOs.
> > >   Likely because 64k is too low to get good bandwidth with direct IO. If
> > > it were 4M, I believe you would get similar throughput for buffered and
> > > direct IO. So essentially you are right: small IOs benefit from caching
> > > effects, since caching allows you to submit larger requests to the device,
> > > which is more efficient.
> > 
> > I didn't directly compare the effects; however, here is an example of
> > doing 1M, 64k and 4k direct writes in parallel. It _seems_ bs=1M has only
> > marginal benefits over 64k, assuming cfq is behaving well.
> > 
> > https://github.com/fengguang/io-controller-tests/raw/master/log/snb/ext4/direct-write-1M-64k-4k.2012-04-19-10-50/balance_dirty_pages-task-bw.png
> > 
> > The test case is:
> > 
> > # cgroup 1
> > echo 500 > /cgroup/cp/blkio.weight
> > 
> > dd if=/dev/zero of=/fs/zero-1M bs=1M oflag=direct &
> > 
> > # cgroup 2
> > echo 1000 > /cgroup/dd/blkio.weight
> > 
> > dd if=/dev/zero of=/fs/zero-64k bs=64k oflag=direct &
> > dd if=/dev/zero of=/fs/zero-4k  bs=4k  oflag=direct &
>   Um, I'm not completely sure what you were trying to test above.

Yeah, it's not a good test case. I've changed it to run the 3 dd tasks
in 3 cgroups with equal weights. The new results are attached (they look
much the same as the original ones).
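
(Concretely, the revised setup is roughly the following; a sketch only,
assuming the same /cgroup mount point and using 500 as the common weight,
where the exact value does not matter as long as it is equal across the
three groups:)

mkdir /cgroup/dd1 /cgroup/dd2 /cgroup/dd3
echo 500 > /cgroup/dd1/blkio.weight
echo 500 > /cgroup/dd2/blkio.weight
echo 500 > /cgroup/dd3/blkio.weight

# each dd inherits the cgroup the shell is in at fork time
echo $$ > /cgroup/dd1/tasks
dd if=/dev/zero of=/fs/zero-1M  bs=1M  oflag=direct &

echo $$ > /cgroup/dd2/tasks
dd if=/dev/zero of=/fs/zero-64k bs=64k oflag=direct &

echo $$ > /cgroup/dd3/tasks
dd if=/dev/zero of=/fs/zero-4k  bs=4k  oflag=direct &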

> What I wanted to point out is that direct IO is not necessarily less
> efficient than buffered IO. Look:
> xen-node0:~ # uname -a
> Linux xen-node0 3.3.0-rc4-xen+ #6 SMP PREEMPT Tue Apr 17 06:48:08 UTC 2012
> x86_64 x86_64 x86_64 GNU/Linux
> xen-node0:~ # dd if=/dev/zero of=/mnt/file bs=1M count=1024 conv=fsync
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 10.5304 s, 102 MB/s
> xen-node0:~ # dd if=/dev/zero of=/mnt/file bs=1M count=1024 oflag=direct conv=fsync
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 10.3678 s, 104 MB/s
> 
> So direct and buffered IO are about the same here. Note that I used the
> conv=fsync flag to eliminate the effect of part of the buffered write still
> sitting in the cache when dd finishes, which would be unfair to the direct
> writer...

OK, I also find direct writes to be a bit faster than buffered writes:

root@snb /home/wfg# dd if=/dev/zero of=/mnt/file bs=1M count=1024 conv=fsync

1073741824 bytes (1.1 GB) copied, 10.4039 s, 103 MB/s
1073741824 bytes (1.1 GB) copied, 10.4143 s, 103 MB/s

root@snb /home/wfg# dd if=/dev/zero of=/mnt/file bs=1M count=1024 oflag=direct conv=fsync

1073741824 bytes (1.1 GB) copied, 9.9006 s, 108 MB/s
1073741824 bytes (1.1 GB) copied, 9.55173 s, 112 MB/s

root@snb /home/wfg# dd if=/dev/zero of=/mnt/file bs=64k count=16384 oflag=direct conv=fsync

1073741824 bytes (1.1 GB) copied, 9.83902 s, 109 MB/s
1073741824 bytes (1.1 GB) copied, 9.61725 s, 112 MB/s

> And actually 64k vs 1M makes a big difference on my machine:
> xen-node0:~ # dd if=/dev/zero of=/mnt/file bs=64k count=16384 oflag=direct conv=fsync
> 16384+0 records in
> 16384+0 records out
> 1073741824 bytes (1.1 GB) copied, 19.3176 s, 55.6 MB/s

Interestingly, my 64k direct writes are as fast as the 1M direct writes...
and 4k writes run at ~1/4 of that speed:

root@snb /home/wfg# dd if=/dev/zero of=/mnt/file bs=4k count=$((256<<10)) oflag=direct conv=fsync

1073741824 bytes (1.1 GB) copied, 42.0726 s, 25.5 MB/s
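
(In case it is useful, the three direct-write runs above collapse into a
small loop; just a sketch, same file and flags as above, with each pass
writing the same 1 GiB total:)

for args in "bs=4k count=262144" "bs=64k count=16384" "bs=1M count=1024"; do
	echo "== dd $args =="
	dd if=/dev/zero of=/mnt/file $args oflag=direct conv=fsync
done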

Thanks,
Fengguang

[Attachment: "balance_dirty_pages-task-bw.png" (image/png, 61279 bytes)]
