[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090120232748.GA10605@Krystal>
Date: Tue, 20 Jan 2009 18:27:48 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Jens Axboe <jens.axboe@...cle.com>
Cc: akpm@...ux-foundation.org, Ingo Molnar <mingo@...e.hu>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel@...r.kernel.org, ltt-dev@...ts.casi.polymtl.ca
Subject: Re: [RFC PATCH] block: Fix bio merge induced high I/O latency
* Jens Axboe (jens.axboe@...cle.com) wrote:
> On Tue, Jan 20 2009, Jens Axboe wrote:
> > On Mon, Jan 19 2009, Mathieu Desnoyers wrote:
> > > * Jens Axboe (jens.axboe@...cle.com) wrote:
> > > > On Sun, Jan 18 2009, Mathieu Desnoyers wrote:
> > > > > I looked at the "ls" behavior (while doing a dd) within my LTTng trace
> > > > > to create a fio job file. The said behavior is appended below as "Part
> > > > > 1 - ls I/O behavior". Note that the original "ls" test case was done
> > > > > with the anticipatory I/O scheduler, which was active by default on my
> > > > > debian system with custom vanilla 2.6.28 kernel. Also note that I am
> > > > > running this on a raid-1, but have experienced the same problem on a
> > > > > standard partition I created on the same machine.
> > > > >
> > > > > I created the fio job file appended as "Part 2 - dd+ls fio job file". It
> > > > > consists of one dd-like job and many small jobs reading as many data as
> > > > > ls did. I used the small test script to batch run this ("Part 3 - batch
> > > > > test").
> > > > >
> > > > > The results for the ls-like jobs are interesting :
> > > > >
> > > > > I/O scheduler runt-min (msec) runt-max (msec)
> > > > > noop 41 10563
> > > > > anticipatory 63 8185
> > > > > deadline 52 33387
> > > > > cfq 43 1420
> > > >
> > >
> > > Extra note : I have a HZ=250 on my system. Changing to 100 or 1000 did
> > > not make much difference (also tried with NO_HZ enabled).
> > >
> > > > Do you have queuing enabled on your drives? You can check that in
> > > > /sys/block/sdX/device/queue_depth. Try setting those to 1 and retest all
> > > > schedulers, would be good for comparison.
> > > >
> > >
> > > Here are the tests with a queue_depth of 1 :
> > >
> > > I/O scheduler runt-min (msec) runt-max (msec)
> > > noop 43 38235
> > > anticipatory 44 8728
> > > deadline 51 19751
> > > cfq 48 427
> > >
> > >
> > > Overall, I wouldn't say it makes much difference.
> >
> > 0,5 seconds vs 1,5 seconds isn't much of a difference?
> >
> > > > raid personalities or dm complicates matters, since it introduces a
> > > > disconnect between 'ls' and the io scheduler at the bottom...
> > > >
> > >
> > > Yes, ideally I should re-run those directly on the disk partitions.
> >
> > At least for comparison.
> >
> > > I am also tempted to create a fio job file which acts like a ssh server
> > > receiving a connexion after it has been pruned from the cache while the
> > > system if doing heavy I/O. "ssh", in this case, seems to be doing much
> > > more I/O than a simple "ls", and I think we might want to see if cfq
> > > behaves correctly in such case. Most of this I/O is coming from page
> > > faults (identified as traps in the trace) probably because the ssh
> > > executable has been thrown out of the cache by
> > >
> > > echo 3 > /proc/sys/vm/drop_caches
> > >
> > > The behavior of an incoming ssh connexion after clearing the cache is
> > > appended below (Part 1 - LTTng trace for incoming ssh connexion). The
> > > job file created (Part 2) reads, for each job, a 2MB file with random
> > > reads each between 4k-44k. The results are very interesting for cfq :
> > >
> > > I/O scheduler runt-min (msec) runt-max (msec)
> > > noop 586 110242
> > > anticipatory 531 26942
> > > deadline 561 108772
> > > cfq 523 28216
> > >
> > > So, basically, ssh being out of the cache can take 28s to answer an
> > > incoming ssh connexion even with the cfq scheduler. This is not exactly
> > > what I would call an acceptable latency.
> >
> > At some point, you have to stop and consider what is acceptable
> > performance for a given IO pattern. Your ssh test case is purely random
> > IO, and neither CFQ nor AS would do any idling for that. We can make
> > this test case faster for sure, the hard part is making sure that we
> > don't regress on async throughput at the same time.
> >
> > Also remember that with your raid1, it's not entirely reasonable to
> > blaim all performance issues on the IO scheduler as per my previous
> > mail. It would be a lot more fair to view the disk numbers individually.
> >
> > Can you retry this job with 'quantum' set to 1 and 'slice_async_rq' set
> > to 1 as well?
> >
> > However, I think we should be doing somewhat better at this test case.
>
> Mathieu, does this improve anything for you?
>
So, I ran the tests with my corrected patch, and the results are very
good !
"incoming ssh connexion" test
"config 2.6.28 cfq"
Linux 2.6.28
/sys/block/sd{a,b}/device/queue_depth = 31 (default)
/sys/block/sd{a,b}/queue/iosched/slice_async_rq = 2 (default)
/sys/block/sd{a,b}/queue/iosched/quantum = 4 (default)
"config 2.6.28.1-patch1"
Linux 2.6.28.1
Corrected cfq patch applied
echo 1 > /sys/block/sd{a,b}/device/queue_depth
echo 1 > /sys/block/sd{a,b}/queue/iosched/slice_async_rq
echo 1 > /sys/block/sd{a,b}/queue/iosched/quantum
On /dev/sda :
I/O scheduler runt-min (msec) runt-max (msec)
cfq (2.6.28 cfq) 523 6637
cfq (2.6.28.1-patch1) 579 2082
On raid1 :
I/O scheduler runt-min (msec) runt-max (msec)
cfq (2.6.28 cfq) 523 28216
cfq (2.6.28.1-patch1) 517 3086
It looks like we are getting somewhere :) Are there any specific
queue_depth, slice_async_rq, quantum variations you would like to be
tested ?
For reference, I attach my ssh-like job file (again) to this mail.
Mathieu
[job1]
rw=write
size=10240m
direct=0
blocksize=1024k
[global]
rw=randread
size=2048k
filesize=30m
direct=0
bsrange=4k-44k
[file1]
startdelay=0
[file2]
startdelay=4
[file3]
startdelay=8
[file4]
startdelay=12
[file5]
startdelay=16
[file6]
startdelay=20
[file7]
startdelay=24
[file8]
startdelay=28
[file9]
startdelay=32
[file10]
startdelay=36
[file11]
startdelay=40
[file12]
startdelay=44
[file13]
startdelay=48
[file14]
startdelay=52
[file15]
startdelay=56
[file16]
startdelay=60
[file17]
startdelay=64
[file18]
startdelay=68
[file19]
startdelay=72
[file20]
startdelay=76
[file21]
startdelay=80
[file22]
startdelay=84
[file23]
startdelay=88
[file24]
startdelay=92
[file25]
startdelay=96
[file26]
startdelay=100
[file27]
startdelay=104
[file28]
startdelay=108
[file29]
startdelay=112
[file30]
startdelay=116
[file31]
startdelay=120
[file32]
startdelay=124
[file33]
startdelay=128
[file34]
startdelay=132
[file35]
startdelay=134
[file36]
startdelay=138
[file37]
startdelay=142
[file38]
startdelay=146
[file39]
startdelay=150
[file40]
startdelay=200
[file41]
startdelay=260
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists