Message-ID: <x49eiohqa06.fsf@segfault.boston.devel.redhat.com>
Date: Mon, 02 Nov 2009 09:25:29 -0500
From: Jeff Moyer <jmoyer@...hat.com>
To: Zubin Dittia <zubin@...tri.com>
Cc: linux-kernel@...r.kernel.org
Subject: Re: SSD read latency negatively impacted by large writes (independent of choice of I/O scheduler)

Zubin Dittia <zubin@...tri.com> writes:
> I've been doing some testing with an Intel X25-E SSD, and noticed that
> large writes can severely affect read latency, regardless of which I/O
> scheduler or scheduler parameters are in use (this is with kernel
> 2.6.28-16 from Ubuntu jaunty 9.04). The test was very simple: I had
> two threads running; the first was in a tight loop reading different
> 4KB-sized blocks (and recording the latency of each read) from the SSD
> block device file. While the first thread is doing this, a second
> thread does a single big 5MB write to the device. What I noticed is
> that about 30 seconds after the write (which is when it is actually
> written back to the device from the buffer cache), I see a very large
> spike in read latency: from 200 microseconds to 25 milliseconds.
> This seems to imply that the writes issued by the scheduler are not
> being broken up into sufficiently small chunks with interspersed
> reads; instead, the whole sequential write seems to be getting issued
> while starving reads during that period. I've noticed the same
> behavior with SSDs from another vendor as well, and there the latency
> impact was even worse (80 ms). Playing around with different I/O
> schedulers and parameters doesn't seem to help at all.
>
> The same behavior is exhibited when using O_DIRECT as well (except
> that the latency hit is immediate instead of 30 seconds later, as one
> would expect). The only way I was able to reduce the worst-case read
> latency was by using O_DIRECT and breaking up the large write into
> multiple smaller writes (with one system call per smaller write). My
> theory is that the time between write system calls was enough to allow
> reads to squeeze themselves in between the writes. But, as would be
> expected, this hurts sequential write throughput because of the
> overhead of the additional system calls.
>
> My question is: have others seen this behavior? Are there any
> tunables that could help (perhaps a parameter that would dictate the
> largest write that can be pending to the device at any given time)?
> If not, would it make sense to implement a new I/O scheduler (or hack
> an existing one) that does this?
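
For anyone who wants to reproduce this, here is a minimal sketch (not
the original test program) of the reader side: a tight loop of 4KB reads
at random offsets, reporting any read slower than a threshold.  The
device path, device size and 1 ms threshold below are placeholders, and
O_DIRECT is used so that every read is guaranteed to hit the device;
build with -lrt on older glibc for clock_gettime.

/*
 * Hypothetical read-latency probe: 4KB O_DIRECT reads at random
 * offsets, printing any read that takes longer than 1 ms.  Device
 * path and size are placeholders.
 */
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define BLK		4096
#define DEV		"/dev/sdb"		/* placeholder device */
#define DEV_BLOCKS	(8ULL * 1024 * 1024)	/* assumed 32GB device, in 4KB blocks */

int main(void)
{
	void *buf;
	int fd = open(DEV, O_RDONLY | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, BLK, BLK))
		return 1;

	srandom((unsigned)time(NULL));
	for (;;) {
		struct timespec t0, t1;
		off_t off = (off_t)(random() % DEV_BLOCKS) * BLK;
		long us;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		if (pread(fd, buf, BLK, off) != BLK)
			break;
		clock_gettime(CLOCK_MONOTONIC, &t1);

		us = (t1.tv_sec - t0.tv_sec) * 1000000L +
		     (t1.tv_nsec - t0.tv_nsec) / 1000L;
		if (us > 1000)		/* report reads slower than 1 ms */
			printf("read at %lld took %ld us\n",
			       (long long)off, us);
	}
	close(fd);
	free(buf);
	return 0;
}

The writer side can simply be a single 5MB write() to the same device
from a second process: buffered, the latency spike shows up at writeback
time; with O_DIRECT it shows up immediately.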
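
And a sketch of the splitting workaround described above, assuming a
256KB chunk size; the chunk size is the knob here, since smaller chunks
give reads more chances to slip in between writes at the cost of more
system calls.  Note that it scribbles over the start of the device.

/*
 * Hypothetical chunked writer: issue the 5MB write as a series of
 * smaller O_DIRECT pwrite()s instead of one big buffered write.
 * Device path and chunk size are placeholders.  WARNING: this
 * overwrites the first 5MB of the device.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DEV	"/dev/sdb"		/* placeholder device */
#define TOTAL	(5 * 1024 * 1024)	/* the 5MB write from the test */
#define CHUNK	(256 * 1024)		/* smaller -> lower read latency */

int main(void)
{
	void *buf;
	off_t off;
	int fd = open(DEV, O_WRONLY | O_DIRECT);

	if (fd < 0 || posix_memalign(&buf, 4096, CHUNK))
		return 1;
	memset(buf, 0xab, CHUNK);

	for (off = 0; off < TOTAL; off += CHUNK)
		if (pwrite(fd, buf, CHUNK, off) != CHUNK)
			return 1;

	close(fd);
	free(buf);
	return 0;
}
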
I haven't verified your findings, but if what you state is true, then
you could try tuning max_sectors_kb for your device. Making that
smaller will decrease the total amount of I/O that can be queued in the
device at any given time. There's always a trade-off between bandwidth
and latency, of course.
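
max_sectors_kb lives in sysfs, at /sys/block/<dev>/queue/max_sectors_kb,
and is normally just written from a shell.  Purely as an illustration, a
small helper that prints the current value and clamps it could look like
the sketch below; the device name and the 128KB value are only examples,
and whether 128KB is the right number for the X25-E is something you
would have to measure.

/*
 * Hypothetical helper: clamp the largest request the block layer will
 * send to the device by writing max_sectors_kb (value is in KB).
 * Device name and new value are examples only.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "sdb";	/* example device */
	const char *kb  = argc > 2 ? argv[2] : "128";	/* example limit, KB */
	char path[256], cur[32];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/block/%s/queue/max_sectors_kb", dev);
	f = fopen(path, "r+");
	if (!f) {
		perror(path);
		return 1;
	}
	if (fgets(cur, sizeof(cur), f))
		printf("%s was %s", path, cur);
	rewind(f);
	fprintf(f, "%s\n", kb);
	fclose(f);
	return 0;
}
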
Cheers,
Jeff