Message-ID: <1164136669.4265.65.camel@sale659.sandia.gov>
Date: Tue, 21 Nov 2006 12:17:49 -0700
From: "Jim Schutt" <jaschut@...dia.gov>
To: "Jens Axboe" <jens.axboe@...cle.com>
cc: linux-kernel@...r.kernel.org
Subject: Re: splice/vmsplice performance test results
On Tue, 2006-11-21 at 14:54 +0100, Jens Axboe wrote:
> On Mon, Nov 20 2006, Jim Schutt wrote:
> > On Mon, 2006-11-20 at 09:24 +0100, Jens Axboe wrote:
> > > On Mon, Nov 20 2006, Jens Axboe wrote:
> > > > On Fri, Nov 17 2006, Jim Schutt wrote:
> > > > > On Thu, 2006-11-16 at 21:25 +0100, Jens Axboe wrote:
> > > > > > On Thu, Nov 16 2006, Jim Schutt wrote:
> > > > > > > Hi,
> > > > > > >
> > > > >
> > > > > > > My test program can do one of the following:
> > > > > > >
> > > > > > > send data:
> > > > > > > A) read() from file into buffer, write() buffer into socket
> > > > > > > B) mmap() section of file, write() that into socket, munmap()
> > > > > > > C) splice() from file to pipe, splice() from pipe to socket
> > > > > > >
> > > > > > > receive data:
> > > > > > > 1) read() from socket into buffer, write() buffer into file
> > > > > > > 2) ftruncate() to extend file, mmap() new extent, read()
> > > > > > > from socket into new extent, munmap()
> > > > > > > 3) read() from socket into buffer, vmsplice() buffer to
> > > > > > > pipe, splice() pipe to file (using the double-buffer trick)
> > > > > > >
> > > > > > > Here's the results, using:
> > > > > > > - 64 KiB buffer, mmap extent, or splice
> > > > > > > - 1 MiB TCP window
> > > > > > > - 16 GiB data sent across network
> > > > > > >
> > > > > > > A) from /dev/zero -> 1) to /dev/null : 857 MB/s (6.86 Gb/s)
> > > > > > >
> > > > > > > A) from file -> 1) to /dev/null : 472 MB/s (3.77 Gb/s)
> > > > > > > B) from file -> 1) to /dev/null : 366 MB/s (2.93 Gb/s)
> > > > > > > C) from file -> 1) to /dev/null : 854 MB/s (6.83 Gb/s)
> > > > > > >
> > > > > > > A) from /dev/zero -> 1) to file : 375 MB/s (3.00 Gb/s)
> > > > > > > A) from /dev/zero -> 2) to file : 150 MB/s (1.20 Gb/s)
> > > > > > > A) from /dev/zero -> 3) to file : 286 MB/s (2.29 Gb/s)
> > > > > > >
> > > > > > > I had (naively) hoped the read/vmsplice/splice combination would
> > > > > > > run at the same speed I can write a file, i.e. at about 450 MB/s
> > > > > > > on my setup. Do any of my numbers seem bogus, so I should look
> > > > > > > harder at my test program?
> > > > > >
> > > > > > Could be read-ahead playing in here, I'd have to take a closer look at
> > > > > > the generated io patterns to say more about that. Any chance you can
> > > > > > capture iostat or blktrace info for such a run to compare that goes to
> > > > > > the disk?
> > > > >
> > > > > I've attached a file with iostat and vmstat results for the case
> > > > > where I read from a socket and write a file, vs. the case where I
> > > > > read from a socket and use vmsplice/splice to write the file.
> > > > > (Sorry it's not inline - my mailer locks up when I try to
> > > > > include the file.)
> > > > >
> > > > > Would you still like blktrace info for these two cases?
> > > >
> > > > No, I think the iostat data is fine, I don't think the blktrace info
> > > > would give me any more insight on this problem. I'll set up a test to
> > > > reproduce it here, looks like the write out path could be optimized some
> > > > more.
> >
> > Great, let me know if you need testing from me.
>
> I found some suboptimal behaviour in your test app - you don't check for
> short reads and splice would really like things to be aligned for the
> best performance. I did some testing with the original app here, and I
> get 114.769MB/s for read-from-socket -> write-to-file and 109.878MB/s
> for read-from-socket -> vmsplice-splice-to-file. If I fix up the read to
> always get the full buffer size before doing the vmsplice+splice, the
> performance is up to the same as the read/write.

Sorry - I had assumed my network was so much faster than my
disk subsystem that I'd never get a short read from a socket
except at the end of the transfer. Pretty silly of me, in hindsight.

I can see now how even one short read early on would throw off
the alignment for splicing into a file for the rest of the
transfer, right?

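To make the alignment point concrete, here is a small arithmetic sketch. It assumes a 4096-byte page size and that each chunk is appended at the running file offset; the helper name count_aligned_starts and the 1448-byte short read are made up for illustration and are not taken from the test program.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: count how many chunks would start at a
 * page-aligned file offset when chunks of the given lengths are
 * appended one after another.
 */
static size_t count_aligned_starts(const size_t *lens, size_t n,
                                   size_t page_size)
{
    size_t offset = 0;
    size_t aligned = 0;
    size_t i;

    assert(page_size > 0);
    for (i = 0; i < n; i++) {
        if (offset % page_size == 0)
            aligned++;
        offset += lens[i];      /* next chunk starts where this one ends */
    }
    return aligned;
}
```

With chunk lengths of 65536, 1448, 65536, 65536 bytes, only the first two chunks start on a page boundary; every chunk after the short one starts 1448 bytes past a boundary.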
Here are some new results:

Run w/ check for short read on socket in vmsplice case:
 - /dev/zero -> /dev/null w/ socket read + file write: 1130 MB/s
   (Man, my network is running fast today. I don't know why.)
 - /dev/zero -> /dev/null w/ socket read + vmsplice/splice: 1028 MB/s
 - /dev/zero -> file w/ socket read + vmsplice/splice: 336 MB/s

Rerun w/ original:
 - /dev/zero -> /dev/null w/ socket read + vmsplice/splice: 1026 MB/s
 - /dev/zero -> file w/ socket read + vmsplice/splice: 285 MB/s
 - /dev/zero -> file w/ socket read + file write: 382 MB/s

So I was losing about 50 MB/s due to short reads on the socket
throwing off the alignment for splice. Sorry to waste your
time on that.

But it looks like socket-read + file-write is still ~50 MB/s
faster than socket-read + vmsplice/splice (assuming I didn't
get my short read fix wrong - see patch below). I assume that's
still unexpected?
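
For readers following along, the vmsplice/splice write path being measured can be sketched roughly as follows. This is a minimal Linux-only illustration, not the code from the test program: the helper name vmsplice_to_file is hypothetical, it assumes out_fd was not opened with O_APPEND, and it drains the pipe before returning, so it does not show the double-buffer trick.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: map len bytes of buf into the pipe with
 * vmsplice(), then move them from the pipe into out_fd with splice().
 * Returns the number of bytes written to out_fd, or -1 on error.
 */
static ssize_t vmsplice_to_file(int pipefd[2], int out_fd,
                                void *buf, size_t len)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    size_t total = 0;

    assert(buf != NULL);
    while (iov.iov_len > 0) {
        /* May accept less than requested if the pipe fills up. */
        ssize_t in_pipe = vmsplice(pipefd[1], &iov, 1, 0);
        if (in_pipe < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (in_pipe == 0)
            return -1;          /* no progress; avoid spinning */
        iov.iov_base = (char *)iov.iov_base + in_pipe;
        iov.iov_len -= (size_t)in_pipe;

        /* Drain exactly what was just queued into the file. */
        while (in_pipe > 0) {
            ssize_t out = splice(pipefd[0], NULL, out_fd, NULL,
                                 (size_t)in_pipe, SPLICE_F_MOVE);
            if (out < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (out == 0)
                return -1;      /* pipe unexpectedly empty */
            in_pipe -= out;
            total += (size_t)out;
        }
    }
    return (ssize_t)total;
}
```

A receive loop in this style would fill buf from the socket first and then call the helper once per filled buffer.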
>
> Since it's doing buffered writes, the results do vary a lot though (as
> you also indicated). A raw /dev/zero -> /dev/null is 3 times faster with
> vmsplice/splice.
>

Hmmm. Is it worth my doing some sort of kernel profiling to
see if there is anything unexpected with my setup? If so, do
you have a preference as to which tool I should use?

Here's how I changed my app to handle (I think) short reads.
Maybe I missed your point?

diff --git a/src/dnd.c b/src/dnd.c
index 01bd7b8..aa70102 100644
--- a/src/dnd.c
+++ b/src/dnd.c
@@ -773,18 +773,26 @@ uint64_t vmsplice_recv(const struct opti
 again:
        i = (i + 1) & 1;
        iov.iov_base = opts->buf + i * opts->buf_size;
+       l = 0;
 again2:
-       l = read(sd, iov.iov_base, opts->buf_size);
-       if (l < 0) {
+       m = read(sd, iov.iov_base + l, opts->buf_size - l);
+       if (m < 0) {
                if (errno == EINTR)
                        goto again2;
                perror("Read");
                exit(EXIT_FAILURE);
        }
-       if (l == 0) {
-               fdatasync(fd);
-               return bytes;
+       if (m == 0) {
+               if (l == 0) {
+                       fdatasync(fd);
+                       return bytes;
+               }
+       }
+       else {
+               l += m;
+               if (l != opts->buf_size)
+                       goto again2;
        }
        while (l) {
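
The same fix-up can also be written as a standalone helper, which may be easier to check than the goto version. This is only a sketch of the idea in the patch above, not code from dnd.c; the name read_full is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling read() until len bytes have
 * arrived or EOF is reached. Returns the number of bytes read
 * (less than len only at EOF), or -1 on error.
 */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    assert(buf != NULL);
    while (got < len) {
        ssize_t n = read(fd, p + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            break;              /* EOF: return what we have */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

A return value of 0 means EOF with nothing pending, which corresponds to the fdatasync-and-return branch in the patch; a short nonzero return is the final partial buffer.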