Message-ID: <4967B8C5.10803@cosmosbay.com>
Date: Fri, 09 Jan 2009 21:51:17 +0100
From: Eric Dumazet <dada1@...mosbay.com>
To: Willy Tarreau <w@....eu>
CC: David Miller <davem@...emloft.net>, ben@...s.com,
jarkao2@...il.com, mingo@...e.hu, linux-kernel@...r.kernel.org,
netdev@...r.kernel.org, jens.axboe@...cle.com
Subject: Re: [PATCH] tcp: splice as many packets as possible at once
Willy Tarreau wrote:
> Hi Eric,
>
> On Fri, Jan 09, 2009 at 04:42:44PM +0100, Eric Dumazet wrote:
> (...)
>> Willy's patch makes splice() behave like tcp_recvmsg(), but we might call
>> tcp_cleanup_rbuf() several times, with copied=1460 (once per frame processed).
>>
>> I wonder if the right fix should be done in tcp_read_sock(): it is the
>> function that should consume several skbs, IMHO, if we want optimal ACK generation.
>>
>> We break out of its loop at line 1246
>>
>>         if (!desc->count) /* this test is always true */
>>             break;
>>
>> (__tcp_splice_read() sets count to 0, right before calling tcp_read_sock())
>>
>> So the code at line 1246 of tcp_read_sock() seems wrong, or at least pessimistic.
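
For context, the loop in question looks roughly like this; a simplified
sketch from memory, not the exact source (declarations and error
handling omitted):

    int tcp_read_sock(struct sock *sk, read_descriptor_t *desc,
                      sk_read_actor_t recv_actor)
    {
        struct sk_buff *skb;
        u32 seq, offset;
        ...
        while ((skb = tcp_recv_skb(sk, seq, &offset)) != NULL) {
            /* feed this skb to the actor (here, the splice callback) */
            used = recv_actor(desc, skb, offset, len);
            ...
            /* __tcp_splice_read() passed desc->count == 0, so this
             * test fires after the very first skb: we never batch
             * several skbs in a single call */
            if (!desc->count)
                break;
        }
        ...
    }
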
>
> That's a very interesting discovery you made here. I have made
> measurements with this line commented out, just to get an idea. The
> hardest part was to find a CPU-bound machine. Finally I slowed my
> laptop down to 300 MHz (in fact, 600 with throttle 50%, but let's
> call that 300). That way, I cannot saturate the PCI-based tg3 and
> I can observe the effects of various changes on the data rate.
>
> - original tcp_splice_read(), with "!timeo"    : 24.1 MB/s
> - modified tcp_splice_read(), without "!timeo" : 32.5 MB/s (+34%)
> - original with line #1246 commented out       : 34.5 MB/s (+43%)
>
> So you're right: avoiding a fresh call into tcp_read_sock() for
> every skb gives a nice performance boost.
>
> Also, I found that tcp_splice_read() behaves like this when breaking
> out of the loop:
>
>     lock_sock();
>     while () {
>         ...
>         __tcp_splice_read();
>         ...
>         release_sock();
>         lock_sock();
>         if (break condition)
>             break;
>     }
>     release_sock();
>
> Which means that when we break out of the loop on (!timeo) with
> ret > 0, we do a useless release_sock()/lock_sock() pair before the
> final release_sock().
>
> So I tried a minor modification consisting of moving the test before
> release_sock(), and leaving !timeo there with line #1246 commented
> out. That's a noticeable winner, as the data rate went up to
> 35.7 MB/s (+48%).
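
If I understand your modification correctly, the reordering looks
something like this (same pseudocode as above; "break condition"
stands for the original exit tests):

    lock_sock();
    while () {
        ...
        __tcp_splice_read();
        ...
        if (break condition)   /* tested before the unlock: breaking */
            break;             /* out now skips the useless          */
        release_sock();        /* release_sock()/lock_sock() pair    */
        lock_sock();
    }
    release_sock();
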
>
> Also, in your second mail, you're saying that your change
> might return more data than requested by the user. I can't
> see why; could you please explain, as I'm still quite
> ignorant in this area?
Well, I just tested various user programs and indeed got this
strange result:

Here I call splice() with len=1000 (0x3e8), and you can see that
the second call returns 1460. I suspect a bug in the splice code
that my patch just exposed.

pipe([3, 4]) = 0
open("/dev/null", O_RDWR|O_CREAT|O_TRUNC, 0644) = 5
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 6
bind(6, {sa_family=AF_INET, sin_port=htons(4711), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
listen(6, 5) = 0
accept(6, 0, NULL) = 7
setsockopt(7, SOL_SOCKET, SO_RCVLOWAT, [1000], 4) = 0
poll([{fd=7, events=POLLIN, revents=POLLIN}], 1, -1) = 1
splice(0x7, 0, 0x4, 0, 0x3e8, 0x3) = 1000 OK
splice(0x3, 0, 0x5, 0, 0x3e8, 0x5) = 1000
poll([{fd=7, events=POLLIN, revents=POLLIN}], 1, -1) = 1
splice(0x7, 0, 0x4, 0, 0x3e8, 0x3) = 1460 Oh well... 1460 > 1000 !!!
splice(0x3, 0, 0x5, 0, 0x5b4, 0x5) = 1460
poll([{fd=7, events=POLLIN, revents=POLLIN}], 1, -1) = 1
splice(0x7, 0, 0x4, 0, 0x3e8, 0x3) = 1460
splice(0x3, 0, 0x5, 0, 0x5b4, 0x5) = 1460
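
For reference, a minimal standalone reproducer of the sequence above
could look like this; a sketch only, assuming some peer connects to
port 4711 and streams data, with error handling trimmed. (The flag
values 0x3 and 0x5 in the trace are SPLICE_F_MOVE|SPLICE_F_NONBLOCK
and SPLICE_F_MOVE|SPLICE_F_MORE.)

    #define _GNU_SOURCE
    #include <arpa/inet.h>
    #include <fcntl.h>
    #include <netinet/in.h>
    #include <poll.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int pfd[2], nullfd, lfd, cfd;
        int lowat = 1000;
        struct sockaddr_in addr;

        pipe(pfd);
        nullfd = open("/dev/null", O_RDWR | O_CREAT | O_TRUNC, 0644);
        lfd = socket(AF_INET, SOCK_STREAM, 0);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port = htons(4711);
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(lfd, 5);
        cfd = accept(lfd, NULL, NULL);
        setsockopt(cfd, SOL_SOCKET, SO_RCVLOWAT, &lowat, sizeof(lowat));

        for (;;) {
            struct pollfd pf = { .fd = cfd, .events = POLLIN };
            ssize_t in;

            poll(&pf, 1, -1);
            /* ask the socket for at most 1000 bytes... */
            in = splice(cfd, NULL, pfd[1], NULL, 1000,
                        SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
            if (in <= 0)
                break;
            /* ...print what we actually got (1460 above), then
             * drain the pipe into /dev/null */
            printf("splice returned %zd\n", in);
            splice(pfd[0], NULL, nullfd, NULL, (size_t)in,
                   SPLICE_F_MOVE | SPLICE_F_MORE);
        }
        return 0;
    }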