Message-ID: <1333481821.18626.322.camel@edumazet-glaptop>
Date: Tue, 03 Apr 2012 21:37:01 +0200
From: Eric Dumazet <eric.dumazet@...il.com>
To: David Miller <davem@...emloft.net>
Cc: netdev <netdev@...r.kernel.org>,
Neal Cardwell <ncardwell@...gle.com>,
Tom Herbert <therbert@...gle.com>,
Yuchung Cheng <ycheng@...gle.com>,
"H.K. Jerry Chu" <hkchu@...gle.com>,
Maciej Żenczykowski <maze@...gle.com>,
Mahesh Bandewar <maheshb@...gle.com>,
Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>,
Nandita Dukkipati <nanditad@...gle.com>
Subject: [PATCH] tcp: allow splice() to build full TSO packets
vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb (assuming PAGE_SIZE=4096).
The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when the pipe is not already filled, and tso_fragment()
tries to split these skbs into MSS multiples.
4096 bytes are usually split into an skb with 2 MSS and a remaining
sub-MSS skb (assuming MTU=1500).
This makes slow start suboptimal because many small frames are sent to
the qdisc/driver layers instead of big ones (constrained by cwnd and
packets in flight, of course).
In fact, applications using sendmsg() (which adds an extra memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially when serving small files in environments with
a large initial [c]wnd.
Call tcp_push() only if MSG_MORE is not set in the flags parameter.
This bit is automatically provided by splice() internals for all pages
but the last, or for all pages if the user specified the SPLICE_F_MORE
splice() flag.
In some workloads, this can reduce the number of sent logical packets by
an order of magnitude, making zero-copy TCP actually faster than
one-copy :)
Reported-by: Tom Herbert <therbert@...gle.com>
Cc: Nandita Dukkipati <nanditad@...gle.com>
Cc: Neal Cardwell <ncardwell@...gle.com>
Cc: Tom Herbert <therbert@...gle.com>
Cc: Yuchung Cheng <ycheng@...gle.com>
Cc: H.K. Jerry Chu <hkchu@...gle.com>
Cc: Maciej Żenczykowski <maze@...gle.com>
Cc: Mahesh Bandewar <maheshb@...gle.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@...sinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@...il.com>
---
net/ipv4/tcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index cfd7edd..2ff6f45 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -860,7 +860,7 @@ wait_for_memory:
 	}
 
 out:
-	if (copied)
+	if (copied && !(flags & MSG_MORE))
 		tcp_push(sk, flags, mss_now, tp->nonagle);
 	return copied;
 