[<prev] [next>] [day] [month] [year] [list]
Message-ID: <d1c2719f0807301531l212e85b7wbe3c2db486926438@mail.gmail.com>
Date: Wed, 30 Jul 2008 15:31:22 -0700
From: "Jerry Chu" <hkchu@...gle.com>
To: herbert@...dor.apana.org.au
Cc: davem@...emloft.net, netdev@...r.kernel.org
Subject: [TCP bug] TSO trips over Nagle! (was Re: [TCP]: Let skbs grow over a page on fast peers)
Your change also happens to fix a nasty bug I've been looking at in
the past week when tcp_push_one() is often called with
(skb->len % mss) != 0, leaving a non-mss-multiple skb on the send list.
Occasionally __tcp_push_pending_frames() will be called with such
skb, causing a < mss segment to be generated/sent and Nagle to be
on (tcp_write_xmit()->tcp_minshall_update()).
The problem is that the original user send size is most likely not a
multiple of mss. This will cause the last fragment of a request and/or
response to be stuck until the previous Nagle condition is cleared (i.e.,
the fragment generated by the above bug is acked). This causes many
request/response performance over WAN to be only half of what it
should be because of the additional round-trip time required.
Again thanks for the changes!
Jerry
> From: Herbert Xu <herbert@...dor.apana.org.au>
> Date: Sat, Mar 22, 2008 at 5:59 AM
> Subject: [TCP]: Let skbs grow over a page on fast peers
> To: "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org
>
>
> Hi Dave:
>
> [TCP]: Let skbs grow over a page on fast peers
>
> While testing the virtio-net driver on KVM with TSO I noticed
> that TSO performance with a 1500 MTU is significantly worse
> compared to the performance of non-TSO with a 16436 MTU. The
> packet dump shows that most of the packets sent are smaller
> than a page.
>
> Looking at the code this actually is quite obvious as it always
> stop extending the packet if it's the first packet yet to be
> sent and if it's larger than the MSS. Since each extension is
> bound by the page size, this means that (given a 1500 MTU) we're
> very unlikely to construct packets greater than a page, provided
> that the receiver and the path is fast enough so that packets can
> always be sent immediately.
>
> The fix is also quite obvious. The push calls inside the loop
> is just an optimisation so that we don't end up doing all the
> sending at the end of the loop. Therefore there is no specific
> reason why it has to do so at MSS boundaries. For TSO, the
> most natural extension of this optimisation is to do the pushing
> once the skb exceeds the TSO size goal.
>
> This is what the patch does and testing with KVM shows that the
> TSO performance with a 1500 MTU easily surpasses that of a 16436
> MTU and indeed the packet sizes sent are generally larger than
> 16436.
>
> I don't see any obvious downsides for slower peers or connections,
> but it would be prudent to test this extensively to ensure that
> those cases don't regress.
>
> Signed-off-by: Herbert Xu <herbert@...dor.apana.org.au>
>
> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@...dor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 071e83a..39b629a 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -735,7 +735,7 @@ new_segment:
> if (!(psize -= copy))
> goto out;
>
> - if (skb->len < mss_now || (flags & MSG_OOB))
> + if (skb->len < size_goal || (flags & MSG_OOB))
> continue;
>
> if (forced_push(tp)) {
> @@ -981,7 +981,7 @@ new_segment:
> if ((seglen -= copy) == 0 && iovlen == 0)
> goto out;
>
> - if (skb->len < mss_now || (flags & MSG_OOB))
> + if (skb->len < size_goal || (flags & MSG_OOB))
> continue;
>
> if (forced_push(tp)) {
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists