lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <d1c2719f0807301531l212e85b7wbe3c2db486926438@mail.gmail.com>
Date:	Wed, 30 Jul 2008 15:31:22 -0700
From:	"Jerry Chu" <hkchu@...gle.com>
To:	herbert@...dor.apana.org.au
Cc:	davem@...emloft.net, netdev@...r.kernel.org
Subject: [TCP bug] TSO trips over Nagle! (was Re: [TCP]: Let skbs grow over a page on fast peers)

Your change also happens to fix a nasty bug I've been looking at in
the past week when tcp_push_one() is often called with
(skb->len % mss) != 0, leaving a non-mss-multiple skb on the send list.
Occasionally __tcp_push_pending_frames() will be called with such
skb, causing a < mss segment to be generated/sent and Nagle to be
on (tcp_write_xmit()->tcp_minshall_update()).

The problem is that the original user send size is most likely not a
multiple of mss. This will cause the last fragment of a request and/or
response to be stuck until the previous Nagle condition is cleared (i.e.,
the fragment generated by the above bug is acked). This causes many
request/response performance over WAN to be only half of what it
should be because of the additional round-trip time required.

Again thanks for the changes!

Jerry

> From: Herbert Xu <herbert@...dor.apana.org.au>
> Date: Sat, Mar 22, 2008 at 5:59 AM
> Subject: [TCP]: Let skbs grow over a page on fast peers
> To: "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org
>
>
> Hi Dave:
>
> [TCP]: Let skbs grow over a page on fast peers
>
> While testing the virtio-net driver on KVM with TSO I noticed
> that TSO performance with a 1500 MTU is significantly worse
> compared to the performance of non-TSO with a 16436 MTU.  The
> packet dump shows that most of the packets sent are smaller
> than a page.
>
> Looking at the code this actually is quite obvious as it always
> stop extending the packet if it's the first packet yet to be
> sent and if it's larger than the MSS.  Since each extension is
> bound by the page size, this means that (given a 1500 MTU) we're
> very unlikely to construct packets greater than a page, provided
> that the receiver and the path is fast enough so that packets can
> always be sent immediately.
>
> The fix is also quite obvious.  The push calls inside the loop
> is just an optimisation so that we don't end up doing all the
> sending at the end of the loop.  Therefore there is no specific
> reason why it has to do so at MSS boundaries.  For TSO, the
> most natural extension of this optimisation is to do the pushing
> once the skb exceeds the TSO size goal.
>
> This is what the patch does and testing with KVM shows that the
> TSO performance with a 1500 MTU easily surpasses that of a 16436
> MTU and indeed the packet sizes sent are generally larger than
> 16436.
>
> I don't see any obvious downsides for slower peers or connections,
> but it would be prudent to test this extensively to ensure that
> those cases don't regress.
>
> Signed-off-by: Herbert Xu <herbert@...dor.apana.org.au>
>
> Cheers,
> --
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@...dor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 071e83a..39b629a 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -735,7 +735,7 @@ new_segment:
>               if (!(psize -= copy))
>                       goto out;
>
> -               if (skb->len < mss_now || (flags & MSG_OOB))
> +               if (skb->len < size_goal || (flags & MSG_OOB))
>                       continue;
>
>               if (forced_push(tp)) {
> @@ -981,7 +981,7 @@ new_segment:
>                       if ((seglen -= copy) == 0 && iovlen == 0)
>                               goto out;
>
> -                       if (skb->len < mss_now || (flags & MSG_OOB))
> +                       if (skb->len < size_goal || (flags & MSG_OOB))
>                               continue;
>
>                       if (forced_push(tp)) {
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ