lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHo-OozZz4yx5iuoEWcq5vTJvJ0b_5bNbiQuc4PMw22jBZOxpg@mail.gmail.com>
Date:	Wed, 15 May 2013 20:58:28 -0700
From:	Maciej Żenczykowski <zenczykowski@...il.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	netdev <netdev@...r.kernel.org>,
	Tom Herbert <therbert@...gle.com>,
	Neal Cardwell <ncardwell@...gle.com>,
	Yuchung Cheng <ycheng@...gle.com>
Subject: Re: [PATCH] tcp: gso: do not generate out of order packets

I'd be worried that calling the destructor that many times would cause
performance problems
(and only call the destructor and do memory accounting on the last segment).

Could we instead move the queue mapping into the skb somehow instead?

On Wed, May 15, 2013 at 6:38 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> From: Eric Dumazet <edumazet@...gle.com>
>
> GSO TCP handler has following issues :
>
> 1) ooo_okay from original GSO packet is duplicated to all segments
> 2) segments (but the last one) are orphaned, so transmit path can not
> get transmit queue number from the socket. This happens if GSO
> segmentation is done before stacked device for example.
>
> Result is we can send packets from a given TCP flow to different TX
> queues (if using multiqueue NICS). This generates OOO problems and
> spurious SACK & retransmits.
>
> Fix this by keeping socket pointer set for all segments.
>
> This means that every segment must also have a destructor, and the
> original gso skb truesize must be split on all segments, to keep
> precise sk->sk_wmem_alloc accounting.
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Cc: Maciej Żenczykowski <maze@...gle.com>
> Cc: Tom Herbert <therbert@...gle.com>
> Cc: Neal Cardwell <ncardwell@...gle.com>
> Cc: Yuchung Cheng <ycheng@...gle.com>
> ---
>  net/ipv4/tcp.c |   22 +++++++++++++++++++++-
>  1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index dcb116d..0b6276f 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -2887,6 +2887,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb,
>         unsigned int mss;
>         struct sk_buff *gso_skb = skb;
>         __sum16 newcheck;
> +       bool ooo_okay, copy_destructor;
>
>         if (!pskb_may_pull(skb, sizeof(*th)))
>                 goto out;
> @@ -2927,10 +2928,18 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb,
>                 goto out;
>         }
>
> +       copy_destructor = gso_skb->destructor == tcp_wfree;
> +       ooo_okay = gso_skb->ooo_okay;
> +       /* All segments but the first should have ooo_okay cleared */
> +       skb->ooo_okay = 0;
> +
>         segs = skb_segment(skb, features);
>         if (IS_ERR(segs))
>                 goto out;
>
> +       /* Only first segment might have ooo_okay set */
> +       segs->ooo_okay = ooo_okay;
> +
>         delta = htonl(oldlen + (thlen + mss));
>
>         skb = segs;
> @@ -2950,6 +2959,17 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb,
>                                                     thlen, skb->csum));
>
>                 seq += mss;
> +               if (copy_destructor) {
> +                       skb->destructor = gso_skb->destructor;
> +                       skb->sk = gso_skb->sk;
> +                       /* {tcp|sock}_wfree() use exact truesize accounting :
> +                        * sum(skb->truesize) MUST be exactly be gso_skb->truesize
> +                        * So we account mss bytes of 'true size' for each segment.
> +                        * The last segment will contain the remaining.
> +                        */
> +                       skb->truesize = mss;
> +                       gso_skb->truesize -= mss;
> +               }
>                 skb = skb->next;
>                 th = tcp_hdr(skb);
>
> @@ -2962,7 +2982,7 @@ struct sk_buff *tcp_tso_segment(struct sk_buff *skb,
>          * is freed at TX completion, and not right now when gso_skb
>          * is freed by GSO engine
>          */
> -       if (gso_skb->destructor == tcp_wfree) {
> +       if (copy_destructor) {
>                 swap(gso_skb->sk, skb->sk);
>                 swap(gso_skb->destructor, skb->destructor);
>                 swap(gso_skb->truesize, skb->truesize);
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ