lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1414422089.16231.8.camel@edumazet-glaptop2.roam.corp.google.com>
Date:	Mon, 27 Oct 2014 08:01:29 -0700
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Toshiaki Makita <makita.toshiaki@....ntt.co.jp>
Cc:	netdev@...r.kernel.org, Herbert Xu <herbert@...dor.apana.org.au>,
	Eric Dumazet <edumazet@...gle.com>
Subject: Re: Poor UDP throughput with virtual devices and UFO

On Mon, 2014-10-27 at 19:29 +0900, Toshiaki Makita wrote:
> Hi,

...

> I wrote a patch to increase sk_wmem_alloc in skb_segment(), but I'm wondering
> if we can do this change since it has been this way for years and only TCP
> handles it so far (d6a4a1041176 "tcp: GSO should be TSQ friendly").

Thats probably because UFO is kind of strange : No NIC actually does UDP
segmentation.

> ----
> Subject: [PATCH net] gso: Inherit sk_wmem_alloc
> 
> Signed-off-by: Toshiaki Makita <makita.toshiaki@....ntt.co.jp>
> ---
>  net/core/skbuff.c      |  6 +++++-
>  net/ipv4/tcp_offload.c | 13 ++++---------
>  2 files changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index c16615b..29dc763 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -3020,7 +3020,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>  							    len, 0);
>  			SKB_GSO_CB(nskb)->csum_start =
>  			    skb_headroom(nskb) + doffset;
> -			continue;
> +			goto set_owner;
>  		}
>  
>  		nskb_frag = skb_shinfo(nskb)->frags;
> @@ -3092,6 +3092,10 @@ perform_csum_check:
>  			SKB_GSO_CB(nskb)->csum_start =
>  			    skb_headroom(nskb) + doffset;
>  		}
> +
> +set_owner:
> +		if (head_skb->sk)
> +			skb_set_owner_w(nskb, head_skb->sk);
>  	} while ((offset += len) < head_skb->len);
>  
>  	/* Some callers want to get the end of the list.
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 5b90f2f..93758a8 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -139,11 +139,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
>  			th->check = gso_make_checksum(skb, ~th->check);
>  
>  		seq += mss;
> -		if (copy_destructor) {
> +		if (copy_destructor)
>  			skb->destructor = gso_skb->destructor;
> -			skb->sk = gso_skb->sk;
> -			sum_truesize += skb->truesize;
> -		}
>  		skb = skb->next;
>  		th = tcp_hdr(skb);
>  
> @@ -157,11 +154,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
>  	 * is freed by GSO engine
>  	 */
>  	if (copy_destructor) {
> -		swap(gso_skb->sk, skb->sk);
> -		swap(gso_skb->destructor, skb->destructor);
> -		sum_truesize += skb->truesize;
> -		atomic_add(sum_truesize - gso_skb->truesize,
> -			   &skb->sk->sk_wmem_alloc);
> +		skb->destructor = gso_skb->destructor;
> +		gso_skb->destructor = NULL;
> +		atomic_sub(gso_skb->truesize, &skb->sk->sk_wmem_alloc);
>  	}
>  
>  	delta = htonl(oldlen + (skb_tail_pointer(skb) -


Please rewrite your patch to move the code out of tcp_gso_segment() into
skb_segment()

Look how I carefully avoided many atomic operations on
sk->sk_wmem_alloc, but how you removed it. :(

Alternative would be to use a single skb_set_owner_w() on the last
segment, and tweak its truesize to not corrupt sk->sk_wmem_alloc

d6a4a1041176 was needed for people using GSO=off TSO=off on a bonding
device, while best performance is reached with TSO=on so that
segmentation is performed later on the slave device.

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ