netdev - Re: limited network bandwidth with 3.2.x kernels


Message-ID: <1329896195.18384.83.camel@edumazet-laptop>
Date:	Wed, 22 Feb 2012 08:36:35 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Neal Cardwell <ncardwell@...gle.com>
Cc:	netdev@...r.kernel.org, David Miller <davem@...emloft.net>
Subject: Re: limited network bandwidth with 3.2.x kernels

On Wednesday, 22 February 2012 at 00:51 -0500, Neal Cardwell wrote:
> A few thoughts:
> 
> (1) Currently __tcp_grow_window has a very large negative impact due
>     to quantization. AFAICT from inspecting the code, the rcv_ssthresh
>     converges to the following output values given the following input
>     skb->truesize/skb->len input values:
> 
> truesize/len   rcv_ssthresh
> ------------   -------------
> <= 4/3         3/4 * tcp_space()
> <= 8/3         3/8 * sysctl_tcp_rmem[2]
> <= 16/3        3/16 * sysctl_tcp_rmem[2]
> <= 32/3        3/32 * sysctl_tcp_rmem[2]
> ...
> 
>   As a sanity-check of this table, note that in the report above where
>   we got tcpdump traces for the beginning and end of the connection,
>   the receive window converged to 338832, which was 2208 bytes above
>   (3/8)*sysctl_tcp_rmem[2] for his configuration of sysctl_tcp_rmem[2]
>   = 897664.
> 
>   It would be nice to get rid of this huge jump between truesize of
>   4/3*skb->len and 8/3*skb->len. Ideally we could make this
>   continuous?
> 
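[Editorial sketch] The quantization in the table above can be reproduced
with a small userspace model. This is not kernel code: it assumes
tcp_adv_win_scale = 2 (so the usable window is 3/4 of the buffer space) and
models only the halving loop that the table implies for __tcp_grow_window.
The function names, the 1448-byte MSS, and the truesize values in the usage
note are assumptions chosen for illustration.

```c
/* Userspace model only; assumes tcp_adv_win_scale = 2. */
int win_from_space(int space)
{
    return space - (space >> 2);            /* 3/4 of the buffer space */
}

/* Models the halving loop: returns the increment (2*mss) or 0. */
int grow_increment(int rcv_ssthresh, int truesize, int len,
                   int rmem_max, int mss)
{
    int ts = win_from_space(truesize) >> 1;
    int window = win_from_space(rmem_max) >> 1;

    while (rcv_ssthresh <= window) {
        if (ts <= len)
            return 2 * mss;
        ts >>= 1;                           /* each halving of truesize... */
        window >>= 1;                       /* ...also halves the target   */
    }
    return 0;
}

/* Feed identical skbs until rcv_ssthresh stops growing. */
int converge(int start, int truesize, int len, int rmem_max, int mss)
{
    int ssthresh = start;

    if (mss <= 0)
        return start;                       /* avoid a zero-step loop */
    for (;;) {
        int incr = grow_increment(ssthresh, truesize, len, rmem_max, mss);
        if (incr == 0)
            return ssthresh;
        ssthresh += incr;
    }
}
```

With the assumed inputs converge(4 * 1448, 2304, 1448, 897664, 1448), the
model stops at 338832, the first 2*MSS step above (3/8)*897664 = 336624.
Doubling truesize to 4608 drops the result to 170864, just above
(3/16)*897664 = 168312.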

This skb->truesize/skb->len affair is suspect if you ask me.

We increase rcv_ssthresh if we receive a 'good skb', but we have no
guarantee of future skbs.

When we are close to the converged value, we might spend some time in
tcp_grow_window() and decide not to increase rcv_ssthresh.

IMHO a better way would be to look at an aggregate value
(sk->sk_rmem_alloc) and not increase rcv_ssthresh if the socket receive
queue is full of 'bad skbs'.
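
[Editorial sketch] One possible reading of this suggestion, as a userspace
model: judge the whole receive queue rather than the single skb that just
arrived. The struct, the helper names, and the 4/3 threshold (borrowed from
the first row of the table) are all hypothetical; the message does not say
how sk->sk_rmem_alloc would actually be used.

```c
/* Hypothetical stand-in for the relevant socket state; not struct sock. */
struct rcvq_model {
    long rmem_alloc;    /* sum of truesize queued (models sk->sk_rmem_alloc) */
    long payload;       /* sum of skb->len queued */
};

void rcvq_add(struct rcvq_model *q, int truesize, int len)
{
    q->rmem_alloc += truesize;
    q->payload += len;
}

/*
 * Allow rcv_ssthresh to grow only if the queue as a whole has a
 * truesize/len ratio of at most 4/3, i.e. 3/4 * rmem_alloc <= payload.
 */
int may_grow(const struct rcvq_model *q)
{
    long usable = q->rmem_alloc - (q->rmem_alloc >> 2);

    return usable <= q->payload;
}
```

Under this model a single 'good skb' arriving behind a queue of 'bad skbs'
does not trigger growth, because the aggregate ratio is still poor.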

> (2) I don't think we want to scale the increment using truesize, but
>     rather calculate a cap using the truesize/skb->len ratio.
> 
> (3) We should use this cap to also cap the post-incremented value of
>     rcv_ssthresh, so the increment itself does not take us over the
>     target. (Again, note the example where the receive window ended up
>     about 2MSS above the target.)

That's the 'oh, we received a good skb, let's add 2*MSS to rcv_ssthresh'
syndrome.

> 
> (4) We should only request an ACK now if the rcv_ssthresh actually
>     increases.


Note that with your patch and a 'good skb', rcv_ssthresh increases more
slowly than before (in steps of MSS instead of 2*MSS).
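
[Editorial sketch] Points (2) to (4) can be illustrated with a userspace
model. This is not the patch under discussion, which is not shown in this
message. It assumes a cap of rmem_max * len / truesize (continuous in the
truesize/len ratio), a step of one MSS, and an ACK request only when the
value really moves. All names here are hypothetical.

```c
struct grow_result {
    int rcv_ssthresh;   /* value after the (possibly capped) increment */
    int request_ack;    /* 1 only if rcv_ssthresh actually increased */
};

struct grow_result grow_capped(int rcv_ssthresh, int truesize, int len,
                               int rmem_max, int mss)
{
    struct grow_result r = { rcv_ssthresh, 0 };
    long long cap, next;

    if (truesize <= 0 || len <= 0)
        return r;

    /* (2) derive a cap from the ratio instead of scaling the increment */
    cap = (long long)rmem_max * len / truesize;

    /* (3) cap the post-increment value, not just the starting point */
    next = (long long)rcv_ssthresh + mss;
    if (next > cap)
        next = cap;

    /* (4) ask for an ACK only when the threshold really grew */
    if (next > rcv_ssthresh) {
        r.rcv_ssthresh = (int)next;
        r.request_ack = 1;
    }
    return r;
}
```

For example, with rmem_max = 1000000, truesize = 2000 and len = 1000 the cap
is 500000: a call starting at 499500 with mss = 1000 lands exactly on 500000
and requests an ACK, and a further call leaves the value unchanged with no
ACK request.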


