Message-ID: <5655ADC7.3040606@axis.com>
Date: Wed, 25 Nov 2015 13:47:03 +0100
From: Niklas Cassel <niklas.cassel@...s.com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: network stream fairness
On 11/20/2015 07:16 PM, Eric Dumazet wrote:
> On Fri, 2015-11-20 at 16:33 +0100, Niklas Cassel wrote:
>
>> I've been able to reproduce this on an ARMv7, single core, 100 Mbps NIC.
>> Kernel vanilla 4.3, driver has BQL implemented, but is unfortunately not upstreamed.
>>
>> ethtool -k eth0
>> Offload parameters for eth0:
>> rx-checksumming: off
>> tx-checksumming: on
>> scatter-gather: off
>> tcp segmentation offload: off
>> udp fragmentation offload: off
>> generic segmentation offload: off
>>
>> ip addr show dev eth0
>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>> link/ether 00:40:8c:18:58:c8 brd ff:ff:ff:ff:ff:ff
>> inet 192.168.0.136/24 brd 192.168.0.255 scope global eth0
>> valid_lft forever preferred_lft forever
>>
>> # before iperf3 run
>> tc -s -d qdisc
>> qdisc noqueue 0: dev lo root refcnt 2
>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
>> Sent 21001 bytes 45 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
>> new_flows_len 0 old_flows_len 0
>>
>> sysctl net.ipv4.tcp_congestion_control
>> net.ipv4.tcp_congestion_control = cubic
>>
>> # after iperf3 run
>> tc -s -d qdisc
>> qdisc noqueue 0: dev lo root refcnt 2
>> Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>> backlog 0b 0p requeues 0
>> qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn
>> Sent 5618224754 bytes 3710914 pkt (dropped 0, overlimits 0 requeues 1)
>> backlog 0b 0p requeues 1
>> maxpacket 1514 drop_overlimit 0 new_flow_count 2 ecn_mark 0
>> new_flows_len 0 old_flows_len 0
>>
>> Note that it appears stable for 411 seconds before you can see the
>> congestion window start to grow. The amount of time you have to wait
>> before things go downhill seems to vary a lot.
>> No switch was used between the server and client; they were connected directly.
>
> Hi Niklas
>
> Your results seem to show there is no special issue ;)
>
> With TSO off and GSO off, there is no way a 'TSO autosizing' patch would
> have any effect, since this code path is not taken.
You are right of course.
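
For anyone else following the thread, my understanding of the autosizing is that
it derives the TSO/GSO burst size from the pacing rate, roughly along these lines
(a simplified user-space sketch of the idea, not the exact kernel code; all
parameters are caller-supplied here, in the kernel they come from the socket and
the device):

#include <stdint.h>

/* Size TSO/GSO bursts so that roughly one burst is sent per millisecond
 * at the current pacing rate, instead of one maximally sized (64KB) TSO
 * packet per transmit opportunity.
 */
uint32_t tso_autosize_sketch(uint64_t pacing_rate,   /* bytes per second */
			     uint32_t mss,           /* bytes per segment */
			     uint32_t gso_max_size,  /* device cap, bytes */
			     uint32_t gso_max_segs,  /* device cap, segments */
			     uint32_t min_tso_segs)  /* lower bound */
{
	uint64_t bytes = pacing_rate / 1000;	/* ~1 ms worth of data */

	if (bytes > gso_max_size)
		bytes = gso_max_size;

	uint32_t segs = (uint32_t)(bytes / mss);

	if (segs < min_tso_segs)
		segs = min_tso_segs;
	if (segs > gso_max_segs)
		segs = gso_max_segs;

	return segs;
}

With TSO and GSO off, every packet is already MSS-sized, so this sizing never
comes into play.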

The arm unit uses completely different hardware and a different driver (without TSO),
and was unfair even before your patch,
so those measurements should be ignored.

The mips unit (with TSO) started being unfair after the patch in question.
With TSO off and GSO off, iperf streams are fair within 2 seconds,
and stay fair for as long as I have tested.
With TSO on and GSO on, iperf streams never converge;
usually one stream runs at around 60 Mbps and the other at around 30 Mbps.

With TSO on and GSO on, and also calling netif_set_gso_max_size(netdev, 16384)
in the driver, things appear to be working slightly better.
The remaining problem is that it takes about 40 seconds before the streams
are fair in iperf. Once they are fair, they appear to stay fair forever.
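
Just for reference, a minimal sketch of where such a call typically sits in a
driver; the function name below is made up for illustration and this is not our
actual (not yet upstreamed) driver:

#include <linux/netdevice.h>

/* Hypothetical setup helper, called from the driver's probe path before
 * the device is registered. Capping gso_max_size bounds sk_gso_max_size
 * for sockets using this device, which in turn bounds the size of the
 * autosized TSO/GSO bursts.
 */
static int example_netdev_setup(struct net_device *netdev)
{
	netif_set_gso_max_size(netdev, 16384);

	return register_netdev(netdev);
}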

Looking at the logs, it appears that we start out with two cwnds of about
the same size, then one of the windows grows to about twice the size of
the other, and at ~40 seconds the smaller window has caught up to the
other one. (See attached logs.)

All tests were done with sch_fq.

It might be that our traffic is now shaped in a way that TCP cubic does not
handle well, which would explain why it takes so long to converge.
Is there anything we can do (except turning off TSO) to make it converge faster?
E.g., is there any way to tune TCP cubic to be more aggressive?
(SO_MAX_PACING_RATE would require modifying a lot of different user-space
programs, so that is not really an option :( )
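
For completeness, the per-socket change itself is small; a minimal user-space
sketch (the rate below is only an example value, roughly half of the 100 Mbps
link):

#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>

#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE 47	/* from include/uapi/asm-generic/socket.h */
#endif

/* Cap the pacing rate of a TCP socket, in bytes per second.
 * Enforcement is done by the fq qdisc on the egress device.
 */
static int set_max_pacing_rate(int fd, uint32_t rate)
{
	if (setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE,
		       &rate, sizeof(rate)) < 0) {
		perror("setsockopt(SO_MAX_PACING_RATE)");
		return -1;
	}
	return 0;
}

/* Example: limit a flow to ~6 MB/s (~48 Mbit/s), about half of the
 * 100 Mbps link, so two flows would share it roughly evenly:
 *
 *	set_max_pacing_rate(fd, 6 * 1000 * 1000);
 */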
Thanks for all your help!
>
> You have to wait 400 seconds before getting into a mode where one of the
> flows gets a bigger cwnd (25 instead of 16), and then TCP cubic simply
> shows typical unfairness ...
>
> If you absolutely need to guarantee a given throughput per flow, you
> might consider using fq packet scheduler and SO_MAX_PACING_RATE socket
> option.
>
> Thanks !
>
>
Download attachment "iperf_ss_logs.tar.gz" of type "application/gzip" (14851 bytes)