Message-ID: <564F3D3D.7050004@axis.com>
Date:	Fri, 20 Nov 2015 16:33:17 +0100
From:	Niklas Cassel <niklas.cassel@...s.com>
To:	Eric Dumazet <eric.dumazet@...il.com>
CC:	"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: network stream fairness

On 11/09/2015 05:07 PM, Eric Dumazet wrote:
> On Mon, 2015-11-09 at 16:53 +0100, Niklas Cassel wrote:
>> On 11/09/2015 04:50 PM, Eric Dumazet wrote:
>>> On Mon, 2015-11-09 at 16:41 +0100, Niklas Cassel wrote:
>>>> I have an Ethernet driver for a 100 Mbps NIC.
>>>> The NIC has dedicated hardware for offloading.
>>>> The driver has implemented TSO, GSO and BQL.
>>>> Since the CPU on the SoC is rather weak, I'd rather
>>>> not increase the CPU load by turning off offloading.
>>>>
>>>> Since commit
>>>> 605ad7f184b6 ("tcp: refine TSO autosizing")
>>>>
>>>> the bandwidth is no longer fair between streams.
>>>> see output at the end of the mail, where I'm testing with 2 streams.
>>>>
>>>>
>>>> If I revert 605ad7f184b6 on 4.3, I get a stable 45 Mbps per stream.
>>>>
>>>> I can also use vanilla 4.3 and do:
>>>> echo 3000 > /sys/class/net/eth0/queues/tx-0/byte_queue_limits/limit_max
>>>> to also get a stable 45 Mbps per stream.
>>>>
>>>> My question is, am I supposed to set the BQL limit explicitly?
>>>> It is possible that I have missed something in my driver,
>>>> but my understanding is that the TCP stack sets and adjusts
>>>> the BQL limit automatically.
>>>>
>>>>
>>>> Perhaps the following info might help:
>>>>
>>>> After running iperf3 on vanilla 4.3:
>>>> /sys/class/net/eth0/queues/tx-0/byte_queue_limits/
>>>> limit 89908
>>>> limit_max 1879048192
>>>>
>>>> After running iperf3 on vanilla 4.3 + BQL explicitly set:
>>>> /sys/class/net/eth0/queues/tx-0/byte_queue_limits/
>>>> limit 3000
>>>> limit_max 3000
>>>>
>>>> After running iperf3 on 4.3 + 605ad7f184b6 reverted:
>>>> /sys/class/net/eth0/queues/tx-0/byte_queue_limits/
>>>> limit 8886
>>>> limit_max 1879048192
>>>>
>>>
>>> There is absolutely nothing ensuring fairness among multiple TCP flows.
>>>
>>> One TCP flow can very easily grab the whole bandwidth for itself; there are
>>> numerous descriptions of this phenomenon in various TCP studies.
>>>
>>> This is why we have packet schedulers ;)
>>
>> Oh.. How stupid of me, I forgot to mention.. all of the measurements were
>> done with fq_codel.
> 
> Your numbers suggest a cwnd growth then, which might show a CC bug.
> 
> Please run the following while your iperf3 runs on a regular 4.3 kernel:
> 
> for i in `seq 1 10`
> do
> ss -temoi dst 192.168.0.141
> sleep 1
> done
> 
> 

I've been able to reproduce this on an ARMv7, single-core system with a 100 Mbps NIC.
The kernel is vanilla 4.3. The driver implements BQL, but is unfortunately not upstream.

ethtool -k eth0
Offload parameters for eth0:
rx-checksumming: off
tx-checksumming: on
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off

ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:40:8c:18:58:c8 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.136/24 brd 192.168.0.255 scope global eth0
       valid_lft forever preferred_lft forever

# before iperf3 run
tc -s -d qdisc
qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn 
 Sent 21001 bytes 45 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = cubic

# after iperf3 run
tc -s -d qdisc
qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0 
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms ecn 
 Sent 5618224754 bytes 3710914 pkt (dropped 0, overlimits 0 requeues 1) 
 backlog 0b 0p requeues 1 
  maxpacket 1514 drop_overlimit 0 new_flow_count 2 ecn_mark 0
  new_flows_len 0 old_flows_len 0
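
The byte_queue_limits values quoted earlier in the thread could be captured
next to the qdisc statistics with a small helper. This is only a sketch: the
function name show_bql and its optional second argument (an alternative sysfs
root, useful for dry runs) are assumptions made for illustration and not part
of any tool. It reads the limit, limit_max and inflight files that BQL exposes
for each tx queue.

```shell
#!/bin/sh
# Hypothetical helper: print the BQL state of every tx queue of a device.
# Usage: show_bql DEV [SYSROOT]
# SYSROOT defaults to /sys; the override is an assumption for testing only.
show_bql() {
    dev=$1
    root=${2:-/sys}
    for q in "$root/class/net/$dev/queues"/tx-*; do
        bql="$q/byte_queue_limits"
        # Skip queues without BQL support (or an unmatched glob).
        [ -d "$bql" ] || continue
        printf '%s limit=%s limit_max=%s inflight=%s\n' \
            "${q##*/}" \
            "$(cat "$bql/limit")" \
            "$(cat "$bql/limit_max")" \
            "$(cat "$bql/inflight")"
    done
}
```

Running "show_bql eth0" before and after an iperf3 run gives one line per tx
queue, which makes it easier to compare the limit BQL converged to.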

Note that the streams appear stable for 411 seconds before the congestion
window growth becomes visible. The time you have to wait before the streams
diverge seems to vary a lot.
No switch was used between the server and client; they were connected directly.

For full iperf3 log and output from ss command, see attachment.

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd

[  4] 411.00-412.00 sec  5.09 MBytes  42.7 Mbits/sec    0   22.6 KBytes       
[  6] 411.00-412.00 sec  5.14 MBytes  43.1 Mbits/sec    0   22.6 KBytes       
[SUM] 411.00-412.00 sec  10.2 MBytes  85.8 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 412.00-413.00 sec  5.12 MBytes  43.0 Mbits/sec    0   22.6 KBytes       
[  6] 412.00-413.00 sec  5.13 MBytes  43.0 Mbits/sec    0   22.6 KBytes       
[SUM] 412.00-413.00 sec  10.3 MBytes  86.0 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 413.00-414.00 sec  5.17 MBytes  43.4 Mbits/sec    0   22.6 KBytes       
[  6] 413.00-414.00 sec  5.07 MBytes  42.6 Mbits/sec    0   22.6 KBytes       
[SUM] 413.00-414.00 sec  10.2 MBytes  86.0 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 414.00-415.00 sec  5.11 MBytes  42.9 Mbits/sec    0   22.6 KBytes       
[  6] 414.00-415.00 sec  5.14 MBytes  43.1 Mbits/sec    0   22.6 KBytes       
[SUM] 414.00-415.00 sec  10.3 MBytes  86.0 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 415.00-416.00 sec  5.11 MBytes  42.9 Mbits/sec    0   32.5 KBytes       
[  6] 415.00-416.00 sec  5.15 MBytes  43.2 Mbits/sec    0   22.6 KBytes       
[SUM] 415.00-416.00 sec  10.3 MBytes  86.2 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 416.00-417.00 sec  6.18 MBytes  51.8 Mbits/sec    0   35.4 KBytes       
[  6] 416.00-417.00 sec  4.08 MBytes  34.3 Mbits/sec    0   22.6 KBytes       
[SUM] 416.00-417.00 sec  10.3 MBytes  86.1 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 417.00-418.00 sec  6.24 MBytes  52.4 Mbits/sec    0   35.4 KBytes       
[  6] 417.00-418.00 sec  4.01 MBytes  33.6 Mbits/sec    0   22.6 KBytes       
[SUM] 417.00-418.00 sec  10.3 MBytes  86.0 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 418.00-419.00 sec  6.28 MBytes  52.7 Mbits/sec    0   35.4 KBytes       
[  6] 418.00-419.00 sec  3.98 MBytes  33.4 Mbits/sec    0   22.6 KBytes       
[SUM] 418.00-419.00 sec  10.3 MBytes  86.0 Mbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4] 419.00-420.00 sec  6.30 MBytes  52.8 Mbits/sec    0   35.4 KBytes       
[  6] 419.00-420.00 sec  3.96 MBytes  33.2 Mbits/sec    0   22.6 KBytes       
[SUM] 419.00-420.00 sec  10.3 MBytes  86.0 Mbits/sec    0             
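
To find the point where the streams diverge in a long log, the per-interval
rates can be reduced to a min/max ratio. The following is a rough sketch, not
an iperf3 feature: the function name iperf3_fairness is made up, and it
assumes the text output format shown above with all rates in Mbits/sec.

```shell
#!/bin/sh
# Hypothetical helper: read iperf3 text output on stdin and print, for each
# interval, the slowest and fastest stream and the ratio between them.
iperf3_fairness() {
    awk '
    # Match per-stream lines such as "[  4] 416.00-417.00 sec ... Mbits/sec";
    # the "[ ID]" header and "[SUM]" lines have no numeric id and are skipped.
    /^\[ *[0-9]+\]/ && /Mbits\/sec/ {
        line = $0
        sub(/^\[ *[0-9]+\] */, "", line)      # drop the stream id
        split(line, f, " ")
        interval = f[1]                       # e.g. 416.00-417.00
        rate = f[5] + 0                       # bandwidth in Mbits/sec
        if (!(interval in seen)) {
            seen[interval] = 1
            order[++count] = interval
            lo[interval] = rate
            hi[interval] = rate
        } else {
            if (rate < lo[interval]) lo[interval] = rate
            if (rate > hi[interval]) hi[interval] = rate
        }
    }
    END {
        for (i = 1; i <= count; i++) {
            k = order[i]
            printf "%s min=%.1f max=%.1f ratio=%.2f\n", k, lo[k], hi[k], (lo[k] > 0 ? hi[k] / lo[k] : 0)
        }
    }
    '
}
```

With the log saved to a file, "iperf3_fairness < iperf3.log" prints one line
per interval; a ratio close to 1.00 means the streams share the link evenly.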


Download attachment "iperf3-ss-logs.tar.gz" of type "application/gzip" (62161 bytes)
