Message-Id: <29E840A2-D4DB-4A49-88FE-F97303952638@bejarano.io>
Date: Mon, 26 May 2025 21:34:19 +0200
From: Ricard Bejarano <ricard@...arano.io>
To: Andrew Lunn <andrew@...n.ch>
Cc: Mika Westerberg <mika.westerberg@...ux.intel.com>,
netdev@...r.kernel.org,
michael.jamet@...el.com,
YehezkelShB@...il.com,
andrew+netdev@...n.ch,
davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com
Subject: Re: Poor thunderbolt-net interface performance when bridged
Hey Andrew, thanks for chiming in.
> Do the interfaces provide statistics? ethtool -S. Where is the packet
> loss happening?
root@...e:~# ethtool -S tb0
no stats available
root@...e:~# ip -s link show tb0
6: tb0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP mode DEFAULT group default qlen 1000
link/ether 02:70:19:dc:92:96 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
11209729 71010 0 0 0 0
TX: bytes packets errors dropped carrier collsns
624522843 268941 0 0 0 0
root@...e:~#
root@red:~# ethtool -S tb0
no stats available
root@red:~# ip -s link show tb0
8: tb0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP mode DEFAULT group default qlen 1000
link/ether 02:5f:d6:57:71:93 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
624522843 320623 0 0 0 0
TX: bytes packets errors dropped carrier collsns
11209729 71007 0 0 0 0
root@red:~#
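Since tbnet doesn't seem to expose ethtool counters, the next place I can
think of to look is the extended link stats and the qdisc. I'll grab these on
both ends during the next runs (plain iproute2, nothing tb-specific):

  ip -s -s link show tb0      # extended RX/TX error breakdown
  tc -s qdisc show dev tb0    # fq_codel drops/overlimits, if any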
It seems like everything is fine at that level, but I noticed iperf3
red->blue->purple does show a lot of TCP retries, at least relative to
red->blue:
root@red:~# iperf3 -c 10.0.0.2 # blue
Connecting to host 10.0.0.2, port 5201
[ 5] local 10.0.0.1 port 34858 connected to 10.0.0.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 1016 MBytes 8.52 Gbits/sec 10 1.41 MBytes
[ 5] 1.00-2.00 sec 1.06 GBytes 9.07 Gbits/sec 0 1.89 MBytes
[ 5] 2.00-3.00 sec 1.06 GBytes 9.12 Gbits/sec 0 2.15 MBytes
[ 5] 3.00-4.00 sec 1.07 GBytes 9.22 Gbits/sec 0 2.18 MBytes
[ 5] 4.00-5.00 sec 1.08 GBytes 9.27 Gbits/sec 0 2.22 MBytes
[ 5] 5.00-6.00 sec 1.08 GBytes 9.24 Gbits/sec 0 2.24 MBytes
[ 5] 6.00-7.00 sec 1.08 GBytes 9.25 Gbits/sec 0 2.25 MBytes
[ 5] 7.00-8.00 sec 1.09 GBytes 9.32 Gbits/sec 0 2.26 MBytes
[ 5] 8.00-9.00 sec 1.09 GBytes 9.36 Gbits/sec 0 2.27 MBytes
[ 5] 9.00-10.00 sec 1.08 GBytes 9.29 Gbits/sec 0 2.27 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.7 GBytes 9.17 Gbits/sec 10 sender
[ 5] 0.00-10.00 sec 10.7 GBytes 9.16 Gbits/sec receiver
root@red:~# iperf3 -c 10.0.0.3 # purple
Connecting to host 10.0.0.3, port 5201
[ 5] local 10.0.0.1 port 38894 connected to 10.0.0.3 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 384 KBytes 3.14 Mbits/sec 53 2.83 KBytes
[ 5] 1.00-2.00 sec 640 KBytes 5.24 Mbits/sec 42 4.24 KBytes
[ 5] 2.00-3.00 sec 640 KBytes 5.24 Mbits/sec 48 2.83 KBytes
[ 5] 3.00-4.00 sec 768 KBytes 6.29 Mbits/sec 56 2.83 KBytes
[ 5] 4.00-5.00 sec 512 KBytes 4.19 Mbits/sec 62 2.83 KBytes
[ 5] 5.00-6.00 sec 640 KBytes 5.24 Mbits/sec 50 2.83 KBytes
[ 5] 6.00-7.00 sec 640 KBytes 5.24 Mbits/sec 56 2.83 KBytes
[ 5] 7.00-8.00 sec 768 KBytes 6.29 Mbits/sec 48 2.83 KBytes
[ 5] 8.00-9.00 sec 512 KBytes 4.19 Mbits/sec 52 4.24 KBytes
[ 5] 9.00-10.00 sec 640 KBytes 5.24 Mbits/sec 48 2.83 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 6.00 MBytes 5.03 Mbits/sec 515 sender
[ 5] 0.00-10.00 sec 6.00 MBytes 5.03 Mbits/sec receiver
root@red:~#
Now, where do those retries happen? I've captured PCAPs of the two iperf3
tests above; I'll look into them and share them once I've stripped their
payloads, otherwise they're too large for email.
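In the meantime, something like this should do the stripping (editcap ships
with Wireshark; filenames are placeholders):

  editcap -s 96 red-blue.pcap red-blue.trunc.pcap   # keep first 96 bytes/frame

I can also watch the retransmits live from red during a TCP run with:

  ss -ti dst 10.0.0.3   # per-connection retrans, cwnd, rtt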
> Is your iperf testing with TCP or UDP? A small amount of packet loss
> will cause TCP to back off a lot. Also, if the reverse direction is
> getting messed up, ACKs are getting lost, TCP will also stall.
>
> Maybe try a UDP stream, say 500Mbs. What is the packet loss? Try the
> reverse direction, what is the packet loss. Then try --bidir, so you
> get both directions at the same time.
I was using TCP (default). The UDP results are very interesting indeed:
1. red to blue, 110Mbps
-----------------------
root@red:~# iperf3 -c 10.0.0.2 -u -t 5 -b 110M # blue
Connecting to host 10.0.0.2, port 5201
[ 5] local 10.0.0.1 port 33079 connected to 10.0.0.2 port 5201
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 13.1 MBytes 110 Mbits/sec 9488
[ 5] 1.00-2.00 sec 13.1 MBytes 110 Mbits/sec 9496
[ 5] 2.00-3.00 sec 13.1 MBytes 110 Mbits/sec 9496
[ 5] 3.00-4.00 sec 13.1 MBytes 110 Mbits/sec 9495
[ 5] 4.00-5.00 sec 13.1 MBytes 110 Mbits/sec 9496
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-5.00 sec 65.6 MBytes 110 Mbits/sec 0.000 ms 0/47471 (0%) sender
[ 5] 0.00-5.00 sec 65.6 MBytes 110 Mbits/sec 0.026 ms 0/47471 (0%) receiver
Good, as expected.
2. red to blue, 1.1Gbps
-----------------------
root@red:~# iperf3 -c 10.0.0.2 -u -t 5 -b 1100M # blue
Connecting to host 10.0.0.2, port 5201
[ 5] local 10.0.0.1 port 35966 connected to 10.0.0.2 port 5201
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 131 MBytes 1.10 Gbits/sec 94891
[ 5] 1.00-2.00 sec 131 MBytes 1.10 Gbits/sec 94957
[ 5] 2.00-3.00 sec 131 MBytes 1.10 Gbits/sec 94961
[ 5] 3.00-4.00 sec 131 MBytes 1.10 Gbits/sec 94960
[ 5] 4.00-5.00 sec 131 MBytes 1.10 Gbits/sec 94951
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-5.00 sec 656 MBytes 1.10 Gbits/sec 0.000 ms 0/474720 (0%) sender
[ 5] 0.00-5.00 sec 567 MBytes 950 Mbits/sec 0.003 ms 64437/474720 (14%) receiver
Interesting. Rerunning it yields similar results. Why do we have ~12-14% loss?
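One thing worth ruling out is the receiving socket simply overflowing. If I'm
reading the counters right, that would show up on blue as UdpRcvbufErrors:

  nstat -z | grep -i udp   # run on blue before/after a test

If those stay at zero, the drops happen below the socket layer (driver or
Thunderbolt path).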
3. red to blue, 910Mbps
-----------------------
root@red:~# iperf3 -c 10.0.0.2 -u -t 5 -b 910M # blue
Connecting to host 10.0.0.2, port 5201
[ 5] local 10.0.0.1 port 35073 connected to 10.0.0.2 port 5201
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 108 MBytes 909 Mbits/sec 78498
[ 5] 1.00-2.00 sec 108 MBytes 910 Mbits/sec 78557
[ 5] 2.00-3.00 sec 108 MBytes 910 Mbits/sec 78556
[ 5] 3.00-4.00 sec 108 MBytes 910 Mbits/sec 78557
[ 5] 4.00-5.00 sec 108 MBytes 910 Mbits/sec 78556
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-5.00 sec 542 MBytes 910 Mbits/sec 0.000 ms 0/392724 (0%) sender
[ 5] 0.00-5.00 sec 349 MBytes 585 Mbits/sec 0.002 ms 140008/392724 (36%) receiver
root@red:~# iperf3 -c 10.0.0.2 -u -t 5 -b 910M # blue
Connecting to host 10.0.0.2, port 5201
[ 5] local 10.0.0.1 port 46225 connected to 10.0.0.2 port 5201
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 108 MBytes 909 Mbits/sec 78500
[ 5] 1.00-2.00 sec 108 MBytes 910 Mbits/sec 78555
[ 5] 2.00-3.00 sec 108 MBytes 910 Mbits/sec 78557
[ 5] 3.00-4.00 sec 108 MBytes 910 Mbits/sec 78557
[ 5] 4.00-5.00 sec 108 MBytes 910 Mbits/sec 78556
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-5.00 sec 542 MBytes 910 Mbits/sec 0.000 ms 0/392725 (0%) sender
[ 5] 0.00-5.00 sec 486 MBytes 816 Mbits/sec 0.005 ms 40598/392725 (10%) receiver
root@red:~# iperf3 -c 10.0.0.2 -u -t 5 -b 910M # blue
Connecting to host 10.0.0.2, port 5201
[ 5] local 10.0.0.1 port 33329 connected to 10.0.0.2 port 5201
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 108 MBytes 909 Mbits/sec 78504
[ 5] 1.00-2.00 sec 108 MBytes 910 Mbits/sec 78549
[ 5] 2.00-3.00 sec 108 MBytes 910 Mbits/sec 78563
[ 5] 3.00-4.00 sec 108 MBytes 910 Mbits/sec 78557
[ 5] 4.00-5.00 sec 108 MBytes 910 Mbits/sec 78554
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-5.00 sec 542 MBytes 910 Mbits/sec 0.000 ms 0/392727 (0%) sender
[ 5] 0.00-5.00 sec 538 MBytes 902 Mbits/sec 0.003 ms 3144/392727 (0.8%) receiver
root@red:~#
These three runs at 910Mbps show major variance in red->blue loss.
We know the loss doesn't appear at 110Mbps (test #1, after several reruns),
and a rough binary search puts the inflection point at ~250-300Mbps.
Where does this loss come from, though?
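To answer that, I'll try tracing where the kernel actually frees the packets
during a lossy run, e.g.:

  perf record -a -e skb:kfree_skb -- sleep 5
  perf script | head

(On recent kernels that tracepoint carries a drop reason, which should narrow
things down; dropwatch would work too.)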
4. red to blue, 11Gbps
----------------------
root@red:~# iperf3 -c 10.0.0.2 -u -t 5 -b 11000M # blue
Connecting to host 10.0.0.2, port 5201
[ 5] local 10.0.0.1 port 59115 connected to 10.0.0.2 port 5201
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 555 MBytes 4.65 Gbits/sec 401782
[ 5] 1.00-2.00 sec 558 MBytes 4.68 Gbits/sec 404136
[ 5] 2.00-3.00 sec 556 MBytes 4.66 Gbits/sec 402648
[ 5] 3.00-4.00 sec 556 MBytes 4.66 Gbits/sec 402634
[ 5] 4.00-5.00 sec 556 MBytes 4.66 Gbits/sec 402374
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-5.00 sec 2.72 GBytes 4.66 Gbits/sec 0.000 ms 0/2013574 (0%) sender
[ 5] 0.00-5.00 sec 2.51 GBytes 4.31 Gbits/sec 0.002 ms 154525/2013574 (7.7%) receiver
First, sender bitrate tops out at ~4.68Gbps, okay.
Second, receiver bitrate goes up from test #2's ~950Mbps to ~4.31Gbps here, so
the loss we saw there is not because the receiver can't keep up. Maybe the
Thunderbolt link itself is to blame?
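If it's the receiving host rather than the link, softirq backlog drops might
show it. I'll sample the second column of softnet_stat (hex, per CPU) on blue
around a run:

  awk '{print $2}' /proc/net/softnet_stat   # backlog drop count per CPU

If that doesn't move either, I'd guess the frames never make it off the
Thunderbolt ring.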
5. red to purple, 110Mbps
-------------------------
root@red:~# iperf3 -c 10.0.0.3 -u -t 5 -b 110M # purple
Connecting to host 10.0.0.3, port 5201
[ 5] local 10.0.0.1 port 48081 connected to 10.0.0.3 port 5201
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 13.1 MBytes 110 Mbits/sec 9488
[ 5] 1.00-2.00 sec 13.1 MBytes 110 Mbits/sec 9496
[ 5] 2.00-3.00 sec 13.1 MBytes 110 Mbits/sec 9496
[ 5] 3.00-4.00 sec 13.1 MBytes 110 Mbits/sec 9496
[ 5] 4.00-5.00 sec 13.1 MBytes 110 Mbits/sec 9495
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-5.00 sec 65.6 MBytes 110 Mbits/sec 0.000 ms 0/47471 (0%) sender
[ 5] 0.00-5.00 sec 65.6 MBytes 110 Mbits/sec 0.029 ms 0/47471 (0%) receiver
INTERESTING!
This is the first time we're going beyond ~5Mbps in the red->blue->purple
direction, meaning there is something up with TCP specifically.
And, if we put this together with test #4, it would make sense that:
1. (unusual) Thunderbolt interface loss causes TCP retries;
2. TCP retries cause TCP backoff;
3. TCP bandwidth drops to ~5Mbps.
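If that theory holds, a loss-tolerant congestion controller should mask it.
Assuming tcp_bbr is available on red, something like:

  modprobe tcp_bbr
  iperf3 -c 10.0.0.3 -C bbr   # -C/--congestion is Linux-only

BBR paces on its bandwidth estimate instead of backing off on every loss, so
red->purple recovering to near line rate would support this.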
6. red to purple, 950Mbps & 990Mbps
-----------------------------------
root@red:~# iperf3 -c 10.0.0.3 -u -t 5 -b 950M # purple
Connecting to host 10.0.0.3, port 5201
[ 5] local 10.0.0.1 port 57640 connected to 10.0.0.3 port 5201
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 113 MBytes 949 Mbits/sec 81956
[ 5] 1.00-2.00 sec 113 MBytes 950 Mbits/sec 82010
[ 5] 2.00-3.00 sec 113 MBytes 950 Mbits/sec 82010
[ 5] 3.00-4.00 sec 113 MBytes 950 Mbits/sec 82006
[ 5] 4.00-5.00 sec 113 MBytes 950 Mbits/sec 82010
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-5.00 sec 566 MBytes 950 Mbits/sec 0.000 ms 0/409992 (0%) sender
[ 5] 0.00-5.00 sec 566 MBytes 949 Mbits/sec 0.009 ms 0/409643 (0%) receiver
root@red:~# iperf3 -c 10.0.0.3 -u -t 5 -b 990M # purple
Connecting to host 10.0.0.3, port 5201
[ 5] local 10.0.0.1 port 49666 connected to 10.0.0.3 port 5201
[ ID] Interval Transfer Bitrate Total Datagrams
[ 5] 0.00-1.00 sec 118 MBytes 989 Mbits/sec 85407
[ 5] 1.00-2.00 sec 118 MBytes 990 Mbits/sec 85459
[ 5] 2.00-3.00 sec 118 MBytes 990 Mbits/sec 85467
[ 5] 3.00-4.00 sec 118 MBytes 990 Mbits/sec 85458
[ 5] 4.00-5.00 sec 118 MBytes 990 Mbits/sec 85466
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams
[ 5] 0.00-5.00 sec 590 MBytes 990 Mbits/sec 0.000 ms 0/427257 (0%) sender
[ 5] 0.00-5.00 sec 566 MBytes 949 Mbits/sec 0.022 ms 13640/423358 (3.2%) receiver
root@red:~#
INTERESTING!
First, we're reaching line speed in the red->blue->purple direction for the
first time.
Second, at 950Mbps we're doing so without any loss, which is weird given the
heavy (and highly variable) loss we saw on the shorter red->blue path in tests
#2, #3 and #4.
(The 3.2% loss at 990Mbps looks like simple overshoot past the ~950Mbps line
rate of that hop: 40M/990M ≈ 4%.)
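I still owe you the reverse and bidirectional runs; for completeness
(assuming this iperf3 is >= 3.7 for --bidir):

  iperf3 -c 10.0.0.2 -u -t 5 -b 910M -R
  iperf3 -c 10.0.0.2 -u -t 5 -b 910M --bidir

in case the loss is asymmetric or only shows under concurrent load.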
What's your reading of all of this?
Thank you all again,
Ricard Bejarano