Message-Id: <5DE64000-782A-492C-A653-7EB758D28283@bejarano.io>
Date: Mon, 26 May 2025 10:50:43 +0200
From: Ricard Bejarano <ricard@...arano.io>
To: Mika Westerberg <mika.westerberg@...ux.intel.com>
Cc: netdev@...r.kernel.org,
michael.jamet@...el.com,
YehezkelShB@...il.com,
andrew+netdev@...n.ch,
davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com
Subject: Re: Poor thunderbolt-net interface performance when bridged
Hey, thanks again for looking into this.
Yes, these are 8th generation Intel NUCs with Thunderbolt 3, not 4. And yes, the
cable I have used so far is Thunderbolt "compatible", not "certified", and it
doesn't have the lightning logo[1].
I am not convinced, though.
Part I: Thunderbolt 3
---------------------
I first ran into this issue a few months ago with a set of three 12th/13th
generation Intel NUCs, each of which has two Thunderbolt 4 ports, directly
connected to each other so as to form a ring network. When hopping through one
of them, bandwidth dropped from ~16Gbps to ~5Mbps, both when routing and when
bridging. These three NUCs are in "production", so I didn't want to use them as
my test bench. They are rocking "Thunderbolt 4 certified" cables with the
lightning logo[2].
I could justify running any one of the following disruptive tests if you think
they would be helpful:
Note: A is connected to B, B to C, and C to A (to form a ring).
1) Configure A and C to route to each other via B if the A<->C link is down,
then disconnect A<->C and run iperfs in all directions, like in [4.6] (a rough
command sketch follows this list).
If they run at ~16Gbps when hopping via B, then TB3 was (at least part of)
the problem; otherwise it must be something wrong with the driver.
I am very confident speed will drop when hopping via B, because this is how I
first came across this issue. I wanted nodes of the ring to route the long way
around the ring if the direct path wasn't up, but that wasn't possible due to
the huge bandwidth drop.
2) Same as #1 but configure B to bridge both of its Thunderbolt interfaces.
3) While pulling the A<->C cable for running one of the above, test that cable
in the 8th gen test bench. This cable is known to run at ~16Gbps when
connecting A and C via their Thunderbolt 4 ports.
While very unlikely, if this somehow fixes the red->purple bandwidth drop,
then we know the current cable was to blame.
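Roughly, this is what I have in mind for tests #1 and #2. Interface names,
addresses and metrics below are placeholders, not taken from the actual
configuration, so treat it as a sketch only:

  # Test #1: on A, prefer the direct link to C, fall back to routing via B
  # (thunderbolt0 = link towards C, thunderbolt1 = link towards B;
  # A=10.0.0.1, B=10.0.0.2, C=10.0.0.3 are made-up addresses)
  ip route add 10.0.0.3/32 dev thunderbolt0 metric 100
  ip route add 10.0.0.3/32 via 10.0.0.2 dev thunderbolt1 metric 200
  # if needed, make the kernel skip routes whose link is down, so the
  # higher-metric route via B takes over when the A<->C cable is pulled
  sysctl -w net.ipv4.conf.all.ignore_routes_with_linkdown=1
  # mirror configuration on C; B only needs forwarding enabled:
  sysctl -w net.ipv4.ip_forward=1

  # Test #2: on B, bridge both Thunderbolt interfaces instead of routing
  ip link add name br0 type bridge
  ip link set thunderbolt0 master br0
  ip link set thunderbolt1 master br0
  ip link set br0 up

  # then iperf in all directions as in [4.6], e.g. from A towards C:
  iperf3 -c 10.0.0.3 -t 30

The expectation for #1 is that once I pull the A<->C cable, traffic from A to C
falls over to the higher-metric route via B, and the iperf numbers tell us
whether the extra hop is where the bandwidth collapses.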
These 12/13th gen NUCs are running non-upstream kernels, however, and while I
can justify playing around a bit with their connections, I can't justify pulling
them out of production to install upstream kernels and make them our test bench.
Do you think any one of these tests would be helpful?
Part II: the cable
------------------
You also point to the cable as the likely culprit.
1) But then, why does iperf between red<->blue[4.6.1] show ~9Gbps both ways,
while red->blue->purple[4.6.3a] drops to ~5Mbps? If the cable were to blame,
wouldn't red->blue[4.6.1a] also drop to about the same?
2) Also, if the problem were the cable's bandwidth in the red->blue direction,
flipping the cable around should show a similar bandwidth drop in the (now)
blue->red direction, right?
I have tested this and it doesn't hold true: iperfs in all directions after
flipping the cable around gave about the same results as in [4.6], further
pointing at something other than the cable itself.
I've attached the output of 'tblist -Av'. It shows a negotiated speed of 10Gb/s
in both Rx and Tx, which lines up with the red<->blue iperf results of [4.6.1].
How shall we proceed?
I reckon all my statements about the 12/13th gen NUCs are anecdata and not as
scientific as my 8th gen NUC results, but I'm happy to perform any one of the
three tests above.
Thanks again,
Ricard Bejarano
--
[1] https://www.amazon.es/-/en/dp/B0C93G2M83
[2] https://www.amazon.es/-/en/dp/B095KSL2B9