Message-ID: <c1ac6822-a890-45cd-b710-38f9c7114272@lunn.ch>
Date: Tue, 27 May 2025 17:02:51 +0200
From: Andrew Lunn <andrew@...n.ch>
To: Ricard Bejarano <ricard@...arano.io>
Cc: Mika Westerberg <mika.westerberg@...ux.intel.com>,
netdev@...r.kernel.org, michael.jamet@...el.com,
YehezkelShB@...il.com, andrew+netdev@...n.ch, davem@...emloft.net,
edumazet@...gle.com, kuba@...nel.org, pabeni@...hat.com
Subject: Re: Poor thunderbolt-net interface performance when bridged
On Tue, May 27, 2025 at 04:25:16PM +0200, Ricard Bejarano wrote:
> Ok, I was going mad trying to find CRC stats for blue's tb0.
>
> 'ethtool -S' returns "no stats available".
> 'netstat' and 'ss' aren't much better than 'ip -s link show dev'.
> CRC verification is done by the driver so 'tcpdump' won't see those (I do see loss, however).
>
> But, I do see the thunderbolt-net driver exposes rx_crc_errors.
> And then I found 'ip -s -s' (double -s):
>
> root@...e:~# ip -s -s link show dev tb0
> 5: tb0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master br0 state UP mode DEFAULT group default qlen 1000
>     link/ether 02:70:19:dc:92:96 brd ff:ff:ff:ff:ff:ff
>     RX:  bytes packets errors dropped  missed   mcast
>     9477191497 6360635  16763       0       0       0
>     RX errors:  length    crc   frame    fifo overrun
>                      0  16763       0       0       0
>     TX:  bytes packets errors dropped carrier collsns
>           8861     100      0       0       0       0
>     TX errors: aborted   fifo  window heartbt transns
>                      0      0       0       0       2
> root@...e:~#
>
> Bingo! CRC errors.
>
> How can we proceed?

static bool tbnet_check_frame(struct tbnet *net, const struct tbnet_frame *tf,
			      const struct thunderbolt_ip_frame_header *hdr)
{
	u32 frame_id, frame_count, frame_size, frame_index;
	unsigned int size;

	if (tf->frame.flags & RING_DESC_CRC_ERROR) {
		net->stats.rx_crc_errors++;
		return false;

So it looks like CRC checking is offloaded to the hardware. I've never
looked at Thunderbolt, so I have no idea what its frame structure looks like.
Maybe hack out this test, and allow the corrupt frame to be
received. Then look at it with Wireshark and see if you can figure out
what is wrong with it. Knowing what is wrong with it might allow you
to backtrack to where it gets mangled.
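
To make the "hack out this test" idea concrete, here is a minimal userspace
sketch, not a driver patch. Every mock_* name and the flag value are
hypothetical stand-ins for the real structures; only the shape of the CRC
check is taken from the excerpt above. The point is to keep counting the
error but stop dropping the frame while debugging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for RING_DESC_CRC_ERROR; the value is made up. */
#define MOCK_RING_DESC_CRC_ERROR (1u << 0)

struct mock_stats {
	uint64_t rx_crc_errors;
};

struct mock_frame {
	uint32_t flags;
};

/*
 * Returns true if the frame should be passed up the stack.
 * With debug_pass_crc_errors set, a frame flagged as corrupt is still
 * counted but no longer dropped, so it can be captured and inspected.
 */
static bool mock_check_frame(struct mock_stats *stats,
			     const struct mock_frame *frame,
			     bool debug_pass_crc_errors)
{
	assert(stats && frame);

	if (frame->flags & MOCK_RING_DESC_CRC_ERROR) {
		stats->rx_crc_errors++;
		if (!debug_pass_crc_errors)
			return false;
	}
	return true;
}
```

In the real function more checks follow the CRC test, so a corrupt frame let
through here may still be rejected later on. That in itself would be a hint
about which part of the frame is damaged.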
Looking further into that function, it seems like one Ethernet frame
can be split over multiple TB frames. When it is, the skbuf has
multiple fragments. Wild guess: Copying such fragments to user space
works, that is a core networking thing, heavily used, well tested. But
when the skbuf is bridged and sent out another interface, that code
does not correctly handle the fragments? You might want to stare at
tbnet_start_xmit() and see if you can see something wrong with the
code around 'frag'.
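
As an illustration of what "handling the fragments" means, here is a small
userspace model, assuming a buffer made of a head area plus a list of
fragments. The mock_* types are hypothetical and deliberately simpler than
the kernel's skb. A transmit path that only looks at the head, or that gets a
fragment length wrong, would send out fewer or different bytes than
mock_total_len() reports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_FRAGS 4

/* One fragment: a pointer to payload bytes and their length. */
struct mock_frag {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical buffer: a head area followed by zero or more fragments. */
struct mock_buf {
	const unsigned char *head;
	size_t head_len;
	struct mock_frag frags[MOCK_MAX_FRAGS];
	size_t nr_frags;
};

/* Total payload length: head plus every fragment. */
static size_t mock_total_len(const struct mock_buf *b)
{
	size_t total = b->head_len;

	for (size_t i = 0; i < b->nr_frags; i++)
		total += b->frags[i].len;
	return total;
}

/*
 * Copy the head and then each fragment, in order, into out.
 * Returns the number of bytes written, or 0 if out is too small.
 */
static size_t mock_flatten(const struct mock_buf *b, unsigned char *out,
			   size_t out_len)
{
	size_t off = 0;

	assert(b && out && b->nr_frags <= MOCK_MAX_FRAGS);

	if (mock_total_len(b) > out_len)
		return 0;

	if (b->head_len) {
		memcpy(out, b->head, b->head_len);
		off = b->head_len;
	}
	for (size_t i = 0; i < b->nr_frags; i++) {
		if (b->frags[i].len) {
			memcpy(out + off, b->frags[i].data, b->frags[i].len);
			off += b->frags[i].len;
		}
	}
	return off;
}
```

Comparing the flattened bytes of a frame before and after it crosses the
bridge is one way to check whether the fragments survive the trip intact.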
Andrew