netdev - Re: Issue with AMD Xilinx AXI Ethernet (xilinx_axienet) on MicroBlaze: Packets only received after some buffer is full

Message-ID: <80e2a74d4fcfcc9b3423df13c68b1525a8c41f7f.camel@hazent.com>
Date: Thu, 03 Apr 2025 07:44:59 +0200
From: Álvaro "G. M." <alvaro.gamez@...ent.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, Radhey Shyam Pandey <radhey.shyam.pandey@....com>
Subject: Re: Issue with AMD Xilinx AXI Ethernet (xilinx_axienet) on
 MicroBlaze: Packets only received after some buffer is full

Hi


On Wed, 2025-04-02 at 10:00 -0700, Jakub Kicinski wrote:
> +CC Radhey, maintainer of axienet

Thanks, I don't know why I didn't think of that.

So, I can provide a little more information and I definitely believe now there
are some issues with this driver.

> On Tue, 01 Apr 2025 12:52:15 +0200 Álvaro "G. M." wrote:
> > I guess I may have made some mistake in upgrading the DTS to the new format, although
> > I've tried the two available methods (either setting node "dmas" or using "axistream-connected"
> > property) and both methods result in the same boot messages and behavior.

This turned out not to be true; sorry for the confusion. Using the "dmas" node
enables use_dmaengine and produces the effect I explained: data is only received
after a 2^17-byte buffer is filled.

If I remove the "dmas" entry and provide an "axistream-connected" one, things get a
little better (but see the DTS notes at the end). In this mode, which does not use
dmaengine but the legacy DMA code inside axienet itself, tcpdump -vv
shows packets arriving at a normal rate. However, the system is not answering
ARP requests:

00:02:37.800814 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2 tell 10.188.139.1, length 46
00:02:38.801974 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2 tell 10.188.139.1, length 46
00:02:39.804137 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2 tell 10.188.139.1, length 46
00:02:40.806434 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2 tell 10.188.139.1, length 46
00:02:41.808084 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2 tell 10.188.139.1, length 46
00:02:42.810592 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2 tell 10.188.139.1, length 46
00:02:43.813155 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2 tell 10.188.139.1, length 46

Here's the normal answer from a second device running the old 4.4.43 kernel,
connected to the same switch:

00:21:12.057326 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.1 tell 10.188.139.1, length 46
00:21:12.057905 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.188.140.1 is-at 06:00:0a:bc:8c:01 (oui Unknown), length 28
00:21:13.059460 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.1 tell 10.188.139.1, length 46
00:21:13.060031 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.188.140.1 is-at 06:00:0a:bc:8c:01 (oui Unknown), length 28
00:21:14.060502 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.1 tell 10.188.139.1, length 46
00:21:14.061051 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.188.140.1 is-at 06:00:0a:bc:8c:01 (oui Unknown), length 28
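
A side note on reading the two captures: requests are shown with "length 46" and
replies with "length 28". An Ethernet/IPv4 ARP payload is 28 bytes, and a minimum-size
Ethernet frame (60 bytes without FCS) leaves 46 bytes after the 14-byte header, so
the difference is presumably just padding on frames that came in over the wire
versus locally generated replies. A minimal sketch of that arithmetic, with
hypothetical names (nothing here is taken from the driver):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HDR_LEN    14  /* dst MAC + src MAC + EtherType */
#define SKETCH_ETH_MIN_FRAME  60  /* minimum frame length, FCS excluded */

/*
 * ARP payload for Ethernet/IPv4. Every field is a byte array so the
 * compiler cannot insert padding and sizeof() matches the wire size.
 */
struct sketch_arp_eth_ipv4 {
	uint8_t htype[2];  /* hardware type (1 = Ethernet) */
	uint8_t ptype[2];  /* protocol type (0x0800 = IPv4) */
	uint8_t hlen;      /* hardware address length (6) */
	uint8_t plen;      /* protocol address length (4) */
	uint8_t oper[2];   /* 1 = request, 2 = reply */
	uint8_t sha[6];    /* sender hardware address */
	uint8_t spa[4];    /* sender protocol address */
	uint8_t tha[6];    /* target hardware address */
	uint8_t tpa[4];    /* target protocol address */
};

/* Payload length after padding a short frame up to the Ethernet minimum. */
static size_t sketch_padded_payload_len(size_t payload_len)
{
	const size_t min_payload = SKETCH_ETH_MIN_FRAME - SKETCH_ETH_HDR_LEN;

	assert(payload_len > 0);
	return payload_len < min_payload ? min_payload : payload_len;
}
```

With these definitions, sizeof(struct sketch_arp_eth_ipv4) is 28 and
sketch_padded_payload_len(28) is 46, matching the two lengths tcpdump prints.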

The funny thing is that once I manually add ARP entries on both my computer
and the embedded one, I can establish full TCP communication and iperf3 shows
a relatively nice speed, similar to the throughput I get with the old 4.4.43 kernel.

# arp -s 10.188.139.1 f4:4d:ad:02:11:29
# iperf3 -c 10.188.139.1
Connecting to host 10.188.139.1, port 5201
[  5] local 10.188.140.2 port 55480 connected to 10.188.139.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.01   sec  3.63 MBytes  30.1 Mbits/sec    0    130 KBytes       
[  5]   1.01-2.01   sec  3.75 MBytes  31.5 Mbits/sec    0    130 KBytes       
[  5]   2.01-3.01   sec  3.63 MBytes  30.4 Mbits/sec    0    130 KBytes       
[  5]   3.01-4.01   sec  3.75 MBytes  31.4 Mbits/sec    0    130 KBytes       
[  5]   4.01-5.01   sec  3.75 MBytes  31.4 Mbits/sec    0    130 KBytes       
[  5]   5.01-6.01   sec  3.75 MBytes  31.5 Mbits/sec    0    130 KBytes       
[  5]   6.01-7.01   sec  3.75 MBytes  31.6 Mbits/sec    0    130 KBytes       
[  5]   7.01-7.75   sec  2.63 MBytes  29.5 Mbits/sec    0    130 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-7.75   sec  28.6 MBytes  31.0 Mbits/sec    0            sender
[  5]   0.00-7.75   sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated
# iperf3 -c 10.188.139.1 -R
Connecting to host 10.188.139.1, port 5201
Reverse mode, remote host 10.188.139.1 is sending
[  5] local 10.188.140.2 port 45582 connected to 10.188.139.1 port 5201
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-1.03   sec  5.13 MBytes  41.9 Mbits/sec                  
[  5]   1.03-2.03   sec  5.38 MBytes  44.8 Mbits/sec                  
[  5]   2.03-3.02   sec  5.38 MBytes  45.6 Mbits/sec                  
[  5]   3.02-4.02   sec  5.38 MBytes  45.2 Mbits/sec                  
[  5]   4.02-5.01   sec  5.38 MBytes  45.4 Mbits/sec                  
[  5]   5.01-5.30   sec  1.50 MBytes  43.2 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-5.30   sec  0.00 Bytes  0.00 bits/sec                  sender
[  5]   0.00-5.30   sec  28.1 MBytes  44.5 Mbits/sec                  receiver
iperf3: interrupt - the client has terminated

I had never seen a device able to fully establish communication except for
replying to ARP requests, so I'm not sure what's happening here.


On the other hand, since I don't know how to debug this ARP issue, I
went back to see if I could diagnose what's happening in DMA Engine mode.
I peeked at the code and saw an asymmetry between RX and TX, which
sounded promising given that in dmaengine mode TX works perfectly (or so it seems)
and RX is heavily buffered. This asymmetry lies precisely in the number
of SG blocks and the number of skb buffers.

Both bd_nums are defined in the same way:
        lp->rx_bd_num = RX_BD_NUM_DEFAULT; // = 1024
        lp->tx_bd_num = TX_BD_NUM_DEFAULT; // = 128


But the skb ring size is defined in a different fashion:
        lp->tx_skb_ring = kcalloc(TX_BD_NUM_MAX, sizeof(*lp->tx_skb_ring), // = 4096
                                  GFP_KERNEL);
	...
        lp->rx_skb_ring = kcalloc(RX_BUF_NUM_DEFAULT, sizeof(*lp->rx_skb_ring), // = 128
                                  GFP_KERNEL);

So, for TX we allocate space for up to 4096 buffers but by default use 128.
For RX we allocate space for 128 buffers but somehow are setting 1024 as
the default bd number.
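
To make the mismatch concrete, here is a minimal user-space sketch that only
restates the four constants quoted above and compares allocated skb slots against
the default descriptor count. The struct and helper names are hypothetical and
exist only for illustration; the sketch does not establish how the dmaengine RX
path actually uses rx_bd_num.

```c
#include <assert.h>
#include <stddef.h>

/* Values as quoted in this message. */
#define TX_BD_NUM_DEFAULT   128
#define RX_BD_NUM_DEFAULT   1024
#define TX_BD_NUM_MAX       4096
#define RX_BUF_NUM_DEFAULT  128

/* Hypothetical summary type, not part of the driver. */
struct ring_sizing {
	size_t slots_allocated;  /* entries passed to kcalloc() for the skb ring */
	size_t bd_num_default;   /* default stored in lp->tx_bd_num / lp->rx_bd_num */
};

static const struct ring_sizing tx_sizing = { TX_BD_NUM_MAX, TX_BD_NUM_DEFAULT };
static const struct ring_sizing rx_sizing = { RX_BUF_NUM_DEFAULT, RX_BD_NUM_DEFAULT };

/* Returns 1 if the skb ring has at least one slot per default descriptor. */
static int ring_covers_default(const struct ring_sizing *s)
{
	assert(s != NULL);
	return s->slots_allocated >= s->bd_num_default;
}

/* Number of default descriptors left without a dedicated skb slot. */
static size_t ring_shortfall(const struct ring_sizing *s)
{
	assert(s != NULL);
	return ring_covers_default(s) ? 0 : s->bd_num_default - s->slots_allocated;
}
```

For TX the ring covers the default with room to spare; for RX the default
descriptor count exceeds the allocated slots by 1024 - 128 = 896.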

The fact that RX_BD_NUM_DEFAULT is used nowhere else is also a sign
that there was some mistake here, so I went and replaced all RX_BUF_NUM_DEFAULT
occurrences with RX_BD_NUM_DEFAULT, so that both TX and RX skb rings
are declared and operated on using the same strategy:

  sed -i '/^#define/!s#RX_BUF_NUM_DEFAULT#RX_BD_NUM_MAX#g' xilinx_axienet_main.c

Doing this solved the buffering problem, although the system still doesn't reply
to ARP requests, and when I tried to run an iperf3 test after manually adding ARP entries,
the kernel segfaulted (so I probably shouldn't have blindly run 'sed' like that :)

# iperf3 -c 10.188.139.1
Connecting to host 10.188.139.1, port 5201
[  5] local 10.188.140.2 port 46356 connected to 10.188.139.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.01   sec   640 KBytes  5.18 Mbits/sec    3   84.8 KBykernel task_size exceed
Oops: Exception in kernel mode, sig: 11
CPU: 0 UID: 0 PID: 147 Comm: iperf3 Not tainted 6.13.8 #13
 Registers dump: mode=8269B900
 r1=00000000, r2=00000000, r3=00000000, r4=00000010
 r5=00000000, r6=000005F2, r7=FFFF7FFF, r8=00000000
 r9=00000000, r10=00000000, r11=00000000, r12=CF5FF24C
 r13=00000000, r14=C241AB70, r15=C0383EB8, r16=00000000
 r17=C0383EC0, r18=000005F0, r19=C10124A0, r20=480F8520
 r21=4831F960, r22=00000000, r23=00000000, r24=FFFFFFEA
 r25=C12BE0A8, r26=C12BE03C, r27=C12BE020, r28=00000122
 r29=00000100, r30=000065A2, r31=C120F780, rPC=C0383EC0
 msr=000046A2, ear=FFFFFFFA, esr=00000312, fsr=00000000
Kernel panic - not syncing: Aiee, killing interrupt handler!
---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---
tes       

I couldn't see what was wrong with the new code, so I just went and reduced
the RX_BD_NUM_DEFAULT value from 1024 to 128, so that it's the same size as
its TX counterpart, but the kernel segfaulted again when I tried to measure
throughput. Sadly, my kernel debugging abilities don't go much further than this,
so I'm stuck at this point, but I firmly believe there's something wrong here,
although I can't see what it is.

Any help will be greatly appreciated.


DTS NOTES:
Using the old DMA code inside xilinx_axienet_main.c requires removing the "dmas" entry
and adding a reference to the DMA device, either via axistream-connected or by adding
resources manually to the node. Referring to the node linked by axistream-connected
requires a DMA node to exist, but its compatible string can't be xlnx,axi-dma-1.00.a,
because then the AXI DMA driver will bind to it and axienet will complain about
the device being busy. So my solution is to use a compatible string that doesn't match.
As such, with the following DTS I can establish TCP connections as long as
ARP entries are added manually:


axi_ethernet_0_dma: dma@...00000 {
	/* NOTE THE NOT */
	compatible = "notxlnx,axi-dma-1.00.a";
	#dma-cells = <1>;
	reg = <0x41e00000 0x10000>;
	interrupt-parent = <&microblaze_0_axi_intc>;
	interrupts = <7 1 8 1>;
	xlnx,addrwidth = <32>;  // Address size in bits
	xlnx,datawidth = <32>;
	xlnx,include-sg;
	xlnx,sg-length-width = <16>;
	xlnx,include-dre = <1>;
	xlnx,axistream-connected = <1>;
	xlnx,irq-delay = <1>;
	dma-channels = <2>;
	clock-names = "s_axi_lite_aclk", "m_axi_mm2s_aclk", "m_axi_s2mm_aclk", "m_axi_sg_aclk";
	clocks = <&clk_bus_0>, <&clk_bus_0>, <&clk_bus_0>, <&clk_bus_0>;
	dma-channel@...00000 {
		compatible = "xlnx,axi-dma-mm2s-channel";
		xlnx,include-dre = <1>;
		interrupts = <7 1>;
		xlnx,datawidth = <32>;
	};
	dma-channel@...00030 {
		compatible = "xlnx,axi-dma-s2mm-channel";
		xlnx,include-dre = <1>;
		interrupts = <8 1>;
		xlnx,datawidth = <32>;
	};
};
axi_ethernet_eth: ethernet@...00000 {
	compatible = "xlnx,axi-ethernet-1.00.a";
	reg = <0x40c00000 0x40000>;
	phy-handle = <&phy1>;
	interrupt-parent = <&microblaze_0_axi_intc>;
	interrupts = <3 0>;
	xlnx,rxmem = <0x1000>;
	max-speed = <100000>;
	phy-mode = "mii";
	xlnx,txcsum = <0x2>;
	xlnx,rxcsum = <0x2>;
	clock-names = "s_axi_lite_clk", "axis_clk", "ref_clk", "mgt_clk";
	clocks = <&clk_bus_0>, <&clk_bus_0>, <&clk_bus_0>, <&clk_bus_0>;
	axistream-connected = <&axi_ethernet_0_dma>;
	dma-names = "tx_chan0", "rx_chan0";
	mdio {
		#address-cells = <1>;
		#size-cells = <0>;
		phy1: ethernet-phy@1 {
			device_type = "ethernet-phy";
			reg = <1>;
		};
	};
};

So this mode of operation would definitely NOT need the AXI DMA driver, and this hack
with the compatible string would not be needed if the dependency on AXI DMA
were removed.
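
For reference, the mode selection as I understand it from the behaviour described
in these notes can be modelled roughly as below. This is a user-space sketch with
hypothetical type and function names, not code from xilinx_axienet_main.c, and it
assumes that the presence of "dmas" alone is what switches the driver to dmaengine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three ways the ethernet node can describe its DMA. */
enum sketch_dma_mode {
	SKETCH_MODE_DMAENGINE,        /* "dmas" present: use_dmaengine path */
	SKETCH_MODE_LEGACY_PHANDLE,   /* no "dmas"; DMA found via "axistream-connected" */
	SKETCH_MODE_LEGACY_RESOURCES  /* no "dmas"; DMA resources listed in the node itself */
};

/* Which of the relevant properties the ethernet node carries. */
struct sketch_eth_props {
	bool has_dmas;
	bool has_axistream_connected;
};

/* "dmas" takes priority; otherwise the legacy in-driver DMA code is used. */
static enum sketch_dma_mode sketch_select_mode(const struct sketch_eth_props *p)
{
	assert(p != NULL);

	if (p->has_dmas)
		return SKETCH_MODE_DMAENGINE;
	if (p->has_axistream_connected)
		return SKETCH_MODE_LEGACY_PHANDLE;
	return SKETCH_MODE_LEGACY_RESOURCES;
}
```

The DTS above corresponds to the second case: no "dmas" entry, and
axistream-connected pointing at the DMA node.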

Best regards,

-- 
Álvaro G. M.