netdev - RE: Issue with AMD Xilinx AXI Ethernet (xilinx_axienet) on MicroBlaze: Packets only received after some buffer is full

Message-ID:
 <BL3PR12MB65718DC2E6304971E73C5567C9AE2@BL3PR12MB6571.namprd12.prod.outlook.com>
Date: Thu, 3 Apr 2025 13:58:42 +0000
From: "Gupta, Suraj" <Suraj.Gupta2@....com>
To: Álvaro G. M. <alvaro.gamez@...ent.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, Jakub Kicinski
	<kuba@...nel.org>, "Pandey, Radhey Shyam" <radhey.shyam.pandey@....com>
Subject: RE: Issue with AMD Xilinx AXI Ethernet (xilinx_axienet) on
 MicroBlaze: Packets only received after some buffer is full

[AMD Official Use Only - AMD Internal Distribution Only]

Hi Alvaro,
> -----Original Message-----
> From: Álvaro G. M. <alvaro.gamez@...ent.com>
> Sent: Thursday, April 3, 2025 11:15 AM
> To: Jakub Kicinski <kuba@...nel.org>
> Cc: netdev@...r.kernel.org; Pandey, Radhey Shyam
> <radhey.shyam.pandey@....com>
> Subject: Re: Issue with AMD Xilinx AXI Ethernet (xilinx_axienet) on MicroBlaze:
> Packets only received after some buffer is full
>
> Hi
>
>
> On Wed, 2025-04-02 at 10:00 -0700, Jakub Kicinski wrote:
> > +CC Radhey, maintainer of axienet
>
> Thanks, I don't know why I didn't think of that.
>
> So, I can provide a little more information and I definitely believe now there are some
> issues with this driver.
>
> > On Tue, 01 Apr 2025 12:52:15 +0200 Álvaro "G. M." wrote:
> > > I guess I may have made some mistake in upgrading the DTS to the new
> > > format, although I've tried the two available methods (either setting node
> > > "dmas" or using "axistream-connected" property) and both methods result
> > > in the same boot messages and behavior.
>
> This turned out not to be true; I'm sorry for the confusion. Using node "dmas"
> enables use_dmaengine and produces the effect I explained: data is only received
> after a 2^17-byte buffer is filled.
>
> If I remove the "dmas" entry and provide an "axistream-connected" one, things get a little
> better (but see at the end for some DTS notes). In this mode, which uses the legacy DMA
> code inside axienet itself instead of dmaengine, tcpdump -vv shows packets
> incoming at a normal rate. However, the system is not answering ARP requests:
>
Could you please check ifconfig for any packet drop/error?
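
If ifconfig is not available on the target, the per-interface error and drop counters are also exposed as text files under /sys/class/net/<iface>/statistics/. Below is a minimal sketch in C for reading them. The helper names read_counter() and print_drop_counters() are hypothetical, and the interface name eth0 in the usage note is an assumption; substitute the real one.

```c
#include <stdio.h>

/*
 * Read one unsigned counter from <stats_dir>/<name>.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 * (Hypothetical helper, not part of any existing tool.)
 */
static int read_counter(const char *stats_dir, const char *name,
			unsigned long long *out)
{
	char path[256];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", stats_dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path did not fit in the buffer */

	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Print the RX/TX error and drop counters found in stats_dir.
 * Returns the number of counters that could be read.
 */
static int print_drop_counters(const char *stats_dir)
{
	static const char *const names[] = {
		"rx_errors", "rx_dropped", "tx_errors", "tx_dropped",
	};
	int found = 0;
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		unsigned long long value;

		if (read_counter(stats_dir, names[i], &value) == 0) {
			printf("%s: %llu\n", names[i], value);
			found++;
		}
	}
	return found;
}
```

Calling print_drop_counters("/sys/class/net/eth0/statistics") from main() before and after the ARP test would show whether any of the four counters increases while requests go unanswered.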

> 00:02:37.800814 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2
> tell 10.188.139.1, length 46
> 00:02:38.801974 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2
> tell 10.188.139.1, length 46
> 00:02:39.804137 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2
> tell 10.188.139.1, length 46
> 00:02:40.806434 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2
> tell 10.188.139.1, length 46
> 00:02:41.808084 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2
> tell 10.188.139.1, length 46
> 00:02:42.810592 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2
> tell 10.188.139.1, length 46
> 00:02:43.813155 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.2
> tell 10.188.139.1, length 46
>
> Here's the normal answer from a second device running the old 4.4.43 kernel, connected to
> the same switch:
>
> 00:21:12.057326 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.1
> tell 10.188.139.1, length 46
> 00:21:12.057905 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.188.140.1 is-at
> 06:00:0a:bc:8c:01 (oui Unknown), length 28
> 00:21:13.059460 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.1
> tell 10.188.139.1, length 46
> 00:21:13.060031 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.188.140.1 is-at
> 06:00:0a:bc:8c:01 (oui Unknown), length 28
> 00:21:14.060502 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 10.188.140.1
> tell 10.188.139.1, length 46
> 00:21:14.061051 ARP, Ethernet (len 6), IPv4 (len 4), Reply 10.188.140.1 is-at
> 06:00:0a:bc:8c:01 (oui Unknown), length 28
>
> The funny thing is that once I manually add ARP entries on both my computer and the
> embedded one, I can establish full TCP communication and iperf3 shows a relatively
> nice speed, similar to the throughput I get with the old 4.4.43 kernel.
>
> # arp -s 10.188.139.1 f4:4d:ad:02:11:29
> # iperf3 -c 10.188.139.1
> Connecting to host 10.188.139.1, port 5201
> [  5] local 10.188.140.2 port 55480 connected to 10.188.139.1 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.01   sec  3.63 MBytes  30.1 Mbits/sec    0    130 KBytes
> [  5]   1.01-2.01   sec  3.75 MBytes  31.5 Mbits/sec    0    130 KBytes
> [  5]   2.01-3.01   sec  3.63 MBytes  30.4 Mbits/sec    0    130 KBytes
> [  5]   3.01-4.01   sec  3.75 MBytes  31.4 Mbits/sec    0    130 KBytes
> [  5]   4.01-5.01   sec  3.75 MBytes  31.4 Mbits/sec    0    130 KBytes
> [  5]   5.01-6.01   sec  3.75 MBytes  31.5 Mbits/sec    0    130 KBytes
> [  5]   6.01-7.01   sec  3.75 MBytes  31.6 Mbits/sec    0    130 KBytes
> [  5]   7.01-7.75   sec  2.63 MBytes  29.5 Mbits/sec    0    130 KBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate         Retr
> [  5]   0.00-7.75   sec  28.6 MBytes  31.0 Mbits/sec    0            sender
> [  5]   0.00-7.75   sec  0.00 Bytes  0.00 bits/sec                  receiver
> iperf3: interrupt - the client has terminated
> # iperf3 -c 10.188.139.1 -R
> Connecting to host 10.188.139.1, port 5201
> Reverse mode, remote host 10.188.139.1 is sending
> [  5] local 10.188.140.2 port 45582 connected to 10.188.139.1 port 5201
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-1.03   sec  5.13 MBytes  41.9 Mbits/sec
> [  5]   1.03-2.03   sec  5.38 MBytes  44.8 Mbits/sec
> [  5]   2.03-3.02   sec  5.38 MBytes  45.6 Mbits/sec
> [  5]   3.02-4.02   sec  5.38 MBytes  45.2 Mbits/sec
> [  5]   4.02-5.01   sec  5.38 MBytes  45.4 Mbits/sec
> [  5]   5.01-5.30   sec  1.50 MBytes  43.2 Mbits/sec
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval           Transfer     Bitrate
> [  5]   0.00-5.30   sec  0.00 Bytes  0.00 bits/sec                  sender
> [  5]   0.00-5.30   sec  28.1 MBytes  44.5 Mbits/sec                  receiver
> iperf3: interrupt - the client has terminated
>
> I had never seen a device able to fully establish communication except for replying to
> ARP requests, so I'm not sure what's happening here.
>
>
> On the other hand, and since I don't know how to debug this ARP issue, I went back
> to see if I could diagnose what's happening in DMA Engine mode, so I peeked at the
> code and I saw an asymmetry between RX and TX, which sounded good given that
> in dmaengine mode TX works perfectly (or so it seems) and RX is heavily buffered.
> This asymmetry lies precisely in the number of SG blocks and the number of skb
> buffers.
>
> Both bd_nums are defined in the same way:
>         lp->rx_bd_num = RX_BD_NUM_DEFAULT; // = 1024
>         lp->tx_bd_num = TX_BD_NUM_DEFAULT; // = 128
>
>
> But the skb ring size is defined in a different fashion:
>         lp->tx_skb_ring = kcalloc(TX_BD_NUM_MAX, sizeof(*lp->tx_skb_ring), // = 4096
>                                   GFP_KERNEL);
>         ...
>         lp->rx_skb_ring = kcalloc(RX_BUF_NUM_DEFAULT, sizeof(*lp->rx_skb_ring), // = 128
>                                   GFP_KERNEL);
>
> So, for TX we allocate space for up to 4096 buffers but by default use 128.
> For RX we allocate space for 128 buffers but somehow are setting 1024 as the
> default bd number.
>
> The fact that RX_BD_NUM_DEFAULT is used nowhere else is also a sign that
> there was some mistake here, so I went and replaced all RX_BUF_NUM_DEFAULT
> occurrences with RX_BD_NUM_DEFAULT, so that both TX and RX skb rings are
> declared and operated using the same strategy:
>
>   sed -i '/^#define/!s#RX_BUF_NUM_DEFAULT#RX_BD_NUM_MAX#g' xilinx_axienet_main.c
>
> Doing this solved the buffering problem, although the system still doesn't reply to
> ARP requests, and when I tried to run an iperf3 test after manually adding arp tables,
> the kernel segfaulted (so I probably shouldn't have blindly 'sed' like that :)
>
> # iperf3 -c 10.188.139.1
> Connecting to host 10.188.139.1, port 5201
> [  5] local 10.188.140.2 port 46356 connected to 10.188.139.1 port 5201
> [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
> [  5]   0.00-1.01   sec   640 KBytes  5.18 Mbits/sec    3   84.8 KBykernel task_size exceed
> Oops: Exception in kernel mode, sig: 11
> CPU: 0 UID: 0 PID: 147 Comm: iperf3 Not tainted 6.13.8 #13
>  Registers dump: mode=8269B900
>  r1=00000000, r2=00000000, r3=00000000, r4=00000010
>  r5=00000000, r6=000005F2, r7=FFFF7FFF, r8=00000000
>  r9=00000000, r10=00000000, r11=00000000, r12=CF5FF24C
>  r13=00000000, r14=C241AB70, r15=C0383EB8, r16=00000000
>  r17=C0383EC0, r18=000005F0, r19=C10124A0, r20=480F8520
>  r21=4831F960, r22=00000000, r23=00000000, r24=FFFFFFEA
>  r25=C12BE0A8, r26=C12BE03C, r27=C12BE020, r28=00000122
>  r29=00000100, r30=000065A2, r31=C120F780, rPC=C0383EC0
>  msr=000046A2, ear=FFFFFFFA, esr=00000312, fsr=00000000
> Kernel panic - not syncing: Aiee, killing interrupt handler!
> ---[ end Kernel panic - not syncing: Aiee, killing interrupt handler! ]---
> tes
>
> I couldn't see what was wrong with the new code, so I just reduced the
> RX_BD_NUM_DEFAULT value from 1024 to 128, so it's now the same size as
> its TX counterpart, but the kernel segfaulted again when trying to measure
> throughput. Sadly, my kernel debugging abilities are not much stronger than this, so
> I'm stuck at this point but firmly believe there's something wrong here, although I
> can't see what it is.
>
> Any help will be greatly appreciated.
>
This doesn't look like the reason, as the driver doesn't use lp->rx_bd_num and lp->tx_bd_num to traverse the skb rings in the DMAengine flow. It uses axienet_get_rx_desc() and axienet_get_tx_desc() respectively, which use the same sizes as allocated.
The only difference I can see between the working and non-working cases is the increased Rx skb ring size. But you later mentioned that you tried bringing it down to 128; could you please confirm whether small transfers still work?
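
To illustrate the point about ring traversal, here is a simplified user-space sketch in C of a ring whose accessor masks a free-running index with the allocated size, so the index can never run past the allocation regardless of any separately configured descriptor count. This is not the driver code: every name below (SKETCH_RING_SLOTS, sketch_slot(), sketch_push(), sketch_pop()) is hypothetical, and the value 128 is only chosen to mirror the RX skb ring size discussed above.

```c
#include <assert.h>

/* Hypothetical slot count; must be a power of two for the mask to be valid. */
#define SKETCH_RING_SLOTS 128u

static_assert((SKETCH_RING_SLOTS & (SKETCH_RING_SLOTS - 1u)) == 0,
	      "ring size must be a power of two");

struct sketch_ring {
	int slots[SKETCH_RING_SLOTS];	/* stand-in for per-buffer state */
	unsigned int head;		/* free-running producer index */
	unsigned int tail;		/* free-running consumer index */
};

/* Map a free-running index onto the allocated array. */
static unsigned int sketch_slot(unsigned int i)
{
	return i & (SKETCH_RING_SLOTS - 1u);
}

/* Number of entries currently queued (unsigned wrap-around is intended). */
static unsigned int sketch_in_flight(const struct sketch_ring *r)
{
	return r->head - r->tail;
}

/* Queue one value; returns 0 on success, -1 if the ring is full. */
static int sketch_push(struct sketch_ring *r, int value)
{
	if (sketch_in_flight(r) >= SKETCH_RING_SLOTS)
		return -1;
	r->slots[sketch_slot(r->head)] = value;
	r->head++;
	return 0;
}

/* Dequeue the oldest value; returns 0 on success, -1 if the ring is empty. */
static int sketch_pop(struct sketch_ring *r, int *value)
{
	if (sketch_in_flight(r) == 0)
		return -1;
	*value = r->slots[sketch_slot(r->tail)];
	r->tail++;
	return 0;
}
```

With this layout, index 1023 maps to slot 127 and index 128 maps back to slot 0, so a larger descriptor count elsewhere does not by itself cause out-of-bounds accesses to the ring.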

FYI, basic ping and iperf both work for us in the DMAengine flow for AXI Ethernet 1G designs. We tested in full-duplex mode. But I can see half duplex in your case; could you please confirm whether that is expected and correct?

>
> DTS NOTES:
> Using the old DMA code inside xilinx_axienet_main.c requires removing the "dmas"
> entry and adding a reference to the DMA device, either via axistream-connected or by
> adding resources manually to the node. Referring to the node linked by
> axistream-connected requires a DMA node to exist, but its compatible string can't be
> xlnx,axi-dma-1.00.a, because then the AXI DMA driver will lock onto it and axienet will
> complain about the device being busy. So my solution for this is to use a compatible
> string that does not match. As such, with the following DTS I can establish TCP
> connections as long as ARP tables are manually entered:
>
>
> axi_ethernet_0_dma: dma@...00000 {
>         /* NOTE THE NOT */
>         compatible = "notxlnx,axi-dma-1.00.a";
>         #dma-cells = <1>;
>         reg = <0x41e00000 0x10000>;
>         interrupt-parent = <&microblaze_0_axi_intc>;
>         interrupts = <7 1 8 1>;
>         xlnx,addrwidth = <32>;  // Address width in bits
>         xlnx,datawidth = <32>;
>         xlnx,include-sg;
>         xlnx,sg-length-width = <16>;
>         xlnx,include-dre = <1>;
>         xlnx,axistream-connected = <1>;
>         xlnx,irq-delay = <1>;
>         dma-channels = <2>;
>         clock-names = "s_axi_lite_aclk", "m_axi_mm2s_aclk", "m_axi_s2mm_aclk",
>                       "m_axi_sg_aclk";
>         clocks = <&clk_bus_0>, <&clk_bus_0>, <&clk_bus_0>, <&clk_bus_0>;
>         dma-channel@...00000 {
>                 compatible = "xlnx,axi-dma-mm2s-channel";
>                 xlnx,include-dre = <1>;
>                 interrupts = <7 1>;
>                 xlnx,datawidth = <32>;
>         };
>         dma-channel@...00030 {
>                 compatible = "xlnx,axi-dma-s2mm-channel";
>                 xlnx,include-dre = <1>;
>                 interrupts = <8 1>;
>                 xlnx,datawidth = <32>;
>         };
> };
> axi_ethernet_eth: ethernet@...00000 {
>         compatible = "xlnx,axi-ethernet-1.00.a";
>         reg = <0x40c00000 0x40000>;
>         phy-handle = <&phy1>;
>         interrupt-parent = <&microblaze_0_axi_intc>;
>         interrupts = <3 0>;
>         xlnx,rxmem = <0x1000>;
>         max-speed = <100000>;
>         phy-mode = "mii";
>         xlnx,txcsum = <0x2>;
>         xlnx,rxcsum = <0x2>;
>         clock-names = "s_axi_lite_clk", "axis_clk", "ref_clk", "mgt_clk";
>         clocks = <&clk_bus_0>, <&clk_bus_0>, <&clk_bus_0>, <&clk_bus_0>;
>         axistream-connected = <&axi_ethernet_0_dma>;
>         dma-names = "tx_chan0", "rx_chan0";
>         mdio {
>                 #address-cells = <1>;
>                 #size-cells = <0>;
>                 phy1: ethernet-phy@1 {
>                         device_type = "ethernet-phy";
>                         reg = <1>;
>                 };
>         };
> };
>
> So this mode of working would definitely NOT need AXI DMA, and this hack with the
> compatible string would not be needed if the dependency on AXI DMA were
> removed.
>
> Best regards,
>
> --
> Álvaro G. M.