netdev - Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zc+aN4rYKZKu3vKx@boxer>
Date: Fri, 16 Feb 2024 18:24:07 +0100
From: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
To: Pavel Vazharov <pavel@...e.net>
CC: Magnus Karlsson <magnus.karlsson@...il.com>, Toke
 Høiland-Jørgensen <toke@...nel.org>, Jakub Kicinski
	<kuba@...nel.org>, <netdev@...r.kernel.org>
Subject: Re: Need of advice for XDP sockets on top of the interfaces behind a
 Linux bonding device

> > > > >
> > > > > Back to the issue.
> > > > > I just want to say again that we are not binding the XDP sockets to
> > > > > the bonding device.
> > > > > We are binding the sockets to the queues of the physical interfaces
> > > > > "below" the bonding device.
> > > > > My further observation this time is that when the issue happens and
> > > > > the remote device reports
> > > > > the LACP error there is no incoming LACP traffic on the corresponding
> > > > > local port,
> > > > > as seen by the xdump.
> > > > > The tcpdump at the same time sees only outgoing LACP packets and
> > > > > nothing incoming.
> > > > > For example:
> > > > > Remote device
> > > > >                           Local Server
> > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0
> > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2
> > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4
> > > > > And when the remote device reports "received an abnormal LACPDU"
> > > > > for PortName=XGigabitEthernet0/0/14 I can see via xdpdump that there
> > > > > is no incoming LACP traffic
> > > >
> > > > Hey Pavel,
> > > >
> > > > can you also look at /proc/interrupts at eth4 and what ethtool -S shows
> > > > there?
> > > I reproduced the problem but this time the interface with the weird
> > > state was eth0.
> > > It's different every time and sometimes even two of the interfaces are
> > > in such a state.
> > > Here are the requested info while being in this state:
> > > ~# ethtool -S eth0 > /tmp/stats0.txt ; sleep 10 ; ethtool -S eth0 >
> > > /tmp/stats1.txt ; diff /tmp/stats0.txt /tmp/stats1.txt
> > > 6c6
> > > <      rx_pkts_nic: 81426
> > > ---
> > > >      rx_pkts_nic: 81436
> > > 8c8
> > > <      rx_bytes_nic: 10286521
> > > ---
> > > >      rx_bytes_nic: 10287801
> > > 17c17
> > > <      multicast: 72216
> > > ---
> > > >      multicast: 72226
> > > 48c48
> > > <      rx_no_dma_resources: 1109
> > > ---
> > > >      rx_no_dma_resources: 1119
> > >
> > > ~# cat /proc/interrupts | grep eth0 > /tmp/interrupts0.txt ; sleep 10
> > > ; cat /proc/interrupts | grep eth0 > /tmp/interrupts1.txt
> > > interrupts0: 430 3098 64 108199 108199 108199 108199 108199 108199
> > > 108199 108201 63 64 1865 108199  61
> > > interrupts1: 435 3103 69 117967 117967  117967 117967 117967  117967
> > > 117967 117969 68 69 1870  117967 66
> > >
> > > So, it seems that packets are coming on the interface but they don't
> > > reach to the XDP layer and deeper.
> > > rx_no_dma_resources - this counter seems to give clues about a possible issue?
> > >
> > > >
> > > > > on eth4 but there is incoming LACP traffic on eth0 and eth2.
> > > > > At the same time, according to the dmesg the kernel sees all of the
> > > > > interfaces as
> > > > > "link status definitely up, 10000 Mbps full duplex".
> > > > > The issue goes aways if I stop the application even without removing
> > > > > the XDP programs
> > > > > from the interfaces - the running xdpdump starts showing the incoming
> > > > > LACP traffic immediately.
> > > > > The issue also goes away if I do "ip link set down eth4 && ip link set up eth4".
> > > >
> > > > and the setup is what when doing the link flap? XDP progs are loaded to
> > > > each of the 3 interfaces of bond?
> > > Yes, the same XDP program is loaded on application startup on each one
> > > of the interfaces which are part of bond0 (eth0, eth2, eth4):
> > > # xdp-loader status
> > > CURRENT XDP PROGRAM STATUS:
> > >
> > > Interface        Prio  Program name      Mode     ID   Tag
> > >   Chain actions
> > > --------------------------------------------------------------------------------------
> > > lo                     <No XDP program loaded!>
> > > eth0                   xdp_dispatcher    native   1320 90f686eb86991928
> > >  =>              50     x3sp_splitter_func          1329
> > > 3b185187f1855c4c  XDP_PASS
> > > eth1                   <No XDP program loaded!>
> > > eth2                   xdp_dispatcher    native   1334 90f686eb86991928
> > >  =>              50     x3sp_splitter_func          1337
> > > 3b185187f1855c4c  XDP_PASS
> > > eth3                   <No XDP program loaded!>
> > > eth4                   xdp_dispatcher    native   1342 90f686eb86991928
> > >  =>              50     x3sp_splitter_func          1345
> > > 3b185187f1855c4c  XDP_PASS
> > > eth5                   <No XDP program loaded!>
> > > eth6                   <No XDP program loaded!>
> > > eth7                   <No XDP program loaded!>
> > > bond0                  <No XDP program loaded!>
> > > Each of these interfaces is setup to have 16 queues i.e. the application,
> > > through the DPDK machinery, opens 3x16 XSK sockets each bound to the
> > > corresponding queue of the corresponding interface.
> > > ~# ethtool -l eth0 # It's same for the other 2 devices
> > > Channel parameters for eth0:
> > > Pre-set maximums:
> > > RX:             n/a
> > > TX:             n/a
> > > Other:          1
> > > Combined:       48
> > > Current hardware settings:
> > > RX:             n/a
> > > TX:             n/a
> > > Other:          1
> > > Combined:       16
> > >
> > > >
> > > > > However, I'm not sure what happens with the bound XDP sockets in this case
> > > > > because I haven't tested further.
> > > >
> > > > can you also try to bind xsk sockets before attaching XDP progs?
> > > I looked into the DPDK code again.
> > > The DPDK framework provides callback hooks like eth_rx_queue_setup
> > > and each "driver" implements it as needed. Each Rx/Tx queue of the device is
> > > set up separately. The af_xdp driver currently does this for each Rx
> > > queue separately:
> > > 1. configures the umem for the queue
> > > 2. loads the XDP program on the corresponding interface, if not already loaded
> > >    (i.e. this happens only once per interface when its first queue is set up).
> > > 3. does xsk_socket__create which as far as I looked also internally binds the
> > > socket to the given queue
> > > 4. places the socket in the XSKS map of the XDP program via bpf_map_update_elem
> > >
> > > So, it seems to me that the change needed will be a bit more involved.
> > > I'm not sure if it'll be possible to hardcode, just for the test, the
> > > program loading and
> > > the placing of all XSK sockets in the map to happen when the setup of the last
> > > "queue" for the given interface is done. I need to think a bit more about this.
> > Changed the code of the DPDK af_xdp "driver" to create and bind all of
> > the XSK sockets
> > to the queues of the corresponding interface and after that, after the
> > initialization of the
> > last XSK socket, I added the logic for the attachment of the XDP
> > program to the interface
> > and the population of the XSK map with the created sockets.
> > The issue was still there but it was kind of harder to reproduce - it
> > happened once for 5
> > starts of the application.
> >
> > >
> > > >
> > > > >
> > > > > It seems to me that something racy happens when the interfaces go down
> > > > > and back up
> > > > > (visible in the dmesg) when the XDP sockets are bound to their queues.
> > > > > I mean, I'm not sure why the interfaces go down and up but setting
> > > > > only the XDP programs
> > > > > on the interfaces doesn't cause this behavior. So, I assume it's
> > > > > caused by the binding of the XDP sockets.
> > > >
> > > > hmm i'm lost here, above you said you got no incoming traffic on eth4 even
> > > > without xsk sockets being bound?
> > > Probably I've phrased something in a wrong way.
> > > The issue is not observed if I load the XDP program on all interfaces
> > > (eth0, eth2, eth4)
> > > with the xdp-loader:
> > > xdp-loader load --mode native <iface> <path-to-the-xdp-program>
> > > It's not observed probably because there are no interface down/up actions.
> > > I also modified the DPDK "driver" to not remove the XDP program on exit and thus
> > > when the application stops only the XSK sockets are closed but the
> > > program remains
> > > loaded at the interfaces. When I stop this version of the application
> > > while running the
> > > xdpdump at the same time I see that the traffic immediately appears in
> > > the xdpdump.
> > > Also, note that I basically trimmed the XDP program to simply contain
> > > the XSK map
> > > (BPF_MAP_TYPE_XSKMAP) and the function just does "return XDP_PASS;".
> > > I wanted to exclude every possibility for the XDP program to do something wrong.
> > > So, from the above it seems to me that the issue is triggered somehow by the XSK
> > > sockets usage.
> > >
> > > >
> > > > > It could be that the issue is not related to the XDP sockets but just
> > > > > to the down/up actions of the interfaces.
> > > > > On the other hand, I'm not sure why the issue is easily reproducible
> > > > > when the zero copy mode is enabled
> > > > > (4 out of 5 tests reproduced the issue).
> > > > > However, when the zero copy is disabled this issue doesn't happen
> > > > > (I tried 10 times in a row and it doesn't happen).
> > > >
> > > > any chances that you could rule out the bond of the picture of this issue?
> > > I'll need to talk to the network support guys because they manage the network
> > > devices and they'll need to change the LACP/Trunk setup of the above
> > > "remote device".
> > > I can't promise that they'll agree though.
> We changed the setup and I did the tests with a single port, no
> bonding involved.
> The port was configured with 16 queues (and 16 XSK sockets bound to them).
> I tested with about 100 Mbps of traffic to not break lots of users.
> During the tests I observed the traffic on the real time graph on the
> remote device port
> connected to the server machine where the application was running in
> L3 forward mode:
> - with zero copy enabled the traffic to the server was about 100 Mbps
> but the traffic
> coming out of the server was about 50 Mbps (i.e. half of it).
> - with no zero copy the traffic in both directions was the same - the
> two graphs matched perfectly
> Nothing else was changed during the both tests, only the ZC option.
> Can I check some stats or something else for this testing scenario
> which could be
> used to reveal more info about the issue?

FWIW I don't see this on my side. My guess would be that some of the
queues stalled on ZC due to buggy enable/disable ring pair routines that I
am (fingers crossed :)) fixing, or trying to fix in previous email. You
could try something as simple as:

$ watch -n 1 "ethtool -S eth_ixgbe | grep rx | grep bytes"

and verify each of the queues that are supposed to receive traffic. Do the
same thing with tx, similarly. 

> 
> > >