Message-ID: <CAJEV1ihSfRe3PK60UG1gWezOSMBG4thFcU4KGi+9VxgTfNb9Yg@mail.gmail.com>
Date: Fri, 8 Mar 2024 12:05:00 +0200
From: Pavel Vazharov <pavel@...e.net>
To: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
Cc: Magnus Karlsson <magnus.karlsson@...il.com>, Toke Høiland-Jørgensen <toke@...nel.org>,
Jakub Kicinski <kuba@...nel.org>, netdev@...r.kernel.org
Subject: Re: Need of advice for XDP sockets on top of the interfaces behind a
Linux bonding device
On Mon, Feb 19, 2024 at 4:56 PM Maciej Fijalkowski
<maciej.fijalkowski@...el.com> wrote:
>
> On Mon, Feb 19, 2024 at 03:45:24PM +0200, Pavel Vazharov wrote:
>
> [...]
>
> > > > We changed the setup and I did the tests with a single port, no
> > > > bonding involved.
> > > > The port was configured with 16 queues (and 16 XSK sockets bound to them).
> > > > I tested with about 100 Mbps of traffic so as not to disrupt too many users.
> > > > During the tests I watched the traffic on the real-time graph of the
> > > > remote device port connected to the server machine where the
> > > > application was running in L3 forward mode:
> > > > - with zero copy enabled, the traffic to the server was about 100 Mbps
> > > > but the traffic coming out of the server was only about 50 Mbps
> > > > (i.e. half of it).
> > > > - with no zero copy, the traffic in both directions was the same - the
> > > > two graphs matched perfectly.
> > > > Nothing else was changed between the two tests, only the ZC option.
> > > > Are there some stats or something else I could check for this testing
> > > > scenario to reveal more info about the issue?
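(As a sanity check on the box itself, the in/out rates can also be derived
from the NIC byte counters instead of the remote graph; a minimal sketch,
assuming the rx_bytes_nic / tx_bytes_nic names that ixgbe's ethtool stats
show further down:)

#!/bin/sh
# Rough in/out rate of an interface, read from the NIC hardware byte
# counters in the ethtool stats (rx_bytes_nic / tx_bytes_nic).
# Usage: ./nic_rate.sh eth4 [interval_seconds]
IF=${1:-eth4}
T=${2:-5}
snap() { ethtool -S "$IF" | awk -v key="$1:" '$1 == key {print $2}'; }
RX1=$(snap rx_bytes_nic); TX1=$(snap tx_bytes_nic)
sleep "$T"
RX2=$(snap rx_bytes_nic); TX2=$(snap tx_bytes_nic)
echo "rx: $(( (RX2 - RX1) * 8 / T / 1000000 )) Mbit/s"
echo "tx: $(( (TX2 - TX1) * 8 / T / 1000000 )) Mbit/s"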
> > >
> > > FWIW I don't see this on my side. My guess would be that some of the
> > > queues stalled on ZC due to the buggy enable/disable ring pair routines
> > > that I am (fingers crossed :)) fixing, or trying to fix, in the previous
> > > email. You could try something as simple as:
> > >
> > > $ watch -n 1 "ethtool -S eth_ixgbe | grep rx | grep bytes"
> > >
> > > and verify that each of the queues that are supposed to receive traffic
> > > actually does. Then do the same for Tx.
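For spotting a stalled queue it may be easier to look at deltas rather than
the raw counters; a small sketch of such a check, assuming the
rx_queue_N_bytes / tx_queue_N_bytes names from ixgbe's ethtool stats:

#!/bin/sh
# Print the per-queue rx/tx byte deltas over an interval; a queue whose
# delta stays at 0 while traffic is flowing is likely stalled.
# Usage: ./queue_deltas.sh eth4 [interval_seconds]
IF=${1:-eth4}
T=${2:-5}
A=$(mktemp); B=$(mktemp)
ethtool -S "$IF" | grep -E '(rx|tx)_queue_[0-9]+_bytes' > "$A"
sleep "$T"
ethtool -S "$IF" | grep -E '(rx|tx)_queue_[0-9]+_bytes' > "$B"
# Both snapshots have the same line order, so pair them up line by line.
paste "$A" "$B" | awk '{printf "%-24s %12d\n", $1, $4 - $2}'
rm -f "$A" "$B"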
> > >
> > > >
> > > > > >
> > Thank you for the help.
> >
> > I tried the given patch on kernel 6.7.5.
> > The bonding issue that I described in the earlier e-mails seems to be fixed.
> > I can no longer reproduce the issue with the malformed LACP messages.
>
> Awesome! I'll send a fix to the lists then.
>
> >
> > However, I tested again with traffic and the issue remains:
> > - when traffic is redirected to the machine and simply forwarded at L3
> > by our application, only about 1/2 to 2/3 of it exits the machine
> > - disabling only the Zero Copy (and nothing else in the application)
> > fixes the issue
> > - another thing that I noticed is in the device stats: the Rx bytes
> > look OK and the counters of every queue increase over time (with
> > and without ZC)
> > ethtool -S eth4 | grep rx | grep bytes
> > rx_bytes: 20061532582
> > rx_bytes_nic: 27823942900
> > rx_queue_0_bytes: 690230537
> > rx_queue_1_bytes: 1051217950
> > rx_queue_2_bytes: 1494877257
> > rx_queue_3_bytes: 1989628734
> > rx_queue_4_bytes: 894557655
> > rx_queue_5_bytes: 1557310636
> > rx_queue_6_bytes: 1459428265
> > rx_queue_7_bytes: 1514067682
> > rx_queue_8_bytes: 432567753
> > rx_queue_9_bytes: 1251708768
> > rx_queue_10_bytes: 1091840145
> > rx_queue_11_bytes: 904127964
> > rx_queue_12_bytes: 1241335871
> > rx_queue_13_bytes: 2039939517
> > rx_queue_14_bytes: 777819814
> > rx_queue_15_bytes: 1670874034
> >
> > - without ZC the Tx bytes also look OK
> > ethtool -S eth4 | grep tx | grep bytes
> > tx_bytes: 24411467399
> > tx_bytes_nic: 29600497994
> > tx_queue_0_bytes: 1525672312
> > tx_queue_1_bytes: 1527162996
> > tx_queue_2_bytes: 1529701681
> > tx_queue_3_bytes: 1526220338
> > tx_queue_4_bytes: 1524403501
> > tx_queue_5_bytes: 1523242084
> > tx_queue_6_bytes: 1523543868
> > tx_queue_7_bytes: 1525376190
> > tx_queue_8_bytes: 1526844278
> > tx_queue_9_bytes: 1523938842
> > tx_queue_10_bytes: 1522663364
> > tx_queue_11_bytes: 1527292259
> > tx_queue_12_bytes: 1525206246
> > tx_queue_13_bytes: 1526670255
> > tx_queue_14_bytes: 1523266153
> > tx_queue_15_bytes: 1530263032
> >
> > - however, with ZC enabled the Tx bytes stats don't look OK (some
> > queues look like they are doing nothing) - again, it's exactly the same
> > application.
> > The aggregate tx_bytes increases much more than the sum of the per-queue
> > bytes (a delta check for this is sketched at the end of this message).
> > ethtool -S eth4 | grep tx | grep bytes ; sleep 1 ; ethtool -S eth4 |
> > grep tx | grep bytes
> > tx_bytes: 256022649
> > tx_bytes_nic: 34961074621
> > tx_queue_0_bytes: 372
> > tx_queue_1_bytes: 0
> > tx_queue_2_bytes: 0
> > tx_queue_3_bytes: 0
> > tx_queue_4_bytes: 9920
> > tx_queue_5_bytes: 0
> > tx_queue_6_bytes: 0
> > tx_queue_7_bytes: 0
> > tx_queue_8_bytes: 0
> > tx_queue_9_bytes: 1364
> > tx_queue_10_bytes: 0
> > tx_queue_11_bytes: 0
> > tx_queue_12_bytes: 1116
> > tx_queue_13_bytes: 0
> > tx_queue_14_bytes: 0
> > tx_queue_15_bytes: 0
>
> Yeah, here we are looking at the regular Tx rings, not the XDP rings that
> are used for ZC. XDP rings have been acting as rings hidden from the user;
> the issue has been brought up several times, but currently I am not sure
> we have a unified approach towards that. FWIW, ixgbe currently doesn't
> expose them in its stats - sorry for misleading you.
>
> At this point nothing obvious comes to my mind, but I can optimize the
> Tx ZC path and then let's see where it takes us.
Thank you. I can help with some testing when/if needed.
>
> >
> > tx_bytes: 257830280
> > tx_bytes_nic: 34962912861
> > tx_queue_0_bytes: 372
> > tx_queue_1_bytes: 0
> > tx_queue_2_bytes: 0
> > tx_queue_3_bytes: 0
> > tx_queue_4_bytes: 10044
> > tx_queue_5_bytes: 0
> > tx_queue_6_bytes: 0
> > tx_queue_7_bytes: 0
> > tx_queue_8_bytes: 0
> > tx_queue_9_bytes: 1364
> > tx_queue_10_bytes: 0
> > tx_queue_11_bytes: 0
> > tx_queue_12_bytes: 1116
> > tx_queue_13_bytes: 0
> > tx_queue_14_bytes: 0
> > tx_queue_15_bytes: 0
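
To put a number on that gap, the NIC-level Tx byte delta can be compared
against the sum of the per-queue Tx deltas over the same interval; a rough
sketch, assuming the tx_bytes_nic and tx_queue_N_bytes names shown above:

#!/bin/sh
# Compare the NIC-level Tx byte delta with the sum of the per-queue Tx
# byte deltas over one interval. With ZC enabled, the difference should
# roughly correspond to the traffic sent on the XDP rings, which ixgbe
# does not expose as per-queue stats.
# Usage: ./tx_gap.sh eth4 [interval_seconds]
IF=${1:-eth4}
T=${2:-5}
nic()  { ethtool -S "$IF" | awk '$1 == "tx_bytes_nic:" {print $2}'; }
qsum() { ethtool -S "$IF" | awk '/tx_queue_[0-9]+_bytes:/ {s += $2} END {print s}'; }
N1=$(nic); Q1=$(qsum)
sleep "$T"
N2=$(nic); Q2=$(qsum)
echo "tx_bytes_nic delta:          $((N2 - N1))"
echo "sum of tx_queue_N deltas:    $((Q2 - Q1))"
echo "difference (XDP/ZC rings?):  $(( (N2 - N1) - (Q2 - Q1) ))"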