Date: Mon, 19 Feb 2024 15:56:39 +0100
From: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
To: Pavel Vazharov <pavel@...e.net>
CC: Magnus Karlsson <magnus.karlsson@...il.com>, Toke
 Høiland-Jørgensen <toke@...nel.org>, Jakub Kicinski
	<kuba@...nel.org>, <netdev@...r.kernel.org>
Subject: Re: Need of advice for XDP sockets on top of the interfaces behind a
 Linux bonding device

On Mon, Feb 19, 2024 at 03:45:24PM +0200, Pavel Vazharov wrote:

[...]

> > > We changed the setup and I did the tests with a single port, no
> > > bonding involved.
> > > The port was configured with 16 queues (and 16 XSK sockets bound to them).
> > > I tested with about 100 Mbps of traffic so as not to disrupt too many users.
> > > During the tests I watched the traffic on the real-time graph of the
> > > remote device port connected to the server machine where the
> > > application was running in L3 forwarding mode:
> > > - with zero copy enabled, the traffic to the server was about 100 Mbps
> > > but the traffic coming out of the server was only about 50 Mbps (i.e.
> > > half of it)
> > > - with zero copy disabled, the traffic in both directions was the
> > > same - the two graphs matched perfectly
> > > Nothing else was changed between the two tests, only the ZC option.
> > > Can I check some stats or something else for this testing scenario
> > > which could reveal more info about the issue?
> >
> > FWIW I don't see this on my side. My guess would be that some of the
> > queues stalled on ZC due to the buggy enable/disable ring pair routines
> > that I am (fingers crossed :)) fixing, or trying to fix, in the previous
> > email. You could try something as simple as:
> >
> > $ watch -n 1 "ethtool -S eth_ixgbe | grep rx | grep bytes"
> >
> > and verify that each of the queues that is supposed to receive traffic
> > actually does. Then do the same for tx.
> >
> Thank you for the help.
> 
> I tried the given patch on kernel 6.7.5.
> The bonding issue that I described in the e-mails above seems to be fixed.
> I can no longer reproduce the issue with the malformed LACP messages.

Awesome! I'll send a fix to the lists then.

> 
> However, I tested again with traffic and the issue remains:
> - when traffic is redirected to the machine and simply forwarded at L3
> by our application, only about 1/2 - 2/3 of it exits the machine
> - disabling only the Zero Copy (and changing nothing else in the
> application) fixes the issue
> - another thing that I noticed is in the device stats - the Rx bytes
> look OK and the counters of every queue increase over time (with
> and without ZC)
> ethtool -S eth4 | grep rx | grep bytes
>      rx_bytes: 20061532582
>      rx_bytes_nic: 27823942900
>      rx_queue_0_bytes: 690230537
>      rx_queue_1_bytes: 1051217950
>      rx_queue_2_bytes: 1494877257
>      rx_queue_3_bytes: 1989628734
>      rx_queue_4_bytes: 894557655
>      rx_queue_5_bytes: 1557310636
>      rx_queue_6_bytes: 1459428265
>      rx_queue_7_bytes: 1514067682
>      rx_queue_8_bytes: 432567753
>      rx_queue_9_bytes: 1251708768
>      rx_queue_10_bytes: 1091840145
>      rx_queue_11_bytes: 904127964
>      rx_queue_12_bytes: 1241335871
>      rx_queue_13_bytes: 2039939517
>      rx_queue_14_bytes: 777819814
>      rx_queue_15_bytes: 1670874034
> 
> - without ZC the Tx bytes also look OK
> ethtool -S eth4 | grep tx | grep bytes
>      tx_bytes: 24411467399
>      tx_bytes_nic: 29600497994
>      tx_queue_0_bytes: 1525672312
>      tx_queue_1_bytes: 1527162996
>      tx_queue_2_bytes: 1529701681
>      tx_queue_3_bytes: 1526220338
>      tx_queue_4_bytes: 1524403501
>      tx_queue_5_bytes: 1523242084
>      tx_queue_6_bytes: 1523543868
>      tx_queue_7_bytes: 1525376190
>      tx_queue_8_bytes: 1526844278
>      tx_queue_9_bytes: 1523938842
>      tx_queue_10_bytes: 1522663364
>      tx_queue_11_bytes: 1527292259
>      tx_queue_12_bytes: 1525206246
>      tx_queue_13_bytes: 1526670255
>      tx_queue_14_bytes: 1523266153
>      tx_queue_15_bytes: 1530263032
> 
> - however, with ZC enabled the Tx bytes stats don't look OK (some
> queues look like they are doing nothing) - again, it's exactly the
> same application
> The total tx_bytes counter increases much faster than the sum of the
> per-queue bytes.
> ethtool -S eth4 | grep tx | grep bytes ; sleep 1 ; ethtool -S eth4 |
> grep tx | grep bytes
>      tx_bytes: 256022649
>      tx_bytes_nic: 34961074621
>      tx_queue_0_bytes: 372
>      tx_queue_1_bytes: 0
>      tx_queue_2_bytes: 0
>      tx_queue_3_bytes: 0
>      tx_queue_4_bytes: 9920
>      tx_queue_5_bytes: 0
>      tx_queue_6_bytes: 0
>      tx_queue_7_bytes: 0
>      tx_queue_8_bytes: 0
>      tx_queue_9_bytes: 1364
>      tx_queue_10_bytes: 0
>      tx_queue_11_bytes: 0
>      tx_queue_12_bytes: 1116
>      tx_queue_13_bytes: 0
>      tx_queue_14_bytes: 0
>      tx_queue_15_bytes: 0

Yeah, here we are looking at the Tx rings, not the XDP rings that are used
for ZC. The XDP rings have been acting like rings hidden from the user; the
issue has been brought up several times, but currently I am not sure we have
a unified approach towards it. FWIW ixgbe currently doesn't expose their
stats, sorry for misleading you.
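
If you want some per-socket visibility in the meantime, the XDP_STATISTICS
getsockopt on each XSK fd is maintained by the xsk core, independently of
the driver stats. A rough sketch (assuming your app keeps the bound XSK fd
for every queue around, e.g. from xsk_socket__fd(); the helper name here is
just for illustration):

/* Dump the AF_XDP per-socket counters via the XDP_STATISTICS getsockopt.
 * xsk_fd must be an already-created and bound AF_XDP socket. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

#ifndef SOL_XDP
#define SOL_XDP 283
#endif

static int dump_xsk_stats(int xsk_fd, int queue_id)
{
	struct xdp_statistics stats;
	socklen_t optlen = sizeof(stats);

	memset(&stats, 0, sizeof(stats));
	if (getsockopt(xsk_fd, SOL_XDP, XDP_STATISTICS, &stats, &optlen))
		return -1;

	/* On 6.7 all six fields are filled in; comparing them across the
	 * 16 sockets should show whether some queues are stuck on Tx. */
	printf("queue %d: rx_dropped=%llu rx_invalid_descs=%llu "
	       "tx_invalid_descs=%llu rx_ring_full=%llu "
	       "rx_fill_ring_empty_descs=%llu tx_ring_empty_descs=%llu\n",
	       queue_id,
	       (unsigned long long)stats.rx_dropped,
	       (unsigned long long)stats.rx_invalid_descs,
	       (unsigned long long)stats.tx_invalid_descs,
	       (unsigned long long)stats.rx_ring_full,
	       (unsigned long long)stats.rx_fill_ring_empty_descs,
	       (unsigned long long)stats.tx_ring_empty_descs);
	return 0;
}

Calling that once per second for each socket, next to the ethtool watch,
would at least tell us whether the stalled queues ever see invalid or
empty Tx descriptors.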

At this point nothing obvious comes to mind, but I can optimize the Tx ZC
path and then we'll see where that takes us.

> 
>      tx_bytes: 257830280
>      tx_bytes_nic: 34962912861
>      tx_queue_0_bytes: 372
>      tx_queue_1_bytes: 0
>      tx_queue_2_bytes: 0
>      tx_queue_3_bytes: 0
>      tx_queue_4_bytes: 10044
>      tx_queue_5_bytes: 0
>      tx_queue_6_bytes: 0
>      tx_queue_7_bytes: 0
>      tx_queue_8_bytes: 0
>      tx_queue_9_bytes: 1364
>      tx_queue_10_bytes: 0
>      tx_queue_11_bytes: 0
>      tx_queue_12_bytes: 1116
>      tx_queue_13_bytes: 0
>      tx_queue_14_bytes: 0
>      tx_queue_15_bytes: 0
