Message-ID: <85c2fea0-686f-435a-a539-81491a316e46@kernel.org>
Date: Tue, 26 Aug 2025 10:35:30 +0200
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Jacob Keller <jacob.e.keller@...el.com>,
 Michal Kubiak <michal.kubiak@...el.com>,
 Anthony Nguyen <anthony.l.nguyen@...el.com>,
 Intel Wired LAN <intel-wired-lan@...ts.osuosl.org>, netdev@...r.kernel.org
Cc: Christoph Petrausch <christoph.petrausch@...pl.com>,
 Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>,
 kernel-team <kernel-team@...udflare.com>
Subject: Re: [PATCH iwl-net v2] ice: fix Rx page leak on multi-buffer frames



On 26/08/2025 01.00, Jacob Keller wrote:
> XDP_DROP performance has been tested for this version, thanks to work from
> Michal Kubiak. The results are quite promising, with 3 versions being
> compared:
> 
> * baseline net-next tree
> * v1 applied
> * v2 applied
> 
> Michal said:
> 
>    I ran the XDP_DROP performance comparison tests on my setup in the way I
>    usually do. I didn't have pktgen configured on my link partner, but I
>    used 6 instances of xdpsock running in Tx-only mode to generate
>    high-bandwidth traffic. Also, I tried to replicate the conditions according
>    to Jesper's description, making sure that all the traffic is directed to a
>    single Rx queue and one CPU is 100% loaded.
> 

Thank you for replicating the test setup.  Using xdpsock as a traffic
generator is fine, as long as we make sure that the generator's TX speed
exceeds the Device Under Test's RX XDP_DROP speed.  It is also important
for the test that packets hit a single RX queue and that we verify one
CPU is 100% loaded, as you describe.

As a reminder, the pktgen kernel module comes with ready-to-use sample
shell scripts[1].

  [1] https://elixir.bootlin.com/linux/v6.16.3/source/samples/pktgen
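
On the DUT side, the XDP_DROP program itself is trivial.  A minimal
sketch (not taken from this series, just to show what the "no-touch"
benchmark case exercises):

/* xdp_drop.c - minimal XDP program that drops every packet.
 * Build:  clang -O2 -g -target bpf -c xdp_drop.c -o xdp_drop.o
 * Attach: ip link set dev <ifname> xdpdrv obj xdp_drop.o sec xdp
 */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_drop_prog(struct xdp_md *ctx)
{
	/* "no-touch" case: return without reading packet data */
	return XDP_DROP;
}

char _license[] SEC("license") = "GPL";

The "read data" variant in the results below presumably also
dereferences the packet data (after the usual data_end bounds check)
before returning XDP_DROP.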

> The performance hit from v1 is replicated, and shown to be gone in v2, with
> our results showing even an increase compared to baseline instead of a
> drop. I've included the relative packets-per-second deltas compared against
> a baseline test with neither v1 nor v2.
> 

Thanks for also replicating the performance hit from v1 as I did in [2].

To Michal: What CPU did you use?
  - I used CPU: AMD EPYC 9684X (with SRSO=IBPB)

One of the reasons that I saw a larger percentage drop is that this CPU
doesn't have DDIO/DCA, which delivers the packet data into the L3 cache
(an L2 cache miss that is satisfied from L3 obviously takes less time
than a miss that goes all the way to main memory). (Details: newer AMD
CPUs will get something called PCIe TLP Processing Hints (TPH), which
resembles DDIO.)

The point is that I see some opportunities in the driver to move some of
the prefetches earlier. But we want to make sure it benefits both CPU
types, and I can test on the AMD platform. (This CPU makes up a large
part of our fleet, so it makes sense for us to optimize for it.)

> baseline to v1, no-touch:
>    -8,387,677 packets per second (17%) decrease.
> 
> baseline to v2, no-touch:
>    +4,057,000 packets per second (8%) increase!
> 
> baseline to v1, read data:
>    -411,709 packets per second (1%) decrease.
> 
> baseline to v2, read data:
>    +4,331,857 packets per second (11%) increase!

Thanks for providing these numbers.
I would also like to see the absolute throughput numbers (PPS) before
and after, as that allows me to calculate the nanosecond difference per
packet. Percentages are usually useful, but they can be misleading when
dealing with XDP_DROP speeds, because a small nanosecond change gets
"magnified" too much.

> ---
> Changes in v2:
> - Only access shared info for fragmented frames
> - Link to v1: https://lore.kernel.org/netdev/20250815204205.1407768-4-anthony.l.nguyen@intel.com/

[2] 
https://lore.kernel.org/netdev/6e2cbea1-8c70-4bfa-9ce4-1d07b545a705@kernel.org/

> ---
>   drivers/net/ethernet/intel/ice/ice_txrx.h |  1 -
>   drivers/net/ethernet/intel/ice/ice_txrx.c | 80 +++++++++++++------------------
>   2 files changed, 34 insertions(+), 47 deletions(-)

Acked-by: Jesper Dangaard Brouer <hawk@...nel.org>
