Message-ID: <dd8703a5-7597-493c-a5c7-73eac7ed67d5@intel.com>
Date: Mon, 18 Aug 2025 17:38:15 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jesper Dangaard Brouer <hawk@...nel.org>, Tony Nguyen
<anthony.l.nguyen@...el.com>, <davem@...emloft.net>, <kuba@...nel.org>,
<pabeni@...hat.com>, <edumazet@...gle.com>, <andrew+netdev@...n.ch>,
<netdev@...r.kernel.org>
CC: <maciej.fijalkowski@...el.com>, <magnus.karlsson@...el.com>,
<ast@...nel.org>, <daniel@...earbox.net>, <john.fastabend@...il.com>,
<sdf@...ichev.me>, <bpf@...r.kernel.org>, <horms@...nel.org>,
<przemyslaw.kitszel@...el.com>, <aleksander.lobakin@...el.com>,
<jaroslav.pulchart@...ddata.com>, <jdamato@...tly.com>,
<christoph.petrausch@...pl.com>, Rinitha S <sx.rinitha@...el.com>, "Priya
Singh" <priyax.singh@...el.com>, Eelco Chaudron <echaudro@...hat.com>
Subject: Re: [PATCH net 3/6] ice: fix Rx page leak on multi-buffer frames
On 8/18/2025 4:05 AM, Jesper Dangaard Brouer wrote:
> On 15/08/2025 22.41, Tony Nguyen wrote:
>> This has the advantage that we also no longer need to track or cache the
>> number of fragments in the rx_ring, which saves a few bytes in the ring.
>>
>
> Has anyone tested the performance impact for XDP_DROP?
> (with standard non-multi-buffer frames)
>
> Below code change will always touch cache-lines in shared_info area.
> Before it was guarded with a xdp_buff_has_frags() check.
>
I did some basic testing with XDP_DROP previously using the xdp-bench
tool, and I do not recall noticing an issue. I don't recall the actual
numbers now, though, so I ran some quick tests again.
Without the patch:
Client:
$ iperf3 -u -c 192.168.93.1 -t86400 -l1200 -P20 -b5G
...
[SUM] 10.00-10.33 sec 626 MBytes 16.0 Gbits/sec 546909
Server:
$ iperf3 -s -B 192.168.93.1%ens260f0
[SUM] 0.00-10.00 sec 17.7 GBytes 15.2 Gbits/sec 0.011 ms
9712/15888183 (0.061%) receiver
xdp-bench:
$ xdp-bench drop ens260f0
Summary 1,778,935 rx/s 0 err/s
Summary 2,041,087 rx/s 0 err/s
Summary 2,005,052 rx/s 0 err/s
Summary 1,918,967 rx/s 0 err/s
With the patch:
Client:
$ iperf3 -u -c 192.168.93.1 -t86400 -l1200 -P20 -b5G
...
[SUM] 78.00-78.90 sec 2.01 GBytes 19.1 Gbits/sec 1801284
Server:
$ iperf3 -s -B 192.168.93.1%ens260f0
[SUM] 77.00-78.00 sec 2.14 GBytes 18.4 Gbits/sec 0.012 ms
9373/1921186 (0.49%)
xdp-bench:
$ xdp-bench drop ens260f0
Dropping packets on ens260f0 (ifindex 8; driver ice)
Summary 1,910,918 rx/s 0 err/s
Summary 1,866,562 rx/s 0 err/s
Summary 1,901,233 rx/s 0 err/s
Summary 1,859,854 rx/s 0 err/s
Summary 1,593,493 rx/s 0 err/s
Summary 1,891,426 rx/s 0 err/s
Summary 1,880,673 rx/s 0 err/s
Summary 1,866,043 rx/s 0 err/s
Summary 1,872,845 rx/s 0 err/s
I ran the test a few times and the throughput seemed to waffle a bit
between 15 Gbit/sec and 20 Gbit/sec, varying regardless of whether the
patch was applied. I actually tended to see slightly higher numbers
with this fix applied, but it was not consistent and was hard to measure.
Without the patch:
Without xdp-bench running the XDP program, top showed a CPU usage of
740% and an idle score of ~86. With xdp-bench running, the iperf3
process drops off the top listing and the CPU idle score goes up
to 99.9.
With the patch:
The iperf3 CPU use seems to go up, but so does the throughput, so it is
hard to get an isolated measurement. I don't have a setup for fine-tuned
performance testing readily available to do anything more rigorous.
Personally, I think it's overall in the noise, as I saw the same peak
performance and CPU usage with and without the patch.
I also tried testing with TCP and didn't see a significant difference
with or without the patch, though testing xdp-bench with TCP is not
that useful since the client stops transmitting once packets are
dropped instead of handled.
$ iperf3 -c 192.168.93.1 -t86400 -l8000 -P5
Without patch:
[SUM] 24.00-25.00 sec 7.80 GBytes 67.0 Gbits/sec
With patch:
[SUM] 28.00-29.00 sec 7.85 GBytes 67.4 Gbits/sec
Again, it ranged from 60 to 68 Gbit/sec in both cases, though I think
the peak is slightly higher with the fix applied; I sometimes saw it
spike up to 70 Gbit/sec, but it mostly hovered around 67 Gbit/sec.
I'm sure there are a lot of factors impacting the performance here, but
I think there's not much evidence that it is significantly different.
>> Cc: Christoph Petrausch <christoph.petrausch@...pl.com>
>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
>> Closes: https://lore.kernel.org/netdev/CAK8fFZ4hY6GUJNENz3wY9jaYLZXGfpr7dnZxzGMYoE44caRbgw@mail.gmail.com/
>> Fixes: 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
>> Signed-off-by: Jacob Keller <jacob.e.keller@...el.com>
>> Tested-by: Rinitha S <sx.rinitha@...el.com> (A Contingent worker at Intel)
>> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
>> Tested-by: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
>> Tested-by: Priya Singh <priyax.singh@...el.com>
>> Signed-off-by: Tony Nguyen <anthony.l.nguyen@...el.com>
>> ---
>> drivers/net/ethernet/intel/ice/ice_txrx.c | 81 +++++++++--------------
>> drivers/net/ethernet/intel/ice/ice_txrx.h | 1 -
>> 2 files changed, 33 insertions(+), 49 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
>> index 29e0088ab6b2..93907ab2eac7 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
>> @@ -894,10 +894,6 @@ ice_add_xdp_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
>> __skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, rx_buf->page,
>> rx_buf->page_offset, size);
>> sinfo->xdp_frags_size += size;
>> - /* remember frag count before XDP prog execution; bpf_xdp_adjust_tail()
>> - * can pop off frags but driver has to handle it on its own
>> - */
>> - rx_ring->nr_frags = sinfo->nr_frags;
>>
>> if (page_is_pfmemalloc(rx_buf->page))
>> xdp_buff_set_frag_pfmemalloc(xdp);
>> @@ -968,20 +964,20 @@ ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size,
>> /**
>> * ice_get_pgcnts - grab page_count() for gathered fragments
>> * @rx_ring: Rx descriptor ring to store the page counts on
>> + * @ntc: the next to clean element (not included in this frame!)
>> *
>> * This function is intended to be called right before running XDP
>> * program so that the page recycling mechanism will be able to take
>> * a correct decision regarding underlying pages; this is done in such
>> * way as XDP program can change the refcount of page
>> */
>> -static void ice_get_pgcnts(struct ice_rx_ring *rx_ring)
>> +static void ice_get_pgcnts(struct ice_rx_ring *rx_ring, unsigned int ntc)
>> {
>> - u32 nr_frags = rx_ring->nr_frags + 1;
>> u32 idx = rx_ring->first_desc;
>> struct ice_rx_buf *rx_buf;
>> u32 cnt = rx_ring->count;
>>
>> - for (int i = 0; i < nr_frags; i++) {
>> + while (idx != ntc) {
>> rx_buf = &rx_ring->rx_buf[idx];
>> rx_buf->pgcnt = page_count(rx_buf->page);
>>
>> @@ -1154,62 +1150,48 @@ ice_put_rx_buf(struct ice_rx_ring *rx_ring, struct ice_rx_buf *rx_buf)
>> }
>>
>> /**
>> - * ice_put_rx_mbuf - ice_put_rx_buf() caller, for all frame frags
>> + * ice_put_rx_mbuf - ice_put_rx_buf() caller, for all buffers in frame
>> * @rx_ring: Rx ring with all the auxiliary data
>> * @xdp: XDP buffer carrying linear + frags part
>> - * @xdp_xmit: XDP_TX/XDP_REDIRECT verdict storage
>> - * @ntc: a current next_to_clean value to be stored at rx_ring
>> + * @ntc: the next to clean element (not included in this frame!)
>> * @verdict: return code from XDP program execution
>> *
>> - * Walk through gathered fragments and satisfy internal page
>> - * recycle mechanism; we take here an action related to verdict
>> - * returned by XDP program;
>> + * Called after XDP program is completed, or on error with verdict set to
>> + * ICE_XDP_CONSUMED.
>> + *
>> + * Walk through buffers from first_desc to the end of the frame, releasing
>> + * buffers and satisfying internal page recycle mechanism. The action depends
>> + * on verdict from XDP program.
>> */
>> static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
>> - u32 *xdp_xmit, u32 ntc, u32 verdict)
>> + u32 ntc, u32 verdict)
>> {
>> - u32 nr_frags = rx_ring->nr_frags + 1;
>> + u32 nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
>
> Here we unconditionally access the skb_shared_info area.
>
>> u32 idx = rx_ring->first_desc;
>> u32 cnt = rx_ring->count;
>> - u32 post_xdp_frags = 1;
>> struct ice_rx_buf *buf;
>> - int i;
>> -
>> - if (unlikely(xdp_buff_has_frags(xdp)))
>
> Previously we only touch shared_info area if this is a multi-buff frame.
>
I'm not certain, but reading the helpers, it might be correct to do
something like this:

	if (unlikely(xdp_buff_has_frags(xdp)))
		nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags;
	else
		nr_frags = 1;

either in the driver code or by adding a new xdp helper function.
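For illustration, a rough sketch of what such a helper could look like,
presumably living next to the existing helpers in include/net/xdp.h.
The name xdp_buff_get_nr_frags() is hypothetical (it is not an existing
kernel API), and the single-buffer return value of 1 just mirrors the
snippet above:

	/* Hypothetical helper: only dereference the skb_shared_info area
	 * when the xdp_buff actually carries fragments, so single-buffer
	 * frames never touch that cache line. Returns 1 for a
	 * single-buffer frame to match the snippet above.
	 */
	static inline u32 xdp_buff_get_nr_frags(struct xdp_buff *xdp)
	{
		if (unlikely(xdp_buff_has_frags(xdp)))
			return xdp_get_shared_info_from_buff(xdp)->nr_frags;

		return 1;
	}

ice_put_rx_mbuf() could then do nr_frags = xdp_buff_get_nr_frags(xdp),
assuming a return of 1 for single-buffer frames lines up with how
nr_frags is consumed later in that function.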
I'm not sure it's worth it, though. We have pending work from our
development team to refactor ice to use page pool and switch to the
libeth XDP helpers, which will eliminate all of this driver-specific
logic.
I don't personally think it's worth holding up this series and this
important memory leak fix for a minor potential code change that I
can't measure an obvious improvement from.
Thanks,
Jake