Message-ID: <IA4PR11MB92788E4983AAAF86D1859119912AA@IA4PR11MB9278.namprd11.prod.outlook.com>
Date: Wed, 13 Aug 2025 13:43:59 +0000
From: "Singh, PriyaX" <priyax.singh@...el.com>
To: "Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>, "Kitszel,
 Przemyslaw" <przemyslaw.kitszel@...el.com>, Intel Wired LAN
	<intel-wired-lan@...ts.osuosl.org>, "Lobakin, Aleksander"
	<aleksander.lobakin@...el.com>
CC: ": Joe Damato" <jdamato@...tly.com>, "Nguyen, Anthony L"
	<anthony.l.nguyen@...el.com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, Christoph Petrausch
	<christoph.petrausch@...pl.com>, Jaroslav Pulchart
	<jaroslav.pulchart@...ddata.com>, "Keller, Jacob E"
	<jacob.e.keller@...el.com>, "Buvaneswaran, Sujai"
	<sujai.buvaneswaran@...el.com>
Subject: RE: [Intel-wired-lan] [PATCH iwl-net v2] ice: fix Rx page leak on
 multi-buffer frames

++ Sujai

> -----Original Message-----
> From: Singh, PriyaX
> Sent: Wednesday, August 13, 2025 7:10 PM
> To: Fijalkowski, Maciej <maciej.fijalkowski@...el.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@...el.com>; Intel Wired LAN
> <intel-wired-lan@...ts.osuosl.org>; Lobakin, Aleksander <aleksander.lobakin@...el.com>
> Cc: Joe Damato <jdamato@...tly.com>; Nguyen, Anthony L
> <anthony.l.nguyen@...el.com>; netdev@...r.kernel.org; Christoph
> Petrausch <christoph.petrausch@...pl.com>; Jaroslav Pulchart
> <jaroslav.pulchart@...ddata.com>; Keller, Jacob E
> <jacob.e.keller@...el.com>
> Subject: RE: [Intel-wired-lan] [PATCH iwl-net v2] ice: fix Rx page leak on
> multi-buffer frames
> 
> > -----Original Message-----
> > From: Intel-wired-lan <intel-wired-lan-bounces@...osl.org> On Behalf
> > Of Jacob Keller
> > Sent: Saturday, July 12, 2025 5:54 AM
> > To: Fijalkowski, Maciej <maciej.fijalkowski@...el.com>; Kitszel,
> > Przemyslaw <przemyslaw.kitszel@...el.com>; Intel Wired LAN
> > <intel-wired-lan@...ts.osuosl.org>; Lobakin, Aleksander
> > <aleksander.lobakin@...el.com>
> > Cc: Joe Damato <jdamato@...tly.com>; Nguyen, Anthony L
> > <anthony.l.nguyen@...el.com>; netdev@...r.kernel.org; Christoph
> > Petrausch <christoph.petrausch@...pl.com>; Jaroslav Pulchart
> > <jaroslav.pulchart@...ddata.com>; Keller, Jacob E
> > <jacob.e.keller@...el.com>
> > Subject: [Intel-wired-lan] [PATCH iwl-net v2] ice: fix Rx page leak on
> > multi-buffer frames
> >
> > The ice_put_rx_mbuf() function handles calling ice_put_rx_buf() for
> > each buffer in the current frame. This function was introduced as part
> > of handling multi-buffer XDP support in the ice driver.
> >
> > It works by iterating over the buffers from first_desc up to one plus
> > the total number of fragments in the frame, a count cached before the
> > XDP program was executed.
> >
> > If the hardware posts a descriptor with a size of 0, the logic used in
> > ice_put_rx_mbuf() breaks. Such descriptors get skipped and don't get
> > added as fragments in ice_add_xdp_frag(). Since the buffer isn't counted
> > as a fragment, we do not iterate over it in ice_put_rx_mbuf(), and thus
> > we don't call ice_put_rx_buf().
> >
> > Because we don't call ice_put_rx_buf(), we don't attempt to re-use the
> > page or free it. This leaves a stale page in the ring, as we don't
> > increment next_to_alloc.
> >
> > ice_reuse_rx_page() assumes that next_to_alloc has been incremented
> > properly and that it always points to a buffer with a NULL page. Since
> > this function doesn't check, it will happily recycle a page over the
> > top of the next_to_alloc buffer, losing track of the old page.
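
To make the failure mode concrete, the recycle path looks roughly like this
(a paraphrased sketch; the exact field names are assumptions based on the
i40e/ice page-reuse scheme):

  /*
   * Sketch of ice_reuse_rx_page() as described above: the buffer at
   * next_to_alloc is presumed to hold no page, so its page pointer is
   * overwritten without a check. If a zero-size buffer was never put back,
   * the page it still references is lost at this point.
   */
  static void ice_reuse_rx_page(struct ice_rx_ring *rx_ring,
                                struct ice_rx_buf *old_buf)
  {
          u16 nta = rx_ring->next_to_alloc;
          struct ice_rx_buf *new_buf = &rx_ring->rx_buf[nta];

          /* advance next_to_alloc, wrapping at the end of the ring */
          nta++;
          rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;

          /* no check that new_buf->page is NULL before it is overwritten */
          new_buf->dma = old_buf->dma;
          new_buf->page = old_buf->page;
          new_buf->page_offset = old_buf->page_offset;
          new_buf->pagecnt_bias = old_buf->pagecnt_bias;
  }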
> >
> > Note that this leak only occurs for multi-buffer frames. The
> > ice_put_rx_mbuf() function always handles at least one buffer, so a
> > single-buffer frame will always get handled correctly. It is not clear
> > precisely why the hardware sometimes hands us descriptors with a size
> > of 0, but it happens somewhat regularly with the "jumbo frames" used by
> > a 9K MTU.
> >
> > To fix ice_put_rx_mbuf(), we need to make sure to call
> > ice_put_rx_buf() on all buffers between first_desc and next_to_clean.
> > Borrow the logic of a similar function in i40e used for this same
> > purpose. Use the same logic also in ice_get_pgcnts().
> >
> > Instead of iterating over just the number of fragments, use a loop
> > which iterates until the current index reaches the next_to_clean
> > element just past the current frame. Check the current number of
> > fragments (post XDP program). For all buffers up to one more than the
> > number of fragments, update the pagecnt_bias. For any buffers past
> > this, pagecnt_bias is left as-is. This ensures that fragments released
> > by the XDP program, as well as any zero-size buffers, won't have their
> > pagecnt_bias updated incorrectly.
> > Unlike i40e, the ice_put_rx_mbuf() function does call ice_put_rx_buf()
> > on the last buffer of the frame, the one indicating end of packet.
> >
> > The xdp_xmit value only needs to be updated if an XDP program is run,
> > and only once per packet. Drop the xdp_xmit pointer argument from
> > ice_put_rx_mbuf(). Instead, set xdp_xmit in the ice_clean_rx_irq()
> > function directly. This avoids needing to pass the argument and avoids
> > an extra bit-wise OR for each buffer in the frame.
> >
> > Move the increment of the ntc local variable to ensure it is updated
> > *before* all calls to ice_get_pgcnts() or ice_put_rx_mbuf(), as the
> > loop logic requires the index of the element just after the current
> > frame.
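
Putting the pieces together, the fixed walk is shaped roughly like the sketch
below (a paraphrase of the logic described in this message, not the patch
itself; the helper and flag names are the same assumptions as in the earlier
sketch):

  /*
   * Sketch of the fixed walk: iterate every buffer from first_desc up to
   * next_to_clean (ntc), so zero-size buffers are also put back; only the
   * first nr_frags + 1 buffers, i.e. those the XDP program kept as
   * fragments, get their pagecnt_bias / page offset adjusted.
   */
  static void ice_put_rx_mbuf(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
                              u32 ntc, u32 verdict)
  {
          u32 idx = rx_ring->first_desc;
          u32 cnt = rx_ring->count;
          u32 post_xdp_frags = 1;
          struct ice_rx_buf *buf;
          u32 i = 0;

          if (xdp_buff_has_frags(xdp))
                  post_xdp_frags += xdp_get_shared_info_from_buff(xdp)->nr_frags;

          while (idx != ntc) {
                  buf = &rx_ring->rx_buf[idx];
                  if (++idx == cnt)
                          idx = 0;

                  if (i < post_xdp_frags) {
                          if (verdict == ICE_XDP_PASS ||
                              (verdict & (ICE_XDP_TX | ICE_XDP_REDIR)))
                                  ice_rx_buf_adjust_pg_offset(buf, xdp->frame_sz);
                          else if (verdict & ICE_XDP_CONSUMED)
                                  buf->pagecnt_bias++;
                  }

                  ice_put_rx_buf(rx_ring, buf);
                  i++;
          }

          /*
           * xdp_xmit is no longer passed in; ice_clean_rx_irq() ORs in the
           * verdict once per packet, only when an XDP program actually ran.
           */
  }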
> >
> > This has the advantage that we also no longer need to track or cache
> > the number of fragments in the rx_ring, which saves a few bytes in the
> > ring.
> >
> > Cc: Christoph Petrausch <christoph.petrausch@...pl.com>
> > Reported-by: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
> > Closes: https://lore.kernel.org/netdev/CAK8fFZ4hY6GUJNENz3wY9jaYLZXGfpr7dnZxzGMYoE44caRbgw@...l.gmail.com/
> > Fixes: 743bbd93cf29 ("ice: put Rx buffers after being done with current frame")
> > Signed-off-by: Jacob Keller <jacob.e.keller@...el.com>
> > ---
> > I've tested this in a setup with MTU 9000, using a combination of
> > iperf3 and wrk generated traffic.
> >
> > I tested this in a couple of ways. First, I checked memory allocations
> > using /proc/allocinfo:
> >
> >   awk '/ice_alloc_mapped_page/ { printf("%s %s\n", $1, $2) }' \
> >     /proc/allocinfo | numfmt --to=iec
> >
> > Second, I ported some stats from i40e written by Joe Damato to track
> > the page allocation and busy counts. I consistently saw that the
> > allocate stat increased without the busy or waive stats increasing. I
> > also added a stat to track directly when we overwrote a page pointer
> > that was non-NULL in ice_reuse_rx_page(), and saw it increment
> > consistently.
> >
> > With this fix, all of these indicators return to normal. I've tested
> > both 1500-byte and 9000-byte MTUs and no longer see the leak. With the
> > counters I could see a leak within a few minutes of iperf3, so I am
> > confident that this fix resolves it.
> >
> > I've now also tested with xdp-bench and confirm that XDP_TX and
> > XDP_REDIR work properly with the fix for updating xdp_xmit.
> > ---
> > Changes in v2:
> > - Fix XDP Tx/Redirect (Thanks Maciej!)
> > - Link to v1:
> > https://lore.kernel.org/r/20250709-jk-ice-fix-rx-mem-leak-v1-1-
> > cfdd7eeea905@...el.com
> > ---
> >  drivers/net/ethernet/intel/ice/ice_txrx.h |  1 -
> >  drivers/net/ethernet/intel/ice/ice_txrx.c | 81 +++++++++++++------------------
> >  2 files changed, 33 insertions(+), 49 deletions(-)
> 
> Tested-by: Priya Singh <priyax.singh@...el.com>
> 
