Message-ID: <a4b27e11-a3fd-4df0-8dd4-60d1a4aec5a8@intel.com>
Date: Tue, 8 Jul 2025 17:50:23 -0700
From: Jacob Keller <jacob.e.keller@...el.com>
To: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
CC: Maciej Fijalkowski <maciej.fijalkowski@...el.com>, Jakub Kicinski
	<kuba@...nel.org>, Przemek Kitszel <przemyslaw.kitszel@...el.com>,
	"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
	"Damato, Joe" <jdamato@...tly.com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
	Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>, "Czapnik, Lukasz"
	<lukasz.czapnik@...el.com>, "Dumazet, Eric" <edumazet@...gle.com>, "Zaki,
 Ahmed" <ahmed.zaki@...el.com>, Martin Karsten <mkarsten@...terloo.ca>, "Igor
 Raits" <igor@...ddata.com>, Daniel Secik <daniel.secik@...ddata.com>, "Zdenek
 Pesek" <zdenek.pesek@...ddata.com>
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
 driver after upgrade to 6.13.y (regression in commit 492a044508ad)



On 7/7/2025 3:03 PM, Jacob Keller wrote:
> Bad news: my hypothesis was incorrect.
> 
> Good news: I can immediately see the problem if I set MTU to 9K and
> start an iperf3 session and just watch the count of allocations from
> ice_alloc_mapped_pages(). It goes up consistently, so I can quickly tell
> if a change is helping.
> 
> I ported the stats from i40e for tracking the page allocations, and I
> can see that we're allocating new pages despite not actually performing
> releases.
> 
> I don't yet have a good understanding of what causes this, and the logic
> in ice is pretty hard to track...
> 
> I'm going to try the page pool patches myself to see if this test bed
> triggers the same problems. Unfortunately I think I need someone else
> with more experience with the hotpath code to help figure out what's
> going wrong here...

I believe I have isolated this and figured out the issue: with 9K MTU,
the hardware sometimes posts a multi-buffer frame with an extra
descriptor that has a size of 0 bytes and carries no data. When this
happens, our buffer-tracking logic fails to free the buffer attached to
that descriptor. Because the page was neither freed nor reused, we later
overwrite it when refilling the ring, and the overwrite path doesn't
check for this, so the old page is leaked.

I will have a fix with a more detailed description posted tomorrow.
