Message-ID: <b3eb99da-9293-43e8-a24d-f4082f747d6c@intel.com>
Date: Wed, 25 Jun 2025 16:03:19 +0200
From: Przemek Kitszel <przemyslaw.kitszel@...el.com>
To: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>,
"intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>
CC: "Keller, Jacob E" <jacob.e.keller@...el.com>, Jakub Kicinski
<kuba@...nel.org>, "Damato, Joe" <jdamato@...tly.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "Nguyen, Anthony L"
<anthony.l.nguyen@...el.com>, Michal Swiatkowski
<michal.swiatkowski@...ux.intel.com>, "Czapnik, Lukasz"
<lukasz.czapnik@...el.com>, "Dumazet, Eric" <edumazet@...gle.com>, "Zaki,
Ahmed" <ahmed.zaki@...el.com>, Martin Karsten <mkarsten@...terloo.ca>, "Igor
Raits" <igor@...ddata.com>, Daniel Secik <daniel.secik@...ddata.com>, "Zdenek
Pesek" <zdenek.pesek@...ddata.com>
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
driver after upgrade to 6.13.y (regression in commit 492a044508ad)
On 6/25/25 14:17, Jaroslav Pulchart wrote:
> Hello
>
> We are still facing the memory issue with Intel 810 NICs (even on latest
> 6.15.y).
>
> Our current stabilization and solution is to move everything to a new
> INTEL-FREE server and get rid of last Intel sights there (after Intel's
> CPU vulnerabilities fuckups NICs are next step).
>
> Any help welcomed,
> Jaroslav P.
>
>
Thank you for urging us, I can understand the frustration.
We have identified some (unrelated) memory leaks and will ship fixes soon.
And, as no commit or version you have posted turned out to be a clear
culprit, there is a chance that our random findings could help. Anyway,
getting to zero kmemleak reports is good in itself; that is a good start.
I will also ask my VAL to increase efforts in this area.
Przemek
>
> On Wed, 4 Jun 2025 at 10:42, Jaroslav Pulchart
> <jaroslav.pulchart@...ddata.com> wrote:
>
> >
> > On Thu, 17 Apr 2025 at 19:52, Keller, Jacob E
> > <jacob.e.keller@...el.com> wrote:
> > >
> > >
> > >
> > > > > -----Original Message-----
> > > > > From: Jakub Kicinski <kuba@...nel.org>
> > > > > Sent: Wednesday, April 16, 2025 5:13 PM
> > > > > To: Keller, Jacob E <jacob.e.keller@...el.com>
> > > > > Cc: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>; Kitszel, Przemyslaw
> > > > > <przemyslaw.kitszel@...el.com>; Damato, Joe <jdamato@...tly.com>;
> > > > > intel-wired-lan@...ts.osuosl.org; netdev@...r.kernel.org; Nguyen, Anthony L
> > > > > <anthony.l.nguyen@...el.com>; Igor Raits <igor@...ddata.com>; Daniel Secik
> > > > > <daniel.secik@...ddata.com>; Zdenek Pesek <zdenek.pesek@...ddata.com>;
> > > > > Dumazet, Eric <edumazet@...gle.com>; Martin Karsten
> > > > > <mkarsten@...terloo.ca>; Zaki, Ahmed <ahmed.zaki@...el.com>; Czapnik,
> > > > > Lukasz <lukasz.czapnik@...el.com>; Michal Swiatkowski
> > > > > <michal.swiatkowski@...ux.intel.com>
> > > > > Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
> > > > > driver after upgrade to 6.13.y (regression in commit 492a044508ad)
> > > >
> > > > On Wed, 16 Apr 2025 22:57:10 +0000 Keller, Jacob E wrote:
> > > > > > > > And you're reverting just and exactly 492a044508ad13 ?
> > > > > > > > The memory for persistent config is allocated in alloc_netdev_mqs()
> > > > > > > > unconditionally. I'm lost as to how this commit could make any
> > > > > > > > difference :(
> > > > > >
> > > > > > Yes, reverted the 492a044508ad13.
> > > > >
> > > > > Struct napi_config *is* 1056 bytes
> > > >
> > > > > You're probably looking at 6.15-rcX kernels. Yes, the affinity mask
> > > > > can be large depending on the kernel config. But report is for 6.13,
> > > > > AFAIU. In 6.13 and 6.14 napi_config was tiny.
> > >
> > > > Regardless, it should still be ~64KB even in that case which is a far
> > > > cry from eating all available memory. Something else must be going on....
> > >
> > > Thanks,
> > > Jake
> >
> > Hello
> >
> > Some observations: this "problem" still exists with the latest 6.14.y,
> > and there must be multiple issues. The available memory is slowly
> > going down, from 3GB to 100MB in 10-20 days, on the home NUMA nodes of
> > the Intel X810 NICs (it looks like some memory leak related to
> > networking).
> >
> > Without the revert, kswapdX usage is observed quickly, within 1-2
> > days; with the revert of the mentioned commit, kswapdX starts to
> > consume resources later, after ~10-20 days. It is almost impossible to
> > use servers with Intel X810 cards (ice driver) with recent Linux kernels.
> >
> > Were you able to reproduce the memory problems in your testbed?
> >
> > Best,
> > Jaroslav
>
> Hello
>
> I deployed Linux 6.15.0 to our servers 7 days ago and observed the
> memory utilization of the NUMA home nodes of the Intel X810 NICs:
> 1/ there is no need to revert the commit as before,
> 2/ the memory is continuously consumed (like a memory leak),
> see the attached "7d_memory_usage_per_numa_linux6.15.0.png" screenshot of
> 8 NUMA nodes (NUMA0 + NUMA1 are local to the X810 NICs). BTW: we do not
> see this memory utilization pattern on servers using Broadcom
> NetXtreme-E NICs.
>
>
>
> --
> Jaroslav Pulchart
> Sr. Principal SW Engineer
> GoodData
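
For anyone who wants to collect the same kind of per-NUMA-node numbers
as in the report above without a monitoring stack, here is a minimal
sketch. It assumes the standard sysfs files
/sys/devices/system/node/nodeN/meminfo; the function names
(parse_node_meminfo, read_all_nodes) are made up for illustration and
are not part of any tool used in this thread.

```python
import glob
import re

# Assumed line format of /sys/devices/system/node/nodeN/meminfo, e.g.:
#   "Node 0 MemFree:        1234567 kB"
#   "Node 0 HugePages_Total:     0"      (no unit)
_LINE = re.compile(r"Node\s+(\d+)\s+(\w+(?:\([^)]*\))?):\s+(\d+)(?:\s+kB)?")


def parse_node_meminfo(text):
    """Parse per-node meminfo text into {node_id: {field: value}}.

    Values are integers as printed by the kernel (kB for sized fields).
    Lines that do not match the expected format are skipped.
    """
    result = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            node = int(match.group(1))
            result.setdefault(node, {})[match.group(2)] = int(match.group(3))
    return result


def read_all_nodes(pattern="/sys/devices/system/node/node*/meminfo"):
    """Read and merge the meminfo file of every NUMA node found."""
    merged = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as handle:
            merged.update(parse_node_meminfo(handle.read()))
    return merged


if __name__ == "__main__":
    # Print one line per node; run periodically (e.g. from cron) to see
    # whether MemFree on the NIC-local nodes keeps shrinking.
    for node, fields in sorted(read_all_nodes().items()):
        print(node, fields.get("MemFree"), fields.get("SUnreclaim"))
```

Logging MemFree next to SUnreclaim is only a suggestion: a steadily
growing unreclaimable slab on NUMA0/NUMA1 would point at kernel
allocations rather than at user space, but that remains to be checked.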