Message-ID: <CAK8fFZ6KzyfswFE=qj6pz-18QZ16swdwyFfTf=4e_0+sPLyUcg@mail.gmail.com>
Date: Sat, 5 Jul 2025 09:01:27 +0200
From: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
To: Michal Kubiak <michal.kubiak@...el.com>
Cc: Tony Nguyen <anthony.l.nguyen@...el.com>, 
	"Kitszel, Przemyslaw" <przemyslaw.kitszel@...el.com>, jdamato@...tly.com, 
	intel-wired-lan@...ts.osuosl.org, netdev@...r.kernel.org, 
	Igor Raits <igor@...ddata.com>, Daniel Secik <daniel.secik@...ddata.com>, 
	Zdenek Pesek <zdenek.pesek@...ddata.com>
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
 driver after upgrade to 6.13.y (regression in commit 492a044508ad)
> On Mon, Apr 14, 2025 at 06:29:01PM +0200, Jaroslav Pulchart wrote:
> > Hello,
> >
> > While investigating increased memory usage after upgrading our
> > host/hypervisor servers from Linux kernel 6.12.y to 6.13.y, I observed
> > a regression in available memory per NUMA node. Our servers allocate
> > 60GB of each NUMA node’s 64GB of RAM to HugePages for VMs, leaving 4GB
> > for the host OS.
> >
> > After the upgrade, we noticed approximately 500MB less free RAM on
> > NUMA nodes 0 and 2 compared to 6.12.y, even with no VMs running (just
> > the host OS after reboot). These nodes host Intel E810-XXV NICs. Here's
> > a snapshot of the NUMA stats on vanilla 6.13.y:
> >
> >      NUMA nodes:  0     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
> >      HPFreeGiB:   60    60    60    60    60    60    60    60    60    60    60    60    60    60    60    60
> >      MemTotal:    64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
> >      MemFree:     2793  3559  3150  3438  3616  3722  3520  3547  3547  3536  3506  3452  3440  3489  3607  3729
> >
> > We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f
> > "ice: Add support for persistent NAPI config".
> >
> > We limit the number of channels on the NICs to match the local NUMA
> > cores, or fewer for unused interfaces (down from the ridiculous default
> > of 96), for example:
> >    ethtool -L em1 combined 6       # active port; from 96
> >    ethtool -L p3p2 combined 2      # unused port; from 96
> >
> > This typically aligns memory use with the local CPUs and keeps
> > NUMA-local memory usage within expected limits. However, starting with
> > kernel 6.13.y and this commit, the ice driver's high memory usage
> > persists regardless of the reduced channel count.
> >
> > Reverting the commit restores expected memory availability on nodes 0
> > and 2. Below are stats from 6.13.y with the commit reverted:
> >      NUMA nodes:  0     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
> >      HPFreeGiB:   60    60    60    60    60    60    60    60    60    60    60    60    60    60    60    60
> >      MemTotal:    64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
> >      MemFree:     3208  3765  3668  3507  3811  3727  3812  3546  3676  3596  ...
> >
> > This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel
> > 6.12.y, and avoids swap pressure and memory exhaustion when running
> > services and VMs.
> >
> > I also do not see any practical benefit in persisting the channel
> > memory allocation. After a fresh server reboot, channels are not
> > explicitly configured, and the system will not automatically resize
> > them back to a higher count unless they are manually set again.
> > Therefore, retaining the previous memory footprint appears unnecessary
> > and potentially harmful in memory-constrained environments.
> >
> > Best regards,
> > Jaroslav Pulchart
> >
>
>
> Hello Jaroslav,
>
> I have just sent a series for converting the Rx path of the ice driver
> to use the Page Pool.
> We suspect it may help with the memory consumption issue, since it
> removes the problematic code and delegates some memory management to
> the generic code.
>
> Could you please give it a try and check whether it helps with your issue?
> The link to the series: https://lore.kernel.org/intel-wired-lan/20250704161859.871152-1-michal.kubiak@intel.com/
I can try it; however, I cannot apply the series as-is on 6.15.y:
$ git am ~/ice-convert-Rx-path-to-Page-Pool.patch
Applying: ice: remove legacy Rx and construct SKB
Applying: ice: drop page splitting and recycling
error: patch failed: drivers/net/ethernet/intel/ice/ice_txrx.h:480
error: drivers/net/ethernet/intel/ice/ice_txrx.h: patch does not apply
Patch failed at 0002 ice: drop page splitting and recycling
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
>
> Thanks,
> Michal
>
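As a quick cross-check of the "approximately 500MB" figure, the per-node difference between the two quoted MemFree rows can be computed with a small bash sketch. It assumes the MemFree values are in MB, and it compares only nodes 0-9 because the reverted-kernel row is truncated after node 9; the array names are illustrative only.

```shell
#!/usr/bin/env bash
# MemFree per NUMA node, copied from the two quoted tables (assumed to be MB).
# Only nodes 0-9 are compared: the reverted-kernel row is truncated after node 9.
with_commit=(2793 3559 3150 3438 3616 3722 3520 3547 3547 3536)
reverted=(3208 3765 3668 3507 3811 3727 3812 3546 3676 3596)

declare -a delta
for node in "${!with_commit[@]}"; do
  # Positive delta = more free memory once 492a044508ad is reverted
  delta[node]=$(( reverted[node] - with_commit[node] ))
  printf 'node %2d: %+5d MB\n' "$node" "${delta[node]}"
done
```

On these numbers, nodes 0 and 2 gain 415 MB and 518 MB, in line with the roughly 500MB reported for them; the other compared nodes show smaller differences, ranging from -1 MB to +292 MB.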
