Message-ID: <4a061a51-8a6c-42b8-9957-66073b4bc65f@intel.com>
Date: Tue, 15 Apr 2025 16:38:40 +0200
From: Przemek Kitszel <przemyslaw.kitszel@...el.com>
To: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
CC: <jdamato@...tly.com>, <intel-wired-lan@...ts.osuosl.org>, <netdev@...r.kernel.org>,
	Tony Nguyen <anthony.l.nguyen@...el.com>, Igor Raits <igor@...ddata.com>,
	Daniel Secik <daniel.secik@...ddata.com>, Zdenek Pesek <zdenek.pesek@...ddata.com>,
	Jakub Kicinski <kuba@...nel.org>, Eric Dumazet <edumazet@...gle.com>,
	Martin Karsten <mkarsten@...terloo.ca>, Ahmed Zaki <ahmed.zaki@...el.com>,
	"Czapnik, Lukasz" <lukasz.czapnik@...el.com>,
	Michal Swiatkowski <michal.swiatkowski@...ux.intel.com>
Subject: Re: Increased memory usage on NUMA nodes with ICE driver after
 upgrade to 6.13.y (regression in commit 492a044508ad)

On 4/14/25 18:29, Jaroslav Pulchart wrote:
> Hello,

+CC co-devs and reviewers of the initial napi_config introduction
+CC Ahmed, who builds further on napi_config in 6.15

> 
> While investigating increased memory usage after upgrading our
> host/hypervisor servers from Linux kernel 6.12.y to 6.13.y, I observed
> a regression in available memory per NUMA node. Our servers allocate
> 60GB of each NUMA node’s 64GB of RAM to HugePages for VMs, leaving 4GB
> for the host OS.
> 
> After the upgrade, we noticed approximately 500MB less free RAM on
> NUMA nodes 0 and 2 compared to 6.12.y, even with no VMs running (just
> the host OS after reboot). These nodes host Intel E810-XXV NICs. Here's
> a snapshot of the NUMA stats on vanilla 6.13.y:
> 
>       NUMA nodes:  0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
>       HPFreeGiB:   60    60    60    60    60    60    60    60    60    60    60    60    60    60    60    60
>       MemTotal:    64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
>       MemFree:     2793  3559  3150  3438  3616  3722  3520  3547  3547  3536  3506  3452  3440  3489  3607  3729
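
Per-node snapshots like the one above can be pulled with numastat or straight
from sysfs; a minimal sketch, assuming 1 GiB hugepages and the standard sysfs
layout:

    # per-node totals, free memory and hugepage counters (numactl package)
    numastat -m | grep -E 'MemTotal|MemFree|HugePages'
    # the same data straight from sysfs
    grep MemFree /sys/devices/system/node/node*/meminfo
    cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/free_hugepages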
> 
> We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f
> "ice: Add support for persistent NAPI config".

Thank you for the report and bisection; this commit is ice's opt-in into
using persistent napi_config.

I have checked the code, and there is nothing obvious in the touched
driver/core parts that would inflate memory consumption. I have not yet
looked into how much memory is eaten by the hash array of the now-kept
configs.

> 
> We limit the number of channels on the NICs to match the local NUMA
> cores, or fewer for unused interfaces (down from the ridiculous default
> of 96), for example:

We will experiment with other defaults; it looks like the total number of
CPUs, instead of the local NUMA cores, might be better here. And even if
that resolved the issue, I would still like a more direct fix for this.

>     ethtool -L em1 combined 6       # active port; from 96
>     ethtool -L p3p2 combined 2      # unused port; from 96
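
The resulting configuration can be double-checked with the query forms of the
same tool; a small sketch reusing the interface names from the example above:

    ethtool -l em1    # current vs. maximum channel counts
    ethtool -g em1    # RX/TX descriptor ring sizes per queue
    grep MemFree /sys/devices/system/node/node{0,2}/meminfo    # NIC-local nodes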
> 
> This typically aligns memory use with local CPUs and keeps NUMA-local
> memory usage within expected limits. However, starting with kernel
> 6.13.y and this commit, the high memory usage by the ICE driver
> persists regardless of reduced channel configuration.

As a workaround, you could try a devlink reload (action driver_reinit);
that should flush all napi instances.
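
A minimal sketch of that workaround; the PCI address below is a placeholder
for the port's actual devlink handle, which "devlink dev show" lists:

    devlink dev show
    devlink dev reload pci/0000:17:00.0 action driver_reinit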

We will try to reproduce the issue locally and work on a fix.

> 
> Reverting the commit restores expected memory availability on nodes 0
> and 2. Below are stats from 6.13.y with the commit reverted:
>      NUMA nodes:  0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
>      HPFreeGiB:   60    60    60    60    60    60    60    60    60    60    60    60    60    60    60    60
>      MemTotal:    64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
>      MemFree:     3208  3765  3668  3507  3811  3727  3812  3546  3676  3596  ...
> 
> This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel
> 6.12.y, and avoids swap pressure and memory exhaustion when running
> services and VMs.
> 
> I also do not see any practical benefit in persisting the channel
> memory allocation. After a fresh server reboot, channels are not
> explicitly configured, and the system will not automatically resize
> them back to a higher count unless manually set again. Therefore,
> retaining the previous memory footprint appears unnecessary and
> potentially harmful in memory-constrained environments.

In this particular case there is indeed no benefit; the mechanism was
designed to keep the config/stats for queues that were meaningfully used.
It is rather clunky anyway.
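
For inspecting what is kept per interface, the netdev netlink family can dump
the NAPI instances together with their per-NAPI settings; a sketch assuming a
kernel source tree with the YNL CLI available, with ifindex 2 purely
illustrative:

    python3 tools/net/ynl/cli.py \
        --spec Documentation/netlink/specs/netdev.yaml \
        --dump napi-get --json '{"ifindex": 2}'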

> 
> Best regards,
> Jaroslav Pulchart

