Message-ID: <4a061a51-8a6c-42b8-9957-66073b4bc65f@intel.com>
Date: Tue, 15 Apr 2025 16:38:40 +0200
From: Przemek Kitszel <przemyslaw.kitszel@...el.com>
To: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>
CC: <jdamato@...tly.com>, <intel-wired-lan@...ts.osuosl.org>,
<netdev@...r.kernel.org>, Tony Nguyen <anthony.l.nguyen@...el.com>, "Igor
Raits" <igor@...ddata.com>, Daniel Secik <daniel.secik@...ddata.com>, "Zdenek
Pesek" <zdenek.pesek@...ddata.com>, Jakub Kicinski <kuba@...nel.org>, "Eric
Dumazet" <edumazet@...gle.com>, Martin Karsten <mkarsten@...terloo.ca>,
"Ahmed Zaki" <ahmed.zaki@...el.com>, "Czapnik, Lukasz"
<lukasz.czapnik@...el.com>, Michal Swiatkowski
<michal.swiatkowski@...ux.intel.com>
Subject: Re: Increased memory usage on NUMA nodes with ICE driver after
upgrade to 6.13.y (regression in commit 492a044508ad)
On 4/14/25 18:29, Jaroslav Pulchart wrote:
> Hello,
+CC to co-devs and reviewers of initial napi_config introduction
+CC Ahmed, who leverages napi_config for more stuff in 6.15
>
> While investigating increased memory usage after upgrading our
> host/hypervisor servers from Linux kernel 6.12.y to 6.13.y, I observed
> a regression in available memory per NUMA node. Our servers allocate
> 60GB of each NUMA node’s 64GB of RAM to HugePages for VMs, leaving 4GB
> for the host OS.
>
> After the upgrade, we noticed approximately 500MB less free RAM on
> NUMA nodes 0 and 2 compared to 6.12.y, even with no VMs running (just
> the host OS after reboot). These nodes host Intel 810-XXV NICs. Here's
> a snapshot of the NUMA stats on vanilla 6.13.y:
>
> NUMA nodes:  0     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
> HPFreeGiB:   60    60    60    60    60    60    60    60    60    60    60    60    60    60    60    60
> MemTotal:    64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
> MemFree:     2793  3559  3150  3438  3616  3722  3520  3547  3547  3536  3506  3452  3440  3489  3607  3729
>
> We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f
> "ice: Add support for persistent NAPI config".
Thank you for the report and bisection.
This commit is ice's opt-in to using persistent napi_config.
I have checked the code, and in the touched parts there is nothing
obvious that would inflate memory consumption in the driver/core. I have
not yet looked into how much memory is consumed by the hash array of
now-retained configs.
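For scale, a back-of-envelope sketch of how retaining per-queue state for
the ~90 unused queues could reach the observed order of magnitude, if rx
buffers were to stay allocated. The ring and buffer sizes below are
assumptions for illustration, not values taken from the ice driver source:

```python
# Back-of-envelope estimate of rx-buffer memory that would stay allocated
# if per-queue state persisted after `ethtool -L` reduces the channel
# count. All sizes are assumptions for illustration, not from the driver.

RX_DESCRIPTORS = 2048      # assumed ring size per queue
BUF_BYTES = 4096           # assumed one page per rx buffer
DEFAULT_QUEUES = 96        # driver default reported in this thread
CONFIGURED_QUEUES = 6      # after `ethtool -L em1 combined 6`

per_queue = RX_DESCRIPTORS * BUF_BYTES                     # 8 MiB per queue
retained = (DEFAULT_QUEUES - CONFIGURED_QUEUES) * per_queue
print(f"per-queue rx buffers: {per_queue // 2**20} MiB")
print(f"retained across unused queues: {retained // 2**20} MiB")
```

With these assumed sizes the retained total lands at roughly 720 MiB per
NIC, which is in the same ballpark as the ~500 MB per node reported above.
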
>
> We limit the number of channels on the NICs to match the local NUMA
> cores, or fewer for unused interfaces (down from the ridiculous default
> of 96), for example:
We will experiment with other defaults; it looks like the total CPU
count, rather than the local NUMA core count, might be a better choice
here. But even if that resolved the issue, I would still like a more
direct fix for this.
> ethtool -L em1 combined 6 # active port; from 96
> ethtool -L p3p2 combined 2 # unused port; from 96
>
> This typically aligns memory use with local CPUs and keeps NUMA-local
> memory usage within expected limits. However, starting with kernel
> 6.13.y and this commit, the high memory usage by the ICE driver
> persists regardless of reduced channel configuration.
As a workaround, you could try a devlink reload (action driver_reinit);
that should flush all napi instances.
We will try to reproduce the issue locally and work on a fix.
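For reference, the reload could look like the following (the PCI address
is a placeholder; use the handle that `devlink dev show` reports for your
interface):

```shell
# List devlink handles to find the one backing the interface
devlink dev show

# Re-initialize the driver, which should drop the kept napi instances
# (0000:17:00.0 is a placeholder PCI address)
devlink dev reload pci/0000:17:00.0 action driver_reinit

# Re-apply the reduced channel count afterwards
ethtool -L em1 combined 6
```

Note that driver_reinit briefly takes the ports down, so it is only
suitable as a maintenance-window workaround.
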
>
> Reverting the commit restores expected memory availability on nodes 0
> and 2. Below are stats from 6.13.y with the commit reverted:
> NUMA nodes:  0     1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
> HPFreeGiB:   60    60    60    60    60    60    60    60    60    60    60    60    60    60    60    60
> MemTotal:    64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
> MemFree:     3208  3765  3668  3507  3811  3727  3812  3546  3676  3596  ...
>
> This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel
> 6.12.y, and avoids swap pressure and memory exhaustion when running
> services and VMs.
>
> I also do not see any practical benefit in persisting the channel
> memory allocation. After a fresh server reboot, channels are not
> explicitly configured, and the system will not automatically resize
> them back to a higher count unless manually set again. Therefore,
> retaining the previous memory footprint appears unnecessary and
> potentially harmful in memory-constrained environments.
In this particular case there is indeed no benefit; the mechanism was
designed to keep the config/stats for queues that were meaningfully used.
It is rather clunky anyway.
>
> Best regards,
> Jaroslav Pulchart