Message-ID: <559a9953-cd51-42ce-b2a5-83bd185cf008@molgen.mpg.de>
Date: Mon, 14 Apr 2025 19:15:51 +0200
From: Paul Menzel <pmenzel@...gen.mpg.de>
To: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>,
Tony Nguyen <anthony.l.nguyen@...el.com>,
Przemyslaw Kitszel <przemyslaw.kitszel@...el.com>
Cc: jdamato@...tly.com, intel-wired-lan@...ts.osuosl.org,
netdev@...r.kernel.org, Igor Raits <igor@...ddata.com>,
Daniel Secik <daniel.secik@...ddata.com>,
Zdenek Pesek <zdenek.pesek@...ddata.com>, regressions@...ts.linux.dev
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
driver after upgrade to 6.13.y (regression in commit 492a044508ad)
#regzbot ^introduced: 492a044508ad13a490a24c66f311339bf891cb5f
On 14.04.25 at 18:29, Jaroslav Pulchart wrote:
> Hello,
>
> While investigating increased memory usage after upgrading our
> host/hypervisor servers from Linux kernel 6.12.y to 6.13.y, I observed
> a regression in available memory per NUMA node. Our servers allocate
> 60GB of each NUMA node’s 64GB of RAM to HugePages for VMs, leaving 4GB
> for the host OS.
>
> After the upgrade, we noticed approximately 500MB less free RAM on
> NUMA nodes 0 and 2 compared to 6.12.y, even with no VMs running (just
> the host OS after reboot). These nodes host Intel E810-XXV NICs. Here's
> a snapshot of the NUMA stats on vanilla 6.13.y:
>
> NUMA nodes:     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
> HPFreeGiB:     60    60    60    60    60    60    60    60    60    60    60    60    60    60    60    60
> MemTotal:   64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
> MemFree:     2793  3559  3150  3438  3616  3722  3520  3547  3547  3536  3506  3452  3440  3489  3607  3729
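A per-node table like the one above can be gathered from sysfs. The report does not say how these numbers were collected, so the following is only a sketch under that assumption: it reads `/sys/devices/system/node/nodeN/meminfo`, converts MemTotal and MemFree from kB to MiB, and prints the free huge page count for the default huge page size (which may not be the page size behind the HPFreeGiB column). The helper name `node_field` is hypothetical.

```shell
#!/bin/sh
# Sketch: print per-NUMA-node MemTotal/MemFree (MiB) and free huge pages
# from sysfs. Not necessarily how the quoted table was produced.

# node_field FILE FIELD: print FIELD from a per-node meminfo file.
# Lines look like "Node 0 MemFree:   2860032 kB", so the value is field 4.
node_field() {
    awk -v f="$2:" '$3 == f { print $4 }' "$1"
}

# Note: glob order is lexical, so node10 is listed before node2.
for d in /sys/devices/system/node/node[0-9]*; do
    [ -r "$d/meminfo" ] || continue
    n=${d##*node}
    total_kb=$(node_field "$d/meminfo" MemTotal)
    free_kb=$(node_field "$d/meminfo" MemFree)
    # HugePages_Free is a page count for the default huge page size.
    hp_free=$(node_field "$d/meminfo" HugePages_Free)
    echo "node $n MemTotal=$((total_kb / 1024))MiB MemFree=$((free_kb / 1024))MiB HugePages_Free=$hp_free"
done
```

`numastat -m` from the numactl package prints a similar per-node breakdown without scripting.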
>
> We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f
> "ice: Add support for persistent NAPI config".
>
> We limit the number of channels on each NIC to the number of local NUMA
> cores, or fewer for an unused interface (down from the ridiculous default
> of 96), for example:
> ethtool -L em1 combined 6 # active port; from 96
> ethtool -L p3p2 combined 2 # unused port; from 96
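The policy of matching channels to the NIC's local NUMA node could be scripted roughly as follows. This is a hypothetical sketch, not the reporter's tooling: the helper names are made up, it counts logical CPUs (SMT siblings included) in the node's `cpulist`, and it only covers active ports; the value of 2 used for unused ports would still be set by hand.

```shell
#!/bin/sh
# Hypothetical helper: set an interface's combined channel count to the
# number of logical CPUs on the NUMA node its PCI device is attached to.

# count_cpus LIST: count CPUs in a cpulist string such as "0-5,48-53".
count_cpus() {
    echo "$1" | tr ',' '\n' | awk -F- '
        NF == 2 { n += $2 - $1 + 1; next }   # a range "a-b"
        NF == 1 && $1 != "" { n += 1 }       # a single CPU "a"
        END { print n + 0 }'
}

# set_local_channels IFACE: look up the device NUMA node and apply ethtool -L.
set_local_channels() {
    node=$(cat "/sys/class/net/$1/device/numa_node") || return 1
    # numa_node is -1 when the platform reports no locality; skip then.
    [ "$node" -ge 0 ] || return 1
    cpus=$(count_cpus "$(cat "/sys/devices/system/node/node$node/cpulist")")
    ethtool -L "$1" combined "$cpus"
}

# Only touch interfaces named on the command line, e.g.: ./script em1
for iface in "$@"; do
    set_local_channels "$iface" || echo "skipping $iface" >&2
done
```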
>
> This typically aligns memory use with local CPUs and keeps NUMA-local
> memory usage within expected limits. However, starting with kernel
> 6.13.y and this commit, the high memory usage by the ICE driver
> persists regardless of the reduced channel configuration.
>
> Reverting the commit restores expected memory availability on nodes 0
> and 2. Below are stats from 6.13.y with the commit reverted:
>
> NUMA nodes:     0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
> HPFreeGiB:     60    60    60    60    60    60    60    60    60    60    60    60    60    60    60    60
> MemTotal:   64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
> MemFree:     3208  3765  3668  3507  3811  3727  3812  3546  3676  3596  ...
>
> This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel
> 6.12.y, and avoids swap pressure and memory exhaustion when running
> services and VMs.
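Subtracting the two MemFree rows node by node makes the difference explicit. The snippet below uses the values exactly as quoted; the second row is truncated after node 9 in the archived message, so only nodes 0 to 9 are compared. Units are whatever the original tool printed (presumably MiB, given a MemTotal of about 65470 for a 64GB node); the function name `memfree_delta` is hypothetical.

```shell
#!/bin/sh
# MemFree rows copied verbatim from the two snapshots (vanilla 6.13.y and
# 6.13.y with 492a044508ad reverted). The second row is truncated.
memfree_6_13="2793 3559 3150 3438 3616 3722 3520 3547 3547 3536 3506 3452 3440 3489 3607 3729"
memfree_revert="3208 3765 3668 3507 3811 3727 3812 3546 3676 3596"

# memfree_delta VANILLA REVERTED: print "node <n> <reverted - vanilla>"
# for every node present in both rows.
memfree_delta() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, va, " "); nb = split(b, vb, " ")
        n = (na < nb) ? na : nb
        for (i = 1; i <= n; i++) printf "node %d %d\n", i - 1, vb[i] - va[i]
    }'
}

memfree_delta "$memfree_6_13" "$memfree_revert"
```

For nodes 0 and 2 this prints `node 0 415` and `node 2 518`, in line with the roughly 500MB difference described in the report.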
>
> I also do not see any practical benefit in persisting the channel
> memory allocation. After a fresh server reboot, channels are not
> explicitly configured, and the system will not automatically resize
> them back to a higher count unless manually set again. Therefore,
> retaining the previous memory footprint appears unnecessary and
> potentially harmful in memory-constrained environments.
>
> Best regards,
> Jaroslav Pulchart