Message-ID: <559a9953-cd51-42ce-b2a5-83bd185cf008@molgen.mpg.de>
Date: Mon, 14 Apr 2025 19:15:51 +0200
From: Paul Menzel <pmenzel@...gen.mpg.de>
To: Jaroslav Pulchart <jaroslav.pulchart@...ddata.com>,
 Tony Nguyen <anthony.l.nguyen@...el.com>,
 Przemyslaw Kitszel <przemyslaw.kitszel@...el.com>
Cc: jdamato@...tly.com, intel-wired-lan@...ts.osuosl.org,
 netdev@...r.kernel.org, Igor Raits <igor@...ddata.com>,
 Daniel Secik <daniel.secik@...ddata.com>,
 Zdenek Pesek <zdenek.pesek@...ddata.com>, regressions@...ts.linux.dev
Subject: Re: [Intel-wired-lan] Increased memory usage on NUMA nodes with ICE
 driver after upgrade to 6.13.y (regression in commit 492a044508ad)

#regzbot ^introduced: 492a044508ad13a490a24c66f311339bf891cb5f

On 14.04.25 18:29, Jaroslav Pulchart wrote:
> Hello,
> 
> While investigating increased memory usage after upgrading our
> host/hypervisor servers from Linux kernel 6.12.y to 6.13.y, I observed
> a regression in available memory per NUMA node. Our servers allocate
> 60GB of each NUMA node’s 64GB of RAM to HugePages for VMs, leaving 4GB
> for the host OS.
> 
> After the upgrade, we noticed approximately 500MB less free RAM on
> NUMA nodes 0 and 2 compared to 6.12.y, even with no VMs running (just
> the host OS after reboot). These nodes host Intel E810-XXV NICs. Here's
> a snapshot of the NUMA stats on vanilla 6.13.y:
> 
>       NUMA nodes:  0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
>       HPFreeGiB:   60    60    60    60    60    60    60    60    60    60   60    60    60    60    60    60
>       MemTotal:    64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
>       MemFree:     2793  3559  3150  3438  3616  3722  3520  3547  3547  3536  3506  3452  3440  3489  3607  3729
> 
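As an aside, a per-node snapshot like the one above can be collected
from the sysfs node meminfo files; a minimal sketch, assuming 1 GiB
hugepages behind the HPFreeGiB column:

    # Print MemTotal, MemFree and free hugepages for every NUMA node.
    # meminfo reports MemTotal/MemFree in kB and HugePages_Free in pages.
    for node in /sys/devices/system/node/node*; do
        grep -E 'MemTotal|MemFree|HugePages_Free' "$node/meminfo"
    done
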
> We traced the issue to commit 492a044508ad13a490a24c66f311339bf891cb5f
> "ice: Add support for persistent NAPI config".
> 
> We limit the number of channels on the NICs to match the number of
> local NUMA cores, or fewer if the interface is unused (down from the
> ridiculous default of 96), for example:
>     ethtool -L em1 combined 6       # active port; from 96
>     ethtool -L p3p2 combined 2      # unused port; from 96
> 
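The resulting channel layout can be double-checked with ethtool's query
form, e.g.:

    ethtool -l em1    # query pre-set maximums and current channel counts
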
> This typically aligns memory use with local CPUs and keeps NUMA-local
> memory usage within expected limits. However, starting with kernel
> 6.13.y and this commit, the high memory usage by the ICE driver
> persists regardless of reduced channel configuration.
> 
> Reverting the commit restores expected memory availability on nodes 0
> and 2. Below are stats from 6.13.y with the commit reverted:
>      NUMA nodes:  0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
>      HPFreeGiB:   60    60    60    60    60    60    60    60    60    60   60    60    60    60    60    60
>      MemTotal:    64989 65470 65470 65470 65470 65470 65470 65453 65470 65470 65470 65470 65470 65470 65470 65462
>      MemFree:     3208  3765  3668  3507  3811  3727  3812  3546  3676  3596 ...
> 
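For anyone who wants to repeat the comparison, the reverted test kernel
can be rebuilt roughly like this (a sketch, assuming a clean stable tree
checked out at the 6.13.y branch):

    # Revert the suspected commit on top of linux-6.13.y and rebuild.
    git revert 492a044508ad13a490a24c66f311339bf891cb5f
    make -j"$(nproc)" && sudo make modules_install install
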
> This brings nodes 0 and 2 back to ~3.5GB free RAM, similar to kernel
> 6.12.y, and avoids swap pressure and memory exhaustion when running
> services and VMs.
> 
> I also do not see any practical benefit in persisting the channel
> memory allocation. After a fresh server reboot, channels are not
> explicitly configured, and the system will not automatically resize
> them back to a higher count unless manually set again. Therefore,
> retaining the previous memory footprint appears unnecessary and
> potentially harmful in memory-constrained environments.
> 
> Best regards,
> Jaroslav Pulchart
