Message-ID: <20251110182008.71e0858b@xps15mal>
Date: Mon, 10 Nov 2025 18:20:08 +1000
From: Mal Haak <malcolm@...k.id.au>
To: linux-kernel@...r.kernel.org
Subject: Possible memory leak in 6.17.7

Hello,

I have found a memory leak in 6.17.7 but I am unsure how to track it
down effectively.

I am running a server that has a heavy read/write workload to a cephfs
file system. It is a VM. 

Over time, the non-cache usage of kernel dynamic memory increases. The
kernel seems to consider the pages reclaimable, but nothing appears to
trigger the reclaim. This leads to workloads getting killed by the OOM
killer.

smem -wp output:

Area                           Used      Cache   Noncache 
firmware/hardware             0.00%      0.00%      0.00% 
kernel image                  0.00%      0.00%      0.00% 
kernel dynamic memory        88.21%     36.25%     51.96% 
userspace memory              9.49%      0.15%      9.34% 
free memory                   2.30%      2.30%      0.00% 

free -h output:

       total  used   free   shared  buff/cache available 
Mem:   31Gi   3.6Gi  500Mi  4.0Mi   11Gi      27Gi 
Swap:  4.0Gi  179Mi  3.8Gi
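
As a rough cross-check of the two outputs above, the following sketch does the arithmetic on the rounded figures only. The variable names are illustrative, and the comparison assumes that the "used", "free" and "buff/cache" columns do not overlap, which depends on the procps version; treat the result as an approximation, not a measurement.

```python
# Rounded values copied from the "free -h" output above (GiB).
MIB_PER_GIB = 1024
total_gib = 31.0
used_gib = 3.6
free_gib = 500 / MIB_PER_GIB      # 500Mi expressed in GiB
buff_cache_gib = 11.0

# Memory that none of the three columns accounts for
# (assumption: the columns do not overlap).
unaccounted_gib = total_gib - used_gib - free_gib - buff_cache_gib

# "Noncache" share of kernel dynamic memory from "smem -wp" (51.96%),
# converted to GiB against the same rounded total.
noncache_gib = 0.5196 * total_gib

print(f"unaccounted by free: {unaccounted_gib:.1f} GiB")
print(f"smem kernel noncache: {noncache_gib:.1f} GiB")
```

Both figures come out at roughly 16 GiB, which is consistent with the memory being held by the kernel outside the page cache.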

Reverting to the previous LTS kernel fixes the issue.

smem -wp output:
Area                           Used      Cache   Noncache 
firmware/hardware             0.00%      0.00%      0.00% 
kernel image                  0.00%      0.00%      0.00% 
kernel dynamic memory        80.22%     79.32%      0.90% 
userspace memory             10.48%      0.20%     10.28% 
free memory                   9.30%      9.30%      0.00% 

I am unsure of the best way to track down the memory usage. I have
tried stopping the workload, unmounting the cephfs filesystem, and
removing the ceph and network related kernel modules.

I assume some kind of tracing would be a way to find the culprit,
however I am unsure of the best way to do that. 
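
Short of tracing, one low-effort first step is to diff two snapshots of /proc/meminfo taken some time apart and see which counters grow. The sketch below is a minimal illustration of that idea; the helper names (parse_meminfo, meminfo_delta, sample) are made up for this example, and it will not by itself identify the allocating code path.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def meminfo_delta(before_text, after_text, min_change=0):
    """Return (field, change) pairs sorted by largest absolute change first."""
    before = parse_meminfo(before_text)
    after = parse_meminfo(after_text)
    changes = [
        (name, after[name] - before[name])
        for name in after
        if name in before and abs(after[name] - before[name]) > min_change
    ]
    return sorted(changes, key=lambda item: abs(item[1]), reverse=True)


def sample(interval_s=600, path="/proc/meminfo"):
    """Take two snapshots interval_s seconds apart and print what changed."""
    with open(path) as f:
        first = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        second = f.read()
    for name, change in meminfo_delta(first, second):
        print(f"{name:20s} {change:+d}")
```

Calling sample() while the workload runs prints the counters that moved, largest first. If none of the named counters tracks the growth, the pages are probably allocated outside the accounted categories, and allocation tracing or the bisect is the next step.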

I can do a git bisect and am in the process of getting a test
reproducer made for that. 

But if there is an easier way to do it I would happily do that.

Just to note, slabtop looks normal and doesn't show the memory usage.

Thanks in advance

Mal
