Message-ID: <20251206082336.6e04a1ac@xps15mal>
Date: Sat, 6 Dec 2025 08:23:36 +1000
From: Mal Haak <malcolm@...k.id.au>
To: linux-kernel@...r.kernel.org
Subject: Re: Possible memory leak in 6.17.7
I have a reproducer. It's slow but it works.
I kept rsync running for 2 days, moving 5TB of files.
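In rough outline, the workload was just rsync shuffling the same
data back and forth; something along these lines (the paths here
are placeholders, not the real ones):

  # move ~5TB between two directories, then move it back, repeatedly
  while true; do
      rsync -a --remove-source-files /mnt/test/a/ /mnt/test/b/
      rsync -a --remove-source-files /mnt/test/b/ /mnt/test/a/
  done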
smem -wp
Area                           Used      Cache   Noncache
firmware/hardware             0.00%      0.00%      0.00%
kernel image                  0.00%      0.00%      0.00%
kernel dynamic memory        98.81%      1.69%     97.13%
userspace memory              0.08%      0.05%      0.03%
free memory                   1.11%      1.11%      0.00%
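For anyone wanting more detail than smem gives, these are the
generic places I would look next (standard procfs and slabtop,
nothing specific to this box):

  grep -E 'MemAvailable|Slab|SReclaimable|SUnreclaim' /proc/meminfo
  cat /proc/buddyinfo    # per-order free pages, rough fragmentation check
  slabtop -o -s c        # largest slab caches, one-shot, sorted by size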
[root@...neltest ~]# uname -a
Linux kerneltest 6.18.0-1-mainline #1 SMP PREEMPT_DYNAMIC Tue, 11
Nov 2025 00:02:22 +0000 x86_64 GNU/Linux
The issue is still present in 6.18.
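If the CONFIG_DEBUG_KMEMLEAK build mentioned below still seems
worth doing, my understanding of the workflow is roughly the
following (just the documented kmemleak debugfs interface):

  # on a kernel built with CONFIG_DEBUG_KMEMLEAK=y
  mount -t debugfs nodev /sys/kernel/debug   # if not already mounted
  echo scan > /sys/kernel/debug/kmemleak     # trigger a scan
  cat /sys/kernel/debug/kmemleak             # list suspected leaks

Happy to do that build and post the output if it's thought it
would help.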
On Thu, 20 Nov 2025 12:23:51 +1000
Mal Haak <malcolm@...k.id.au> wrote:
> On Mon, 10 Nov 2025 18:20:08 +1000
> Mal Haak <malcolm@...k.id.au> wrote:
>
> > Hello,
> >
> > I have found a memory leak in 6.17.7 but I am unsure how to track it
> > down effectively.
> >
> > I am running a server that has a heavy read/write workload to a
> > cephfs file system. It is a VM.
> >
> > Over time it appears that the non-cache usage of kernel dynamic
> > memory increases. The kernel seems to think the pages are
> > reclaimable, but nothing appears to trigger the reclaim. This
> > leads to workloads getting killed by the OOM killer.
> >
> > smem -wp output:
> >
> > Area                           Used      Cache   Noncache
> > firmware/hardware             0.00%      0.00%      0.00%
> > kernel image                  0.00%      0.00%      0.00%
> > kernel dynamic memory        88.21%     36.25%     51.96%
> > userspace memory              9.49%      0.15%      9.34%
> > free memory                   2.30%      2.30%      0.00%
> >
> > free -h output:
> >
> >                total        used        free      shared  buff/cache   available
> > Mem:            31Gi       3.6Gi       500Mi       4.0Mi        11Gi        27Gi
> > Swap:          4.0Gi       179Mi       3.8Gi
> >
> > Reverting to the previous LTS kernel fixes the issue.
> >
> > smem -wp output:
> > Area                           Used      Cache   Noncache
> > firmware/hardware             0.00%      0.00%      0.00%
> > kernel image                  0.00%      0.00%      0.00%
> > kernel dynamic memory        80.22%     79.32%      0.90%
> > userspace memory             10.48%      0.20%     10.28%
> > free memory                   9.30%      9.30%      0.00%
> >
> I have more information. The leaking of kernel memory only starts
> once there is a lot of data in buffers/cache, and only once it has
> been in that state for several hours.
>
> In my search for a reproducer, I have found that downloading and
> then seeding multiple torrents of Linux distribution ISOs will
> replicate the issue, but it only begins leaking at around the
> 6-9 hour mark.
>
> It does not appear to be dependent on cephfs, though I believe
> cephfs's use of sockets is making the situation worse.
>
> I cannot replicate it at all with the LTS kernel release, but the
> current RC releases do appear to have this issue.
>
> I was looking at doing a kernel build with CONFIG_DEBUG_KMEMLEAK
> enabled, and will do so if it is thought that would find the issue.
> However, as the memory usage is still being accounted for and is
> apparently marked as reclaimable, it feels more like something in
> the reclaim logic is getting broken.
>
> Given that it only happens after RAM is mostly consumed by cache,
> and even then only after it has been that way for hours, I do
> wonder whether the issue is related to memory fragmentation.
>
> Regardless, I would appreciate some advice on how to narrow this
> down faster than a git bisect, as taking 9 hours just to confirm
> replication of the issue makes bisecting painfully slow.
>
> Thanks in advance
>
> Mal Haak
>