linux-kernel - Re: Possible memory leak in 6.17.7

Message-ID: <20251216170918.5f7848cc@xps15mal>
Date: Tue, 16 Dec 2025 17:09:18 +1000
From: Mal Haak <malcolm@...k.id.au>
To: "David Wang" <00107082@....com>
Cc: "Viacheslav Dubeyko" <Slava.Dubeyko@....com>,
 "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>, "Xiubo Li"
 <xiubli@...hat.com>, "idryomov@...il.com" <idryomov@...il.com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "surenb@...gle.com" <surenb@...gle.com>
Subject: Re: Possible memory leak in 6.17.7

On Tue, 16 Dec 2025 15:00:43 +0800 (CST)
"David Wang" <00107082@....com> wrote:

> At 2025-12-16 09:26:47, "Mal Haak" <malcolm@...k.id.au> wrote:
> >On Mon, 15 Dec 2025 19:42:56 +0000
> >Viacheslav Dubeyko <Slava.Dubeyko@....com> wrote:
> >  
> >> Hi Mal,
> >>   
> ><SNIP>   
> >> 
> >> Thanks a lot for reporting the issue. Finally, I can see the
> >> discussion in email list. :) Are you working on the patch with the
> >> fix? Should we wait for the fix or I need to start the issue
> >> reproduction and investigation? I am simply trying to avoid patches
> >> collision and, also, I have multiple other issues for the fix in
> >> CephFS kernel client. :)
> >> 
> >> Thanks,
> >> Slava.  
> >
> >Hello,
> >
> >Unfortunately creating a patch is just outside my comfort zone, I've
> >lived too long in Lustre land.  
> 
> Hi, just out of curiosity, have you narrowed down the caller of
> __filemap_get_folio causing the memory problem? Or do you have
> trouble applying the debug patch for memory allocation profiling?
> 
> David 
> 
Hi David,

I hadn't yet, as I was testing XFS and NFS to see whether they
replicated the behaviour, and they did not.

But actually this could speed things up considerably. I will do that
now and see what I get.

Thanks

Mal
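
For anyone following along, here is a minimal sketch of how two
snapshots from memory allocation profiling could be compared to see
which allocation sites keep growing. It assumes a kernel built with
CONFIG_MEM_ALLOC_PROFILING and that /proc/allocinfo uses the usual
"<bytes> <calls> <file>:<line> [module] func:<name>" layout. The sample
snapshots, file names and function names below are made up for
illustration and are not output from the affected system.

```python
import re

# One /proc/allocinfo record: bytes, calls, file:line, optional [module], func:name.
_RECORD = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\S+):(\d+)\s+(?:\[(\S+)\]\s+)?func:(\S+)"
)


def parse_allocinfo(text):
    """Parse allocinfo text into a list of dicts, skipping header lines."""
    rows = []
    for line in text.splitlines():
        match = _RECORD.match(line)
        if match is None:
            continue  # version banner, column header, or blank line
        size, calls, path, lineno, module, func = match.groups()
        rows.append({
            "bytes": int(size),
            "calls": int(calls),
            "file": path,
            "line": int(lineno),
            "module": module,  # None when the tag is not in a module
            "func": func,
        })
    return rows


def top_growth(before_text, after_text, limit=10):
    """Return (delta_bytes, file, line, func) for sites that grew, largest first."""
    before = {(r["file"], r["line"], r["func"]): r["bytes"]
              for r in parse_allocinfo(before_text)}
    grown = []
    for row in parse_allocinfo(after_text):
        key = (row["file"], row["line"], row["func"])
        delta = row["bytes"] - before.get(key, 0)
        if delta > 0:
            grown.append((delta, *key))
    grown.sort(reverse=True)
    return grown[:limit]


# Hypothetical snapshots, invented purely to exercise the parser.
SAMPLE_BEFORE = """allocinfo - version: 1.0
#     <size>  <calls> <tag info>
     4096        1 mm/example_a.c:10 func:example_alloc_a
    20480        5 fs/example_b.c:20 func:example_alloc_b
        0        0 drivers/example_c.c:30 [examplemod] func:example_init
"""

SAMPLE_AFTER = """allocinfo - version: 1.0
#     <size>  <calls> <tag info>
  8392704     2049 mm/example_a.c:10 func:example_alloc_a
    20480        5 fs/example_b.c:20 func:example_alloc_b
     4096        1 drivers/example_c.c:30 [examplemod] func:example_init
"""

if __name__ == "__main__":
    for delta, path, lineno, func in top_growth(SAMPLE_BEFORE, SAMPLE_AFTER):
        print(f"{delta:>12} {path}:{lineno} {func}")
```

On a real system the two strings would come from reading
/proc/allocinfo (as root) before and after running the workload, rather
than from the sample text.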

> >
> >I've been trying to narrow down a consistent reproducer that's
> >as fast as my production workload (which crashes a 32GB VM in 2hrs),
> >and I haven't got it quite as fast. I think the dd workload is too
> >well behaved. 
> >
> >I can confirm the issue appeared in the major patch set that was
> >applied as part of the 6.15 kernel, i.e. during the more complete
> >pages-to-folios switch, and that nothing has changed in the bug
> >behaviour since then. I did have a look at all the diffs from 6.14
> >to 6.18 on addr.c and didn't see any changes post 6.15 that looked
> >like they would impact the bug behaviour. 
> >
> >Again, I'm not super familiar with the CephFS code, but to hazard a
> >guess: the fact that the web download workload triggers things
> >faster suggests that unaligned writes might make things worse. But
> >again, I'm not 100% sure. I can't find a reproducer as fast as
> >downloading a dataset. Rsync of lots and lots of tiny files is a tad
> >faster than the dd case.
> >
> >I did see some changes in ceph_check_page_before_write where the
> >previous code unlocked pages and then continued, whereas the changed
> >folio code just returns ENODATA and doesn't unlock anything, with
> >most of the rest of the logic unchanged. This might be perfectly
> >fine, but in my admittedly limited reading of the code I couldn't
> >figure out where anything that was locked prior to this being called
> >would get unlocked like it did prior to the change. Again, I could
> >be miles off here, and one of the bulk reclaim/unlock passes that
> >was added might be cleaning this up correctly, or some other
> >functional change might take care of this, but it looks to be
> >potentially in the code path I'm exercising and it has had some
> >unlock logic changed. 
> >
> >I've spent most of my time trying to find a solid quick reproducer.
> >Not that it takes long to start leaking folios, but I wanted
> >something that aggressively triggered it so a small VM would OOM
> >quickly and, when combined with panic_on_oom, it could potentially
> >be used for regression testing by way of "did vm crash?".
> >
> >I'm not sure how much it will help, but I'll provide what details I
> >can about the actual workload that really sets it off. It's a
> >Python-based tool for downloading datasets. Datasets are split into
> >N chunks and the tool downloads them in parallel, 100 at a time,
> >until all N chunks are down. The compressed dataset is then unpacked
> >and reassembled for use with workloads. 
> >
> >This is replicating a common home folder use case in HPC. CephFS is
> >very attractive for home folders due to its "NFS-like" utility and
> >performance. And many tools use a similar method for fetching large
> >datasets. Tools are frequently written in Python or Go. 
> >
> >None of my customers have hit this yet, nor have any enterprise
> >customers, as none have moved to a new enough kernel yet due to slow
> >upgrade cycles. Even Proxmox have only just started testing on a
> >kernel version > 6.14. 
> >
> >I'm more than happy to help however I can with testing. I can run
> >instrumented kernels or test patches or whatever you need. I am
> >sorry I haven't been able to produce a super clean, fast reproducer
> >(my test cluster at home is all spinners and only 500TB usable). But
> >I figured I needed to get the word out asap as distros and soon
> >customers are going to be moving past 6.12-6.14 kernels as the 5-7
> >year update cycle marches on. Especially those wanting to take full
> >advantage of CacheFS and encryption functionality. 
> >
> >Again, thanks for looking at this, and do reach out if I can help in
> >any way. I am in the ceph slack if it's faster to reach out that way.
> >
> >Regards
> >
> >Mal Haak
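
The download workload described above can be approximated without a
network. What follows is only a sketch under stated assumptions, not
the actual tool: it writes N chunk files in parallel (100 at a time by
default) using a deliberately odd write size so that writes are not
page aligned, then reassembles them into one file. The function names,
the 4093-byte block size and the use of random data in place of a real
download are all hypothetical choices. Whether this triggers the leak
is untested.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_chunk(directory, index, size, block=4093):
    """Write one chunk file of `size` bytes using unaligned `block`-sized writes."""
    path = os.path.join(directory, f"chunk-{index:06d}.part")
    remaining = size
    with open(path, "wb") as handle:
        while remaining > 0:
            count = min(block, remaining)
            handle.write(os.urandom(count))  # stand-in for downloaded data
            remaining -= count
    return path


def fetch_dataset(directory, n_chunks, chunk_size, parallel=100):
    """Write n_chunks chunk files, at most `parallel` at a time; return their paths."""
    os.makedirs(directory, exist_ok=True)
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        return list(pool.map(
            lambda index: write_chunk(directory, index, chunk_size),
            range(n_chunks),
        ))


def reassemble(paths, target):
    """Concatenate the chunk files into `target`; return total bytes written."""
    total = 0
    with open(target, "wb") as out:
        for path in paths:
            with open(path, "rb") as source:
                data = source.read()
            out.write(data)
            total += len(data)
    return total


if __name__ == "__main__":
    # A temporary directory is used here; point it at the filesystem
    # under test (for example a CephFS mount) to exercise that instead.
    with tempfile.TemporaryDirectory() as workdir:
        chunk_paths = fetch_dataset(workdir, n_chunks=200, chunk_size=1_000_003)
        written = reassemble(chunk_paths, os.path.join(workdir, "dataset.bin"))
        print(f"wrote {len(chunk_paths)} chunks, {written} bytes reassembled")
```

Running it in a loop with larger values of n_chunks and chunk_size
while watching memory use would be the obvious next step.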