linux-kernel - RE: Possible memory leak in 6.17.7

Message-ID: <ec3b777ba176a6ca4738da8c62c030577a4e58eb.camel@ibm.com>
Date: Wed, 17 Dec 2025 01:56:52 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "malcolm@...k.id.au" <malcolm@...k.id.au>,
        "00107082@....com"
	<00107082@....com>
CC: Xiubo Li <xiubli@...hat.com>, David Howells <dhowells@...hat.com>,
        "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
        "surenb@...gle.com" <surenb@...gle.com>,
        "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>,
        "netfs@...ts.linux.dev"
	<netfs@...ts.linux.dev>,
        "pc@...guebit.org" <pc@...guebit.org>,
        "idryomov@...il.com" <idryomov@...il.com>
Subject: RE: Possible memory leak in 6.17.7

Hi Mal,

On Tue, 2025-12-16 at 20:42 +0800, David Wang wrote:
> At 2025-12-16 20:18:11, "David Wang" <00107082@....com> wrote:
> > 
> > 

<skipped>

> > > 
> > > 
> > > > > > 
> > > > > > > I've been trying to narrow down a consistent reproducer that's
> > > > > > > as fast as my production workload (it crashes a 32GB VM in
> > > > > > > 2hrs), but I haven't got one quite as fast. I think the dd
> > > > > > > workload is too well behaved.
> > > > > > 
> > > > > > > I can confirm the issue appeared in the major patch set that was
> > > > > > > applied as part of the 6.15 kernel, i.e. during the more complete
> > > > > > > pages-to-folios switch, and that nothing has changed in the bug
> > > > > > > behaviour since then. I did have a look at all the diffs from 6.14
> > > > > > to 6.18 on addr.c and didn't see any changes post 6.15 that looked
> > > > > > like they would impact the bug behavior. 
> > > > > > 
> > > > > > > Again, I'm not super familiar with the CephFS code, but to hazard
> > > > > > > a guess: the fact that the web download workload triggers things
> > > > > > > faster suggests that unaligned writes might make things worse. But
> > > > > > again, I'm not 100% sure. I can't find a reproducer as fast as
> > > > > > downloading a dataset. Rsync of lots and lots of tiny files is a
> > > > > > tad faster than the dd case.
> > > > > > 
> > > > > > I did see some changes in ceph_check_page_before_write where the
> > > > > > > previous code unlocked pages and then continued whereas the
> > > > > > changed folio code just returns ENODATA and doesn't unlock
> > > > > > anything with most of the rest of the logic unchanged. This might
> > > > > > be perfectly fine, but in my, admittedly limited, reading of the
> > > > > > code I couldn't figure out where anything that was locked prior to
> > > > > > this being called would get unlocked like it did prior to the
> > > > > > change. Again, I could be miles off here and one of the bulk
> > > > > > reclaim/unlock passes that was added might be cleaning this up
> > > > > > correctly or some other functional change might take care of this,
> > > > > > > but it looks to be potentially in the code path I'm exercising and
> > > > > > it has had some unlock logic changed. 
> > > > > > 
> > > > > > I've spent most of my time trying to find a solid quick reproducer.
> > > > > > Not that it takes long to start leaking folios, but I wanted
> > > > > > something that aggressively triggered it so a small vm would oom
> > > > > > quickly and when combined with crash_on_oom it could potentially be
> > > > > > used for regression testing by way of "did vm crash?".
> > > > > > 
> > > > > > I'm not sure if it will super help, but I'll provide what details I
> > > > > > can about the actual workload that really sets it off. It's a
> > > > > > python based tool for downloading datasets. Datasets are split
> > > > > > into N chunks and the tool downloads them in parallel 100 at a
> > > > > > time until all N chunks are down. The compressed dataset is then
> > > > > > unpacked and reassembled for use with workloads. 
> > > > > > 
> > > > > > > This is replicating a common home folder use case in HPC. CephFS is
> > > > > > > very attractive for home folders due to its "NFS-like" utility and
> > > > > > performance. And many tools use a similar method for fetching large
> > > > > > datasets. Tools are frequently written in python or go. 
> > > > > > 
> > > > > > > None of my customers have hit this yet, nor have any enterprise
> > > > > > customers as none have moved to a new enough kernel yet due to slow
> > > > > > upgrade cycles. Even Proxmox have only just started testing on a
> > > > > > kernel version > 6.14. 
> > > > > > 
> > > > > > I'm more than happy to help however I can with testing. I can run
> > > > > > instrumented kernels or test patches or whatever you need. I am
> > > > > > sorry I haven't been able to produce a super clean, fast reproducer
> > > > > > (my test cluster at home is all spinners and only 500TB usable).
> > > > > > But I figured I needed to get the word out asap as distros and soon
> > > > > > customers are going to be moving past 6.12-6.14 kernels as the 5-7
> > > > > > year update cycle marches on. Especially those wanting to take full
> > > > > > advantage of CacheFS and encryption functionality. 
> > > > > > 
> > > > > > > Again thanks for looking at this and do reach out if I can help in
> > > > > > > any way. I am in the ceph slack if it's faster to reach out that
> > > > > > way.
> > > > > > 
> > > > 

Could you please add your CephFS kernel client's mount options to the ticket
[1]?
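
One way to collect them is to read the effective options from /proc/mounts. The snippet below is a minimal sketch, assuming the kernel client shows up with filesystem type "ceph" (ceph-fuse mounts use a different type and are ignored); the helper name is hypothetical.

```python
import os


def ceph_mount_options(mounts_text):
    """Return (mountpoint, [options]) for each kernel CephFS mount.

    mounts_text is the content of /proc/mounts, where each line has
    the fields: device mountpoint fstype options dump pass.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "ceph":
            result.append((fields[1], fields[3].split(",")))
    return result


# Only meaningful on Linux; skipped where /proc/mounts is absent.
if os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for mountpoint, options in ceph_mount_options(f.read()):
            print(mountpoint, ",".join(options))
```

The printed option string can be pasted into the ticket as-is.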

Thanks a lot,
Slava.

[1] https://tracker.ceph.com/issues/74156