Message-ID: <5845dde.b3e3.19b2718bc89.Coremail.00107082@163.com>
Date: Tue, 16 Dec 2025 20:18:11 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Mal Haak" <malcolm@...k.id.au>
Cc: "Viacheslav Dubeyko" <Slava.Dubeyko@....com>,
"ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
"Xiubo Li" <xiubli@...hat.com>,
"idryomov@...il.com" <idryomov@...il.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"surenb@...gle.com" <surenb@...gle.com>, dhowells@...hat.com,
pc@...guebit.org, netfs@...ts.linux.dev
Subject: Re: Possible memory leak in 6.17.7
At 2025-12-16 19:55:27, "Mal Haak" <malcolm@...k.id.au> wrote:
>On Tue, 16 Dec 2025 17:09:18 +1000
>Mal Haak <malcolm@...k.id.au> wrote:
>
>> On Tue, 16 Dec 2025 15:00:43 +0800 (CST)
>> "David Wang" <00107082@....com> wrote:
>>
>> > At 2025-12-16 09:26:47, "Mal Haak" <malcolm@...k.id.au> wrote:
>> > >On Mon, 15 Dec 2025 19:42:56 +0000
>> > >Viacheslav Dubeyko <Slava.Dubeyko@....com> wrote:
>> > >
>> > >> Hi Mal,
>> > >>
>> > ><SNIP>
>> > >>
>> > >> Thanks a lot for reporting the issue. Finally, I can see the
>> > >> discussion in the email list. :) Are you working on a patch with
>> > >> the fix? Should we wait for the fix, or do I need to start
>> > >> reproducing and investigating the issue? I am simply trying to
>> > >> avoid patch collisions and, also, I have multiple other issues
>> > >> to fix in the CephFS kernel client. :)
>> > >>
>> > >> Thanks,
>> > >> Slava.
>> > >
>> > >Hello,
>> > >
>> > >Unfortunately creating a patch is just outside my comfort zone;
>> > >I've lived too long in Lustre land.
>> >
>> > Hi, just out of curiosity, have you narrowed down the caller of
>> > __filemap_get_folio that is causing the memory problem? Or are you
>> > having trouble applying the debug patch for memory allocation
>> > profiling?
>> >
>> > David
>> >
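For reference, the stock setup for that profiling is roughly the
following, assuming a kernel built with CONFIG_MEM_ALLOC_PROFILING=y
(the debug patch mentioned above would go on top of this):

    # turn the per-callsite counters on at runtime
    sysctl vm.mem_profiling=1
    # top ten live allocation sites, human-readable sizes
    sort -g /proc/allocinfo | tail | numfmt --to=iec
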
>> Hi David,
>>
>> I hadn't yet, as I was testing XFS and NFS to see whether they
>> replicated the behaviour, and they did not.
>>
>> But actually this could speed things up considerably. I will do that
>> now and see what I get.
>>
>> Thanks
>>
>> Mal
>>
>I did just give it a blast.
>
>Unfortunately it returned exactly what I expected: the calls are all
>coming from netfs.
>
>Which makes sense for cephfs.
>
># sort -g /proc/allocinfo|tail|numfmt --to=iec
>  10M   2541 drivers/block/zram/zram_drv.c:1597 [zram] func:zram_meta_alloc
>  12M   3001 mm/execmem.c:41 func:execmem_vmalloc
>  12M   3605 kernel/fork.c:311 func:alloc_thread_stack_node
>  16M    992 mm/slub.c:3061 func:alloc_slab_page
>  20M  35544 lib/xarray.c:378 func:xas_alloc
>  31M   7704 mm/memory.c:1192 func:folio_prealloc
>  69M  17562 mm/memory.c:1190 func:folio_prealloc
> 104M   8212 mm/slub.c:3059 func:alloc_slab_page
> 124M  30075 mm/readahead.c:189 func:ractl_alloc_folio
> 2.6G 661392 fs/netfs/buffered_read.c:635 [netfs] func:netfs_write_begin
>
>So, unfortunately it doesn't reveal the true source. But it was worth
>a shot! So thanks again.
Oh, at least cephfs could be ruled out, right?
CC netfs folks then. :)
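A quick sanity check to confirm that this one call site is the leak
rather than ordinary page cache (just a sketch; the grep pattern
matches the line in the output above):

    # clean page cache is released here; a leaked folio's count won't drop
    sync; echo 3 > /proc/sys/vm/drop_caches
    # then sample the suspect site once a minute while the workload runs
    while sleep 60; do date; grep 'fs/netfs/buffered_read.c:' /proc/allocinfo; done

If the netfs_write_begin counter keeps climbing and never falls after
drop_caches, those folios are the ones being leaked.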
>
>Mal
>
>
>> > >
>> > >I have been trying to narrow down a consistent reproducer that's
>> > >as fast as my production workload (it crashes a 32GB VM in two
>> > >hours), and I haven't got it quite as fast. I think the dd
>> > >workload is too well behaved.
>> > >
>> > >I can confirm the issue appeared in the major patch set that was
>> > >applied as part of the 6.15 kernel, during the more complete
>> > >pages-to-folios switch, and that nothing has changed in the bug
>> > >behaviour since then. I did look at all the diffs from 6.14 to
>> > >6.18 on addr.c and didn't see any changes post 6.15 that looked
>> > >like they would affect the bug behaviour.
>> > >
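(For reference, that diff sweep is roughly the following, assuming the
v6.14 and v6.18 tags are available in the tree:

    # every change to the cephfs address-space code between those releases
    git log --oneline v6.14..v6.18 -- fs/ceph/addr.c
)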
>> > >Again, I'm not super familiar with the CephFS code, but to hazard
>> > >a guess, the fact that the web download workload triggers things
>> > >faster suggests that unaligned writes might make things worse. But
>> > >again, I'm not 100% sure. I can't find a reproducer as fast as
>> > >downloading a dataset. Rsync of lots and lots of tiny files is a
>> > >tad faster than the dd case.
>> > >
>> > >I did see some changes in ceph_check_page_before_write: the
>> > >previous code unlocked pages and then continued, whereas the
>> > >changed folio code just returns ENODATA and doesn't unlock
>> > >anything, with most of the rest of the logic unchanged. This might
>> > >be perfectly fine, but in my, admittedly limited, reading of the
>> > >code I couldn't figure out where anything that was locked prior to
>> > >this being called would get unlocked, like it did before the
>> > >change. Again, I could be miles off here; one of the bulk
>> > >reclaim/unlock passes that was added might be cleaning this up
>> > >correctly, or some other functional change might take care of it.
>> > >But it looks to be potentially in the code path I'm exercising,
>> > >and it has had some unlock logic changed.
>> > >
>> > >I've spent most of my time trying to find a solid, quick
>> > >reproducer. Not that it takes long to start leaking folios, but I
>> > >wanted something that triggered it aggressively, so that a small
>> > >VM would OOM quickly and, when combined with crash_on_oom, could
>> > >potentially be used for regression testing by way of "did the VM
>> > >crash?".
>> > >
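(Assuming "crash_on_oom" above means panicking on OOM, the usual knobs
for that kind of "did the VM crash?" harness would be:

    # panic instead of invoking the OOM killer
    sysctl vm.panic_on_oom=1
    # reboot 10 seconds after the panic so the harness can detect it
    sysctl kernel.panic=10
)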
>> > >I'm not sure how much it will help, but I'll provide what details
>> > >I can about the actual workload that really sets it off. It's a
>> > >Python-based tool for downloading datasets. Datasets are split
>> > >into N chunks and the tool downloads them in parallel, 100 at a
>> > >time, until all N chunks are downloaded. The compressed dataset
>> > >is then unpacked and reassembled for use with workloads.
>> > >
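(A rough shell equivalent of that access pattern, for anyone trying to
reproduce; a sketch only, where N, BASE_URL and /mnt/cephfs/scratch
are placeholders and the real tool is Python:

    # fetch N chunks onto the cephfs mount, 100 in flight at a time
    for i in $(seq 1 "$N"); do
        curl -s -o "/mnt/cephfs/scratch/chunk.$i" "$BASE_URL/chunk.$i" &
        (( i % 100 == 0 )) && wait
    done
    wait
    # then unpack and reassemble, as the tool does
    cat /mnt/cephfs/scratch/chunk.* > /mnt/cephfs/scratch/dataset.bin
)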
>> > >This is replicating a common home-folder use case in HPC. CephFS
>> > >is very attractive for home folders due to its "NFS-like" utility
>> > >and performance, and many tools use a similar method for fetching
>> > >large datasets. Tools are frequently written in Python or Go.
>> > >
>> > >None of my customers have hit this yet, nor have any enterprise
>> > >customers, as none have moved to a new enough kernel yet due to
>> > >slow upgrade cycles. Even Proxmox have only just started testing
>> > >on a kernel version > 6.14.
>> > >
>> > >I'm more than happy to help however I can with testing. I can run
>> > >instrumented kernels or test patches or whatever you need. I am
>> > >sorry I haven't been able to produce a super clean, fast
>> > >reproducer (my test cluster at home is all spinners and only
>> > >500TB usable). But I figured I needed to get the word out ASAP,
>> > >as distros, and soon customers, are going to be moving past
>> > >6.12-6.14 kernels as the 5-7 year update cycle marches on,
>> > >especially those wanting to take full advantage of fscache and
>> > >encryption functionality.
>> > >
>> > >Again, thanks for looking at this, and do reach out if I can help
>> > >in any way. I am in the Ceph Slack if it's faster to reach out
>> > >that way.
>> > >
>> > >Regards
>> > >
>> > >Mal Haak
>>