Message-ID: <8c8e8dc4d30a8ca37a57d7f29c5f29cdf7a904ee.camel@ibm.com>
Date: Mon, 15 Dec 2025 19:42:56 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "malcolm@...k.id.au" <malcolm@...k.id.au>,
        "00107082@....com"
	<00107082@....com>
CC: "ceph-devel@...r.kernel.org" <ceph-devel@...r.kernel.org>,
        Xiubo Li
	<xiubli@...hat.com>,
        "idryomov@...il.com" <idryomov@...il.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "surenb@...gle.com" <surenb@...gle.com>
Subject: RE: Re: Possible memory leak in 6.17.7

Hi Mal,

On Thu, 2025-12-11 at 14:23 +1000, Mal Haak wrote:
> On Thu, 11 Dec 2025 11:28:21 +0800 (CST)
> "David Wang" <00107082@....com> wrote:
> 
> > At 2025-12-10 21:43:18, "Mal Haak" <malcolm@...k.id.au> wrote:
> > > On Tue, 9 Dec 2025 12:40:21 +0800 (CST)
> > > "David Wang" <00107082@....com> wrote:
> > >  
> > > > At 2025-12-09 07:08:31, "Mal Haak" <malcolm@...k.id.au> wrote:  
> > > > > On Mon,  8 Dec 2025 19:08:29 +0800
> > > > > David Wang <00107082@....com> wrote:
> > > > >    
> > > > > > On Mon, 10 Nov 2025 18:20:08 +1000
> > > > > > Mal Haak <malcolm@...k.id.au> wrote:    
> > > > > > > Hello,
> > > > > > > 
> > > > > > > I have found a memory leak in 6.17.7 but I am unsure how to
> > > > > > > track it down effectively.
> > > > > > > 
> > > > > > >       
> > > > > > 
> > > > > > I think the `memory allocation profiling` feature can help.
> > > > > > https://docs.kernel.org/mm/allocation-profiling.html  
> > > > > > 
> > > > > > You would need to build a kernel with 
> > > > > > CONFIG_MEM_ALLOC_PROFILING=y
> > > > > > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> > > > > > 
> > > > > > And check /proc/allocinfo for the suspicious allocations which
> > > > > > take more memory than expected.
> > > > > > 
> > > > > > (I once caught an nvidia driver memory leak this way.)
> > > > > > 
> > > > > > 
> > > > > > FYI
> > > > > > David
> > > > > >     
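A minimal userspace sketch, assuming each /proc/allocinfo data line is
"<bytes> <calls> <call site>" (as in the output quoted below), that prints the
allocation sites currently holding the most memory; it is roughly the C
equivalent of the `sort -g /proc/allocinfo | tail` pipeline used later in the
thread:

/* allocinfo_top.c: list the top-10 allocation sites by resident bytes. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct entry {
	unsigned long long bytes;
	char line[256];
};

/* Ascending sort by bytes, so the largest consumers print last. */
static int cmp(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;

	return (x->bytes > y->bytes) - (x->bytes < y->bytes);
}

int main(void)
{
	static struct entry e[32768];
	char buf[256];
	size_t n = 0;
	FILE *f = fopen("/proc/allocinfo", "r");

	if (!f) {
		perror("/proc/allocinfo");
		return 1;
	}
	while (n < 32768 && fgets(buf, sizeof(buf), f)) {
		unsigned long long bytes, calls;

		/* Skip header lines that do not start with two numbers. */
		if (sscanf(buf, "%llu %llu", &bytes, &calls) != 2)
			continue;
		e[n].bytes = bytes;
		strncpy(e[n].line, buf, sizeof(e[n].line) - 1);
		n++;
	}
	fclose(f);

	qsort(e, n, sizeof(e[0]), cmp);
	for (size_t i = (n > 10 ? n - 10 : 0); i < n; i++)
		fputs(e[i].line, stdout);
	return 0;
}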
> > > > > 
> > > > > Thank you for your suggestion. I have some results.
> > > > > 
> > > > > Ran the rsync workload for about 9 hours. It started to look like
> > > > > the leak was happening.
> > > > > # smem -pw
> > > > > Area                           Used      Cache   Noncache 
> > > > > firmware/hardware             0.00%      0.00%      0.00% 
> > > > > kernel image                  0.00%      0.00%      0.00% 
> > > > > kernel dynamic memory        80.46%     65.80%     14.66% 
> > > > > userspace memory              0.35%      0.16%      0.19% 
> > > > > free memory                  19.19%     19.19%      0.00% 
> > > > > # sort -g /proc/allocinfo|tail|numfmt --to=iec
> > > > >         22M     5609 mm/memory.c:1190 func:folio_prealloc
> > > > >         23M     1932 fs/xfs/xfs_buf.c:226 [xfs] func:xfs_buf_alloc_backing_mem
> > > > >         24M    24135 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
> > > > >         27M     6693 mm/memory.c:1192 func:folio_prealloc
> > > > >         58M    14784 mm/page_ext.c:271 func:alloc_page_ext
> > > > >        258M      129 mm/khugepaged.c:1069 func:alloc_charge_folio
> > > > >        430M   770788 lib/xarray.c:378 func:xas_alloc
> > > > >        545M    36444 mm/slub.c:3059 func:alloc_slab_page
> > > > >        9.8G  2563617 mm/readahead.c:189 func:ractl_alloc_folio
> > > > >         20G  5164004 mm/filemap.c:2012 func:__filemap_get_folio
> > > > > 
> > > > > 
> > > > > So I stopped the workload and dropped caches to confirm.
> > > > > 
> > > > > # echo 3 > /proc/sys/vm/drop_caches
> > > > > # smem -pw
> > > > > Area                           Used      Cache   Noncache 
> > > > > firmware/hardware             0.00%      0.00%      0.00% 
> > > > > kernel image                  0.00%      0.00%      0.00% 
> > > > > kernel dynamic memory        33.45%      0.09%     33.36% 
> > > > > userspace memory              0.36%      0.16%      0.19% 
> > > > > free memory                  66.20%     66.20%      0.00% 
> > > > > # sort -g /proc/allocinfo|tail|numfmt --to=iec
> > > > >         12M     2987 mm/execmem.c:41 func:execmem_vmalloc
> > > > >         12M        3 kernel/dma/pool.c:96 func:atomic_pool_expand
> > > > >         13M      751 mm/slub.c:3061 func:alloc_slab_page
> > > > >         16M        8 mm/khugepaged.c:1069 func:alloc_charge_folio
> > > > >         18M     4355 mm/memory.c:1190 func:folio_prealloc
> > > > >         24M     6119 mm/memory.c:1192 func:folio_prealloc
> > > > >         58M    14784 mm/page_ext.c:271 func:alloc_page_ext
> > > > >         61M    15448 mm/readahead.c:189 func:ractl_alloc_folio
> > > > >         79M     6726 mm/slub.c:3059 func:alloc_slab_page
> > > > >         11G  2674488 mm/filemap.c:2012 func:__filemap_get_folio
> > 
> > Maybe narrowing down which caller of __filemap_get_folio is responsible
> > for the "Noncache" memory would help clarify things. (It could be designed
> > that way and need a route other than dropping caches to release the
> > memory; that is just a guess.) If you want, you can modify the code to
> > split the accounting for __filemap_get_folio according to its callers.
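A hedged sketch of that split, assuming the alloc_hooks()/_noprof convention
the allocation-profiling infrastructure already uses for helpers such as
filemap_alloc_folio(): rename the implementation and wrap it in a macro so
each caller expands its own alloc_tag and shows up as a separate line in
/proc/allocinfo. This is an illustration, not an actual patch from this
thread.

/*
 * 1) In mm/filemap.c, rename __filemap_get_folio() to
 *    __filemap_get_folio_noprof(), keeping the body the same except that its
 *    internal allocations go through the *_noprof variants (e.g.
 *    filemap_alloc_folio_noprof()), so nothing is charged inside it.
 *
 * 2) In include/linux/pagemap.h, declare the _noprof variant and wrap it:
 */
struct folio *__filemap_get_folio_noprof(struct address_space *mapping,
					 pgoff_t index, fgf_t fgp_flags,
					 gfp_t gfp);

#define __filemap_get_folio(...) \
	alloc_hooks(__filemap_get_folio_noprof(__VA_ARGS__))

Each distinct caller would then appear as its own allocinfo entry, which
should show which path is holding the memory that survives drop_caches.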
> 
> 
> Thanks again, I'll add this patch in and see where I end up. 
> 
> The issue is that nothing will cause the memory to be freed. Dropping
> caches doesn't work, memory pressure doesn't work, and unmounting the
> filesystems doesn't work. Removing the cephfs and netfs kernel modules
> doesn't work either.
> 
> This is why I feel it's a ref_count (or similar) issue. 
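For reference, the classic shape of such a leak looks like the sketch below.
This is a generic illustration of a missed folio_put(), not actual CephFS or
netfs code, and do_some_io() is a hypothetical placeholder. A folio returned
by __filemap_get_folio() carries a reference for the caller, and an error path
that returns without the matching folio_put() leaves the folio pinned, so
neither memory pressure nor drop_caches can ever free it.

#include <linux/pagemap.h>

/* Generic illustration of a folio refcount leak, not actual CephFS code. */
static int example_read_into_cache(struct address_space *mapping, pgoff_t index)
{
	struct folio *folio = __filemap_get_folio(mapping, index,
						  FGP_LOCK | FGP_CREAT,
						  mapping_gfp_mask(mapping));
	if (IS_ERR(folio))
		return PTR_ERR(folio);

	if (do_some_io(folio) < 0) {	/* hypothetical I/O helper */
		folio_unlock(folio);
		/* Bug pattern: returning here without folio_put() pins the
		 * folio forever; reclaim and drop_caches cannot free it. */
		return -EIO;
	}

	folio_unlock(folio);
	folio_put(folio);	/* drop the reference __filemap_get_folio() gave us */
	return 0;
}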
> 
> I've also found that it seems to be a fixed amount leaked each time, per
> file. Simply doing lots of I/O on one large file doesn't leak as fast as
> doing it on lots of "small" files (greater than 10 MB but less than 100 MB
> seems to be the sweet spot).
> 
> Also, dropping caches while the workload is running actually amplifies
> the issue. So it very much feels like something is wrong in the reclaim
> code.
> 
> Anyway I'll get this patch applied and see where I end up. 
> 
> I now have crash dumps (after enabling crash_on_oom), so I'm going to try
> to find these structures and check what state they are in.
> 
> 

Thanks a lot for reporting the issue. Finally, I can see the discussion on the
mailing list. :) Are you working on a patch with the fix? Should we wait for
your fix, or do I need to start reproducing and investigating the issue? I am
simply trying to avoid patch collisions, and I also have multiple other issues
to fix in the CephFS kernel client. :)

Thanks,
Slava.
