Message-ID: <2a9ba88e.3aa6.19b0b73dd4e.Coremail.00107082@163.com>
Date: Thu, 11 Dec 2025 11:28:21 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Mal Haak" <malcolm@...k.id.au>
Cc: linux-kernel@...r.kernel.org, surenb@...gle.com, xiubli@...hat.com,
idryomov@...il.com, ceph-devel@...r.kernel.org
Subject: Re: Possible memory leak in 6.17.7
At 2025-12-10 21:43:18, "Mal Haak" <malcolm@...k.id.au> wrote:
>On Tue, 9 Dec 2025 12:40:21 +0800 (CST)
>"David Wang" <00107082@....com> wrote:
>
>> At 2025-12-09 07:08:31, "Mal Haak" <malcolm@...k.id.au> wrote:
>> >On Mon, 8 Dec 2025 19:08:29 +0800
>> >David Wang <00107082@....com> wrote:
>> >
>> >> On Mon, 10 Nov 2025 18:20:08 +1000
>> >> Mal Haak <malcolm@...k.id.au> wrote:
>> >> > Hello,
>> >> >
>> >> > I have found a memory leak in 6.17.7 but I am unsure how to
>> >> > track it down effectively.
>> >> >
>> >> >
>> >>
>> >> I think the `memory allocation profiling` feature can help.
>> >> https://docs.kernel.org/mm/allocation-profiling.html
>> >>
>> >> You would need to build a kernel with
>> >> CONFIG_MEM_ALLOC_PROFILING=y
>> >> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>> >>
>> >> And check /proc/allocinfo for the suspicious allocations which take
>> >> more memory than expected.
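>> >> For example, something like this (assuming the runtime sysctl knob
>> >> documented on that page; skip it if profiling is enabled by default):
>> >>
>> >> # sysctl vm.mem_profiling=1
>> >> # sort -g /proc/allocinfo | tail | numfmt --to=iec
>> >>
>> >> Each line of /proc/allocinfo is roughly
>> >> <bytes> <calls> <file>:<line> [module] func:<name>,
>> >> so the biggest live allocation sites sort to the bottom.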
>> >>
>> >> (I once caught an nvidia driver memory leak.)
>> >>
>> >>
>> >> FYI
>> >> David
>> >>
>> >
>> >Thank you for your suggestion. I have some results.
>> >
>> >Ran the rsync workload for about 9 hours. It started to look like
>> >the leak was happening.
>> ># smem -pw
>> >Area                    Used    Cache   Noncache
>> >firmware/hardware       0.00%   0.00%   0.00%
>> >kernel image            0.00%   0.00%   0.00%
>> >kernel dynamic memory  80.46%  65.80%  14.66%
>> >userspace memory        0.35%   0.16%   0.19%
>> >free memory            19.19%  19.19%   0.00%
>> ># sort -g /proc/allocinfo|tail|numfmt --to=iec
>> >  22M    5609 mm/memory.c:1190 func:folio_prealloc
>> >  23M    1932 fs/xfs/xfs_buf.c:226 [xfs] func:xfs_buf_alloc_backing_mem
>> >  24M   24135 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
>> >  27M    6693 mm/memory.c:1192 func:folio_prealloc
>> >  58M   14784 mm/page_ext.c:271 func:alloc_page_ext
>> > 258M     129 mm/khugepaged.c:1069 func:alloc_charge_folio
>> > 430M  770788 lib/xarray.c:378 func:xas_alloc
>> > 545M   36444 mm/slub.c:3059 func:alloc_slab_page
>> > 9.8G 2563617 mm/readahead.c:189 func:ractl_alloc_folio
>> >  20G 5164004 mm/filemap.c:2012 func:__filemap_get_folio
>> >
>> >
>> >So I stopped the workload and dropped caches to confirm.
>> >
>> ># echo 3 > /proc/sys/vm/drop_caches
>> ># smem -pw
>> >Area                    Used    Cache   Noncache
>> >firmware/hardware       0.00%   0.00%   0.00%
>> >kernel image            0.00%   0.00%   0.00%
>> >kernel dynamic memory  33.45%   0.09%  33.36%
>> >userspace memory        0.36%   0.16%   0.19%
>> >free memory            66.20%  66.20%   0.00%
>> ># sort -g /proc/allocinfo|tail|numfmt --to=iec
>> > 12M 2987 mm/execmem.c:41 func:execmem_vmalloc
>> > 12M 3 kernel/dma/pool.c:96 func:atomic_pool_expand
>> > 13M 751 mm/slub.c:3061 func:alloc_slab_page
>> > 16M 8 mm/khugepaged.c:1069 func:alloc_charge_folio
>> > 18M 4355 mm/memory.c:1190 func:folio_prealloc
>> > 24M 6119 mm/memory.c:1192 func:folio_prealloc
>> > 58M 14784 mm/page_ext.c:271 func:alloc_page_ext
>> > 61M 15448 mm/readahead.c:189 func:ractl_alloc_folio
>> > 79M 6726 mm/slub.c:3059 func:alloc_slab_page
>> > 11G 2674488 mm/filemap.c:2012 func:__filemap_get_folio
Maybe narrowing down which callers of __filemap_get_folio account for
the "Noncache" usage would help clarify things.
(It could be designed that way, and the memory may need some route
other than dropping caches to be released; just a guess....)
If you want, you can modify the code to split the accounting for
__filemap_get_folio according to its callers.
The following is a draft patch (based on v6.18):
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 09b581c1d878..ba8c659a6ae3 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -753,7 +753,11 @@ static inline fgf_t fgf_set_order(size_t size)
 }
 
 void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
-struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
+
+#define __filemap_get_folio(...)	\
+	alloc_hooks(__filemap_get_folio_noprof(__VA_ARGS__))
+
+struct folio *__filemap_get_folio_noprof(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp);
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp);
diff --git a/mm/filemap.c b/mm/filemap.c
index 024b71da5224..e1c1c26d7cb3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1938,7 +1938,7 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
  *
  * Return: The found folio or an ERR_PTR() otherwise.
  */
-struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
+struct folio *__filemap_get_folio_noprof(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp)
 {
 	struct folio *folio;
@@ -2009,7 +2009,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		err = -ENOMEM;
 		if (order > min_order)
 			alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
-		folio = filemap_alloc_folio(alloc_gfp, order);
+		folio = filemap_alloc_folio_noprof(alloc_gfp, order);
 		if (!folio)
 			continue;
 
@@ -2056,7 +2056,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 	folio_clear_dropbehind(folio);
 	return folio;
 }
-EXPORT_SYMBOL(__filemap_get_folio);
+EXPORT_SYMBOL(__filemap_get_folio_noprof);
 
 static inline struct folio *find_get_entry(struct xa_state *xas, pgoff_t max,
 		xa_mark_t mark)
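With the patch, alloc_hooks() charges each allocation to the file:line
that expands the __filemap_get_folio() macro, so every caller gets its
own row in /proc/allocinfo instead of everything being lumped under
mm/filemap.c:2012. A rough sketch of how to check (the fs/ceph/ and
fs/netfs/ names below are only examples of what might show up, not a
prediction):

# echo 3 > /proc/sys/vm/drop_caches
# sort -g /proc/allocinfo | tail -20 | numfmt --to=iec

If the large "Noncache" remainder moves to an fs/ceph/ or fs/netfs/
line, that call site would be the place to look next.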
FYI
David
>> >
>> >So if I'm reading this correctly, something is causing folios to
>> >accumulate without being freeable?
>>
>> CC'ing cephfs; maybe someone there can read something out of that
>> folio usage more easily
>>
>>
>> >
>> >Also it's clear that some of the folios are counted as cache and
>> >some aren't.
>> >
>> >Like I said, 6.17 and 6.18 both have the issue; 6.12 does not. I'm
>> >now going to manually walk through previous kernel releases to find
>> >where it first starts happening, purely because issues building
>> >earlier kernels (rust tooling and other python incompatibilities)
>> >make doing a git-bisect a bit fun.
>> >
>> >I'll do it with packaged kernels until I get closer, then solve the
>> >build issues.
>> >
>> >Thanks,
>> >Mal
>> >
>Thanks David.
>
>I've contacted the ceph developers as well.
>
>There was a suggestion it was due to the change from, to quote,
>"folio.free() to folio.put()", or something like this.
>
>The change happened around 6.14/6.15
>
>I've found an easier reproducer.
>
>There has been a suggestion that perhaps the ceph team might not fix
>this as "you can just reboot before the machine becomes unstable" and
>"Nobody else has encountered this bug"
>
>I'll leave that to other people to make a call on, but I'd assume the
>lack of reports is due to the fact that most stable distros are still
>on a far too early kernel and/or are using the fuse driver with k8s.
>
>Anyway, thanks for your assistance.