Message-ID: <2a9ba88e.3aa6.19b0b73dd4e.Coremail.00107082@163.com>
Date: Thu, 11 Dec 2025 11:28:21 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Mal Haak" <malcolm@...k.id.au>
Cc: linux-kernel@...r.kernel.org, surenb@...gle.com, xiubli@...hat.com,
idryomov@...il.com, ceph-devel@...r.kernel.org
Subject: Re: Possible memory leak in 6.17.7
At 2025-12-10 21:43:18, "Mal Haak" <malcolm@...k.id.au> wrote:
>On Tue, 9 Dec 2025 12:40:21 +0800 (CST)
>"David Wang" <00107082@....com> wrote:
>
>> At 2025-12-09 07:08:31, "Mal Haak" <malcolm@...k.id.au> wrote:
>> >On Mon, 8 Dec 2025 19:08:29 +0800
>> >David Wang <00107082@....com> wrote:
>> >
>> >> On Mon, 10 Nov 2025 18:20:08 +1000
>> >> Mal Haak <malcolm@...k.id.au> wrote:
>> >> > Hello,
>> >> >
>> >> > I have found a memory leak in 6.17.7 but I am unsure how to
>> >> > track it down effectively.
>> >> >
>> >> >
>> >>
>> >> I think the `memory allocation profiling` feature can help.
>> >> https://docs.kernel.org/mm/allocation-profiling.html
>> >>
>> >> You would need to build a kernel with
>> >> CONFIG_MEM_ALLOC_PROFILING=y
>> >> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>> >>
>> >> And check /proc/allocinfo for the suspicious allocations which take
>> >> more memory than expected.
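>> >> For example, something like this (assuming the runtime sysctl knob
>> >> documented on that page; skip it if profiling is enabled by default):
>> >>
>> >> # sysctl vm.mem_profiling=1
>> >> # sort -g /proc/allocinfo | tail | numfmt --to=iec
>> >>
>> >> Each line of /proc/allocinfo is roughly
>> >> <bytes> <calls> <file>:<line> [module] func:<name>,
>> >> so the biggest live allocation sites sort to the bottom.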
>> >>
>> >> (I once caught an nvidia driver memory leak.)
>> >>
>> >>
>> >> FYI
>> >> David
>> >>
>> >
>> >Thank you for your suggestion. I have some results.
>> >
>> >Ran the rsync workload for about 9 hours. It started to look like
>> >the leak was happening.
>> ># smem -pw
>> >Area                    Used    Cache   Noncache
>> >firmware/hardware       0.00%   0.00%   0.00%
>> >kernel image            0.00%   0.00%   0.00%
>> >kernel dynamic memory  80.46%  65.80%  14.66%
>> >userspace memory        0.35%   0.16%   0.19%
>> >free memory            19.19%  19.19%   0.00%
>> ># sort -g /proc/allocinfo|tail|numfmt --to=iec
>> >  22M    5609 mm/memory.c:1190 func:folio_prealloc
>> >  23M    1932 fs/xfs/xfs_buf.c:226 [xfs] func:xfs_buf_alloc_backing_mem
>> >  24M   24135 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
>> >  27M    6693 mm/memory.c:1192 func:folio_prealloc
>> >  58M   14784 mm/page_ext.c:271 func:alloc_page_ext
>> > 258M     129 mm/khugepaged.c:1069 func:alloc_charge_folio
>> > 430M  770788 lib/xarray.c:378 func:xas_alloc
>> > 545M   36444 mm/slub.c:3059 func:alloc_slab_page
>> > 9.8G 2563617 mm/readahead.c:189 func:ractl_alloc_folio
>> >  20G 5164004 mm/filemap.c:2012 func:__filemap_get_folio
>> >
>> >
>> >So I stopped the workload and dropped caches to confirm.
>> >
>> ># echo 3 > /proc/sys/vm/drop_caches
>> ># smem -pw
>> >Area                    Used    Cache   Noncache
>> >firmware/hardware       0.00%   0.00%   0.00%
>> >kernel image            0.00%   0.00%   0.00%
>> >kernel dynamic memory  33.45%   0.09%  33.36%
>> >userspace memory        0.36%   0.16%   0.19%
>> >free memory            66.20%  66.20%   0.00%
>> ># sort -g /proc/allocinfo|tail|numfmt --to=iec
>> > 12M 2987 mm/execmem.c:41 func:execmem_vmalloc
>> > 12M 3 kernel/dma/pool.c:96 func:atomic_pool_expand
>> > 13M 751 mm/slub.c:3061 func:alloc_slab_page
>> > 16M 8 mm/khugepaged.c:1069 func:alloc_charge_folio
>> > 18M 4355 mm/memory.c:1190 func:folio_prealloc
>> > 24M 6119 mm/memory.c:1192 func:folio_prealloc
>> > 58M 14784 mm/page_ext.c:271 func:alloc_page_ext
>> > 61M 15448 mm/readahead.c:189 func:ractl_alloc_folio
>> > 79M 6726 mm/slub.c:3059 func:alloc_slab_page
>> > 11G 2674488 mm/filemap.c:2012 func:__filemap_get_folio
Maybe narrowing down which callers of __filemap_get_folio account for
the "Noncache" usage would help clarify things.
(It could be designed that way, and the memory may need some route
other than dropping caches to be released; just a guess....)
If you want, you can modify the code to split the accounting for
__filemap_get_folio according to its callers.
The following is a draft patch (based on v6.18):
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 09b581c1d878..ba8c659a6ae3 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -753,7 +753,11 @@ static inline fgf_t fgf_set_order(size_t size)
 }
 
 void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
-struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
+
+#define __filemap_get_folio(...)	\
+	alloc_hooks(__filemap_get_folio_noprof(__VA_ARGS__))
+
+struct folio *__filemap_get_folio_noprof(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp);
 struct page *pagecache_get_page(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp);
diff --git a/mm/filemap.c b/mm/filemap.c
index 024b71da5224..e1c1c26d7cb3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1938,7 +1938,7 @@ void *filemap_get_entry(struct address_space *mapping, pgoff_t index)
  *
  * Return: The found folio or an ERR_PTR() otherwise.
  */
-struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
+struct folio *__filemap_get_folio_noprof(struct address_space *mapping, pgoff_t index,
 		fgf_t fgp_flags, gfp_t gfp)
 {
 	struct folio *folio;
@@ -2009,7 +2009,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 		err = -ENOMEM;
 		if (order > min_order)
 			alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
-		folio = filemap_alloc_folio(alloc_gfp, order);
+		folio = filemap_alloc_folio_noprof(alloc_gfp, order);
 		if (!folio)
 			continue;
 
@@ -2056,7 +2056,7 @@ struct folio *__filemap_get_folio(struct address_space *mapping, pgoff_t index,
 	folio_clear_dropbehind(folio);
 	return folio;
 }
-EXPORT_SYMBOL(__filemap_get_folio);
+EXPORT_SYMBOL(__filemap_get_folio_noprof);
 
 static inline struct folio *find_get_entry(struct xa_state *xas, pgoff_t max,
 		xa_mark_t mark)
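With the patch, alloc_hooks() charges each allocation to the file:line
that expands the __filemap_get_folio() macro, so every caller gets its
own row in /proc/allocinfo instead of everything being lumped under
mm/filemap.c:2012. A rough sketch of how to check (the fs/ceph/ and
fs/netfs/ names below are only examples of what might show up, not a
prediction):

# echo 3 > /proc/sys/vm/drop_caches
# sort -g /proc/allocinfo | tail -20 | numfmt --to=iec

If the large "Noncache" remainder moves to an fs/ceph/ or fs/netfs/
line, that call site would be the place to look next.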
FYI
David
>> >
>> >So if I'm reading this correctly, something is causing folios to
>> >accumulate without being freeable?
>>
>> CC'ing cephfs; maybe someone there can read something out of that
>> folio usage more easily
>>
>>
>> >
>> >Also it's clear that some of the folios are counted as cache and
>> >some aren't.
>> >
>> >Like I said, 6.17 and 6.18 both have the issue; 6.12 does not. I'm
>> >now going to manually walk through previous kernel releases to find
>> >where it first starts happening, purely because issues building
>> >earlier kernels (rust tooling and other python incompatibilities)
>> >make doing a git-bisect a bit fun.
>> >
>> >I'll do it with packaged kernels until I get closer, then solve the
>> >build issues.
>> >
>> >Thanks,
>> >Mal
>> >
>Thanks David.
>
>I've contacted the ceph developers as well.
>
>There was a suggestion it was due to the change from, to quote,
>"folio.free() to folio.put()", or something like this.
>
>The change happened around 6.14/6.15
>
>I've found an easier reproducer.
>
>There has been a suggestion that perhaps the ceph team might not fix
>this as "you can just reboot before the machine becomes unstable" and
>"Nobody else has encountered this bug"
>
>I'll leave that to other people to make a call on, but I'd assume the
>lack of reports is due to the fact that most stable distros are still
>on a far too early kernel and/or are using the fuse driver with k8s.
>
>Anyway, thanks for your assistance.