Message-ID: <20251209090831.13c7a639@xps15mal>
Date: Tue, 9 Dec 2025 09:08:31 +1000
From: Mal Haak <malcolm@...k.id.au>
To: linux-kernel@...r.kernel.org, surenb@...gle.com, David Wang
 <00107082@....com>
Subject: Re: Possible memory leak in 6.17.7

On Mon,  8 Dec 2025 19:08:29 +0800
David Wang <00107082@....com> wrote:

> On Mon, 10 Nov 2025 18:20:08 +1000
> Mal Haak <malcolm@...k.id.au> wrote:
> > Hello,
> > 
> > I have found a memory leak in 6.17.7 but I am unsure how to track it
> > down effectively.
> > 
> >   
> 
> I think the `memory allocation profiling` feature can help.
> https://docs.kernel.org/mm/allocation-profiling.html
> 
> You would need to build a kernel with 
> CONFIG_MEM_ALLOC_PROFILING=y
> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
> 
> And check /proc/allocinfo for the suspicious allocations which take
> more memory than expected.
> 
> (I once caught a nvidia driver memory leak.)
> 
> 
> FYI
> David
> 

Thank you for your suggestion. I have some results.

I ran the rsync workload for about 9 hours, and it started to look like
the leak was showing up again.
# smem -pw
Area                           Used      Cache   Noncache 
firmware/hardware             0.00%      0.00%      0.00% 
kernel image                  0.00%      0.00%      0.00% 
kernel dynamic memory        80.46%     65.80%     14.66% 
userspace memory              0.35%      0.16%      0.19% 
free memory                  19.19%     19.19%      0.00% 
# sort -g /proc/allocinfo|tail|numfmt --to=iec
         22M     5609 mm/memory.c:1190 func:folio_prealloc 
         23M     1932 fs/xfs/xfs_buf.c:226 [xfs] func:xfs_buf_alloc_backing_mem 
         24M    24135 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc 
         27M     6693 mm/memory.c:1192 func:folio_prealloc 
         58M    14784 mm/page_ext.c:271 func:alloc_page_ext 
        258M      129 mm/khugepaged.c:1069 func:alloc_charge_folio 
        430M   770788 lib/xarray.c:378 func:xas_alloc 
        545M    36444 mm/slub.c:3059 func:alloc_slab_page 
        9.8G  2563617 mm/readahead.c:189 func:ractl_alloc_folio 
         20G  5164004 mm/filemap.c:2012 func:__filemap_get_folio 


So I stopped the workload and dropped caches to confirm.

# echo 3 > /proc/sys/vm/drop_caches
# smem -pw
Area                           Used      Cache   Noncache 
firmware/hardware             0.00%      0.00%      0.00% 
kernel image                  0.00%      0.00%      0.00% 
kernel dynamic memory        33.45%      0.09%     33.36% 
userspace memory              0.36%      0.16%      0.19% 
free memory                  66.20%     66.20%      0.00% 
# sort -g /proc/allocinfo|tail|numfmt --to=iec
         12M     2987 mm/execmem.c:41 func:execmem_vmalloc 
         12M        3 kernel/dma/pool.c:96 func:atomic_pool_expand 
         13M      751 mm/slub.c:3061 func:alloc_slab_page 
         16M        8 mm/khugepaged.c:1069 func:alloc_charge_folio 
         18M     4355 mm/memory.c:1190 func:folio_prealloc 
         24M     6119 mm/memory.c:1192 func:folio_prealloc 
         58M    14784 mm/page_ext.c:271 func:alloc_page_ext 
         61M    15448 mm/readahead.c:189 func:ractl_alloc_folio 
         79M     6726 mm/slub.c:3059 func:alloc_slab_page 
         11G  2674488 mm/filemap.c:2012 func:__filemap_get_folio

So if I'm reading this correctly, something is causing folios to
accumulate and not be freeable?

It's also clear that some of the folios are being counted as cache and
some aren't.
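
For the next run I'm planning to log that filemap/readahead call site
alongside the meminfo cache counters, with something rough like the
below (the grep patterns are just the file names from the allocinfo
output above, and the log path is arbitrary):

# log the folio allocation sites and cache counters once a minute
while sleep 60; do
    date
    grep -E 'mm/(filemap|readahead)\.c' /proc/allocinfo | numfmt --to=iec
    grep -E '^(Cached|MemAvailable|MemFree):' /proc/meminfo
done >> /root/folio-watch.log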

Like I said, 6.17 and 6.18 both have the issue and 6.12 does not. I'm
now going to manually walk through the in-between kernel releases to
find where it first starts happening, purely because build issues with
earlier kernels (Rust toolchain and Python incompatibilities) are
making a proper git-bisect a bit of fun.

I'll step through packaged kernels until I get closer, then sort out
the build issues.
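
(If the Rust side turns out to be the only thing blocking the older
trees, I might just switch it off for those builds rather than fight
the toolchain; untested on those trees, but roughly:)

# drop Rust support so the older trees build with the current toolchain
scripts/config --disable CONFIG_RUST
make olddefconfig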

Thanks,
Mal


