Message-ID: <17469653.4a75.19b01691299.Coremail.00107082@163.com>
Date: Tue, 9 Dec 2025 12:40:21 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Mal Haak" <malcolm@...k.id.au>
Cc: linux-kernel@...r.kernel.org, surenb@...gle.com, xiubli@...hat.com,
idryomov@...il.com, ceph-devel@...r.kernel.org
Subject: Re: Possible memory leak in 6.17.7

At 2025-12-09 07:08:31, "Mal Haak" <malcolm@...k.id.au> wrote:
>On Mon, 8 Dec 2025 19:08:29 +0800
>David Wang <00107082@....com> wrote:
>
>> On Mon, 10 Nov 2025 18:20:08 +1000
>> Mal Haak <malcolm@...k.id.au> wrote:
>> > Hello,
>> >
>> > I have found a memory leak in 6.17.7 but I am unsure how to track it
>> > down effectively.
>> >
>> >
>>
>> I think the `memory allocation profiling` feature can help.
>> https://docs.kernel.org/mm/allocation-profiling.html
>>
>> You would need to build a kernel with
>> CONFIG_MEM_ALLOC_PROFILING=y
>> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>>
>> And check /proc/allocinfo for the suspicious allocations which take
>> more memory than expected.
>>
>> (I once caught an nvidia driver memory leak this way.)
>>
>>
>> FYI
>> David
>>
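A side note on my earlier suggestion: with those configs built in, the
profiling can also be switched on and off at runtime via sysctl (as
described in the allocation-profiling doc), so the overhead is only paid
while reproducing. Something like the following should work, assuming
your build did not lock out the sysctl:

# cat /proc/sys/vm/mem_profiling
# echo 1 > /proc/sys/vm/mem_profiling
# sort -g /proc/allocinfo | tail -n 20 | numfmt --to=iec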
>
>Thank you for your suggestion. I have some results.
>
>Ran the rsync workload for about 9 hours, and it started to look like
>the leak was happening.
># smem -pw
>Area Used Cache Noncache
>firmware/hardware 0.00% 0.00% 0.00%
>kernel image 0.00% 0.00% 0.00%
>kernel dynamic memory 80.46% 65.80% 14.66%
>userspace memory 0.35% 0.16% 0.19%
>free memory 19.19% 19.19% 0.00%
># sort -g /proc/allocinfo|tail|numfmt --to=iec
> 22M 5609 mm/memory.c:1190 func:folio_prealloc
> 23M 1932 fs/xfs/xfs_buf.c:226 [xfs] func:xfs_buf_alloc_backing_mem
> 24M 24135 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
> 27M 6693 mm/memory.c:1192 func:folio_prealloc
> 58M 14784 mm/page_ext.c:271 func:alloc_page_ext
> 258M 129 mm/khugepaged.c:1069 func:alloc_charge_folio
> 430M 770788 lib/xarray.c:378 func:xas_alloc
> 545M 36444 mm/slub.c:3059 func:alloc_slab_page
> 9.8G 2563617 mm/readahead.c:189 func:ractl_alloc_folio
> 20G 5164004 mm/filemap.c:2012 func:__filemap_get_folio
>
>
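A note on reading the snapshot above: the first column is the total size
currently allocated from that call site and the second is the number of
live allocations, so the interesting lines are the ones that keep growing
over time, not just the ones that are big right now. A rough sketch of a
periodic sampler (the log path and interval are arbitrary, adjust to
taste):

# while sleep 600; do date; sort -g /proc/allocinfo | tail -n 20 | numfmt --to=iec; done >> /root/allocinfo.log

Diffing consecutive snapshots should make it obvious whether only the
filemap/readahead folio sites are trending up.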
>So I stopped the workload and dropped caches to confirm.
>
># echo 3 > /proc/sys/vm/drop_caches
># smem -pw
>Area Used Cache Noncache
>firmware/hardware 0.00% 0.00% 0.00%
>kernel image 0.00% 0.00% 0.00%
>kernel dynamic memory 33.45% 0.09% 33.36%
>userspace memory 0.36% 0.16% 0.19%
>free memory 66.20% 66.20% 0.00%
># sort -g /proc/allocinfo|tail|numfmt --to=iec
> 12M 2987 mm/execmem.c:41 func:execmem_vmalloc
> 12M 3 kernel/dma/pool.c:96 func:atomic_pool_expand
> 13M 751 mm/slub.c:3061 func:alloc_slab_page
> 16M 8 mm/khugepaged.c:1069 func:alloc_charge_folio
> 18M 4355 mm/memory.c:1190 func:folio_prealloc
> 24M 6119 mm/memory.c:1192 func:folio_prealloc
> 58M 14784 mm/page_ext.c:271 func:alloc_page_ext
> 61M 15448 mm/readahead.c:189 func:ractl_alloc_folio
> 79M 6726 mm/slub.c:3059 func:alloc_slab_page
> 11G 2674488 mm/filemap.c:2012 func:__filemap_get_folio
>
>So if I'm reading this correctly, something is causing folios to
>accumulate and never get freed?
CC'ing cephfs developers, maybe someone can make an easier reading of
those folio usages.
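
If cephfs is the suspect, a couple of cheap checks might help narrow it
down (just guesses on my side): see whether ceph call sites in the
profile grow together with the filemap line, and what the page cache
accounting in meminfo says after the drop_caches:

# grep -i ceph /proc/allocinfo | sort -g | tail | numfmt --to=iec
# grep -E 'Cached|Buffers|Unevictable|Mlocked|Shmem' /proc/meminfo

If __filemap_get_folio stays around 11G after dropping caches while
Cached in meminfo is small, that would point at folios still pinned or
referenced by the filesystem rather than normal reclaimable page cache.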
>
>Also it's clear that some of the folios are counted as cache and some
>aren't.
>
>Like I said, 6.17 and 6.18 both have the issue; 6.12 does not. I'm now
>going to manually walk through the intermediate kernel releases to find
>where it first starts happening, purely because issues building earlier
>kernels (Rust tooling and other Python incompatibilities) are making a
>git-bisect a bit fun.
>
>I'll do it with packaged kernels until I get closer, then solve the
>build issues.
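
For when the build issues are sorted out: git bisect can take the known
endpoints directly, and the number of builds is roughly log2 of the
commit count (on the order of 15-17 builds for v6.12..v6.17, fewer once
the packaged-kernel walk has narrowed the range):

# git bisect start
# git bisect bad v6.17
# git bisect good v6.12

then build, boot, run the rsync workload, and mark each step good or bad
until it lands on a single commit.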
>
>Thanks,
>Mal
>