Message-ID: <17469653.4a75.19b01691299.Coremail.00107082@163.com>
Date: Tue, 9 Dec 2025 12:40:21 +0800 (CST)
From: "David Wang" <00107082@....com>
To: "Mal Haak" <malcolm@...k.id.au>
Cc: linux-kernel@...r.kernel.org, surenb@...gle.com, xiubli@...hat.com,
idryomov@...il.com, ceph-devel@...r.kernel.org
Subject: Re: Possible memory leak in 6.17.7

At 2025-12-09 07:08:31, "Mal Haak" <malcolm@...k.id.au> wrote:
>On Mon, 8 Dec 2025 19:08:29 +0800
>David Wang <00107082@....com> wrote:
>
>> On Mon, 10 Nov 2025 18:20:08 +1000
>> Mal Haak <malcolm@...k.id.au> wrote:
>> > Hello,
>> >
>> > I have found a memory leak in 6.17.7 but I am unsure how to track it
>> > down effectively.
>> >
>> >
>>
>> I think the `memory allocation profiling` feature can help.
>> https://docs.kernel.org/mm/allocation-profiling.html
>>
>> You would need to build a kernel with
>> CONFIG_MEM_ALLOC_PROFILING=y
>> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y
>>
>> And check /proc/allocinfo for the suspicious allocations which take
>> more memory than expected.
>>
>> (I once caught an nvidia driver memory leak this way.)
>>
>>
>> FYI
>> David
>>
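A side note on my earlier suggestion: with those configs built in, the
profiling can also be switched on and off at runtime via sysctl (as
described in the allocation-profiling doc), so the overhead is only paid
while reproducing. Something like the following should work, assuming
your build did not lock out the sysctl:

# cat /proc/sys/vm/mem_profiling
# echo 1 > /proc/sys/vm/mem_profiling
# sort -g /proc/allocinfo | tail -n 20 | numfmt --to=iec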
>
>Thank you for your suggestion. I have some results.
>
>Ran the rsync workload for about 9 hours, and it started to look like
>the leak was happening.
># smem -pw
>Area Used Cache Noncache
>firmware/hardware 0.00% 0.00% 0.00%
>kernel image 0.00% 0.00% 0.00%
>kernel dynamic memory 80.46% 65.80% 14.66%
>userspace memory 0.35% 0.16% 0.19%
>free memory 19.19% 19.19% 0.00%
># sort -g /proc/allocinfo|tail|numfmt --to=iec
> 22M 5609 mm/memory.c:1190 func:folio_prealloc
> 23M 1932 fs/xfs/xfs_buf.c:226 [xfs] func:xfs_buf_alloc_backing_mem
> 24M 24135 fs/xfs/xfs_icache.c:97 [xfs] func:xfs_inode_alloc
> 27M 6693 mm/memory.c:1192 func:folio_prealloc
> 58M 14784 mm/page_ext.c:271 func:alloc_page_ext
> 258M 129 mm/khugepaged.c:1069 func:alloc_charge_folio
> 430M 770788 lib/xarray.c:378 func:xas_alloc
> 545M 36444 mm/slub.c:3059 func:alloc_slab_page
> 9.8G 2563617 mm/readahead.c:189 func:ractl_alloc_folio
> 20G 5164004 mm/filemap.c:2012 func:__filemap_get_folio
>
>
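A note on reading the snapshot above: the first column is the total size
currently allocated from that call site and the second is the number of
live allocations, so the interesting lines are the ones that keep growing
over time, not just the ones that are big right now. A rough sketch of a
periodic sampler (the log path and interval are arbitrary, adjust to
taste):

# while sleep 600; do date; sort -g /proc/allocinfo | tail -n 20 | numfmt --to=iec; done >> /root/allocinfo.log

Diffing consecutive snapshots should make it obvious whether only the
filemap/readahead folio sites are trending up.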
>So I stopped the workload and dropped caches to confirm.
>
># echo 3 > /proc/sys/vm/drop_caches
># smem -pw
>Area Used Cache Noncache
>firmware/hardware 0.00% 0.00% 0.00%
>kernel image 0.00% 0.00% 0.00%
>kernel dynamic memory 33.45% 0.09% 33.36%
>userspace memory 0.36% 0.16% 0.19%
>free memory 66.20% 66.20% 0.00%
># sort -g /proc/allocinfo|tail|numfmt --to=iec
> 12M 2987 mm/execmem.c:41 func:execmem_vmalloc
> 12M 3 kernel/dma/pool.c:96 func:atomic_pool_expand
> 13M 751 mm/slub.c:3061 func:alloc_slab_page
> 16M 8 mm/khugepaged.c:1069 func:alloc_charge_folio
> 18M 4355 mm/memory.c:1190 func:folio_prealloc
> 24M 6119 mm/memory.c:1192 func:folio_prealloc
> 58M 14784 mm/page_ext.c:271 func:alloc_page_ext
> 61M 15448 mm/readahead.c:189 func:ractl_alloc_folio
> 79M 6726 mm/slub.c:3059 func:alloc_slab_page
> 11G 2674488 mm/filemap.c:2012 func:__filemap_get_folio
>
>So if I'm reading this correctly, something is causing folios to
>accumulate and never get freed?
CC'ing cephfs developers, maybe someone can make an easier reading of
those folio usages.
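
If cephfs is the suspect, a couple of cheap checks might help narrow it
down (just guesses on my side): see whether ceph call sites in the
profile grow together with the filemap line, and what the page cache
accounting in meminfo says after the drop_caches:

# grep -i ceph /proc/allocinfo | sort -g | tail | numfmt --to=iec
# grep -E 'Cached|Buffers|Unevictable|Mlocked|Shmem' /proc/meminfo

If __filemap_get_folio stays around 11G after dropping caches while
Cached in meminfo is small, that would point at folios still pinned or
referenced by the filesystem rather than normal reclaimable page cache.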
>
>Also it's clear that some of the folios are counted as cache and some
>aren't.
>
>Like I said, 6.17 and 6.18 both have the issue; 6.12 does not. I'm now
>going to manually walk through the intermediate kernel releases to find
>where it first starts happening, purely because issues building earlier
>kernels (Rust tooling and other Python incompatibilities) are making a
>git-bisect a bit fun.
>
>I'll do it with packaged kernels until I get closer, then solve the
>build issues.
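
For when the build issues are sorted out: git bisect can take the known
endpoints directly, and the number of builds is roughly log2 of the
commit count (on the order of 15-17 builds for v6.12..v6.17, fewer once
the packaged-kernel walk has narrowed the range):

# git bisect start
# git bisect bad v6.17
# git bisect good v6.12

then build, boot, run the rsync workload, and mark each step good or bad
until it lands on a single commit.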
>
>Thanks,
>Mal
>