Message-ID: <4f3d113065ed2fdc8f643c073fed49981e975d0b.camel@ibm.com>
Date: Tue, 16 Dec 2025 02:02:12 +0000
From: Viacheslav Dubeyko <Slava.Dubeyko@....com>
To: "malcolm@...k.id.au" <malcolm@...k.id.au>
CC: "idryomov@...il.com" <idryomov@...il.com>,
"00107082@....com"
<00107082@....com>,
"ceph-devel@...r.kernel.org"
<ceph-devel@...r.kernel.org>,
"linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>,
Xiubo Li <xiubli@...hat.com>, "surenb@...gle.com" <surenb@...gle.com>
Subject: RE: Re: Possible memory leak in 6.17.7

On Tue, 2025-12-16 at 11:26 +1000, Mal Haak wrote:
> On Mon, 15 Dec 2025 19:42:56 +0000
> Viacheslav Dubeyko <Slava.Dubeyko@....com> wrote:
>
> > Hi Mal,
> >
> <SNIP>
> >
> > Thanks a lot for reporting the issue. Finally, I can see the
> > discussion on the mailing list. :) Are you working on a patch with
> > the fix? Should we wait for the fix, or should I start reproducing
> > and investigating the issue? I am simply trying to avoid patch
> > collisions, and I also have multiple other issues to fix in the
> > CephFS kernel client. :)
> >
> > Thanks,
> > Slava.
>
> Hello,
>
> Unfortunately, creating a patch is just outside my comfort zone; I've
> lived too long in Lustre land.
>
> I have been trying to narrow down a consistent reproducer that's as
> fast as my production workload (which crashes a 32GB VM in 2 hours),
> and I haven't got it quite that fast yet. I think the dd workload is
> too well behaved.
>
> I can confirm the issue appeared in the major patch set that was
> applied as part of the 6.15 kernel, i.e. during the more complete
> pages-to-folios switch, and that nothing has changed in the bug
> behaviour since then. I did have a look at all the diffs from 6.14 to
> 6.18 on addr.c and didn't see any changes post-6.15 that looked like
> they would affect the bug behaviour.
>
> Again, I'm not super familiar with the CephFS code, but to hazard a
> guess, the fact that the web download workload triggers things faster
> suggests that unaligned writes might make things worse. But again, I'm
> not 100% sure. I can't find a reproducer as fast as downloading a
> dataset. Rsync of lots and lots of tiny files is a tad faster than the
> dd case.
>
> I did see some changes in ceph_check_page_before_write, where the
> previous code unlocked pages and then continued, whereas the changed
> folio code just returns ENODATA and doesn't unlock anything, with most
> of the rest of the logic unchanged. This might be perfectly fine, but
> in my admittedly limited reading of the code I couldn't figure out
> where anything that was locked prior to this being called would get
> unlocked, like it did prior to the change. Again, I could be miles off
> here, and one of the bulk reclaim/unlock passes that was added might be
> cleaning this up correctly, or some other functional change might take
> care of it, but it looks to be potentially in the code path I'm
> exercising, and it has had some unlock logic changed.
>
> I've spent most of my time trying to find a solid, quick reproducer.
> Not that it takes long to start leaking folios, but I wanted something
> that aggressively triggered it so a small VM would OOM quickly;
> combined with crash_on_oom, it could potentially be used for regression
> testing by way of "did the VM crash?".
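>
> If you don't want to actually panic the box, a softer variant of the
> same pass/fail idea is to just grep the kernel log for OOM-killer hits
> after the stress run. Rough, untested sketch (reading dmesg may need
> root, or kernel.dmesg_restrict=0):
>
>   #!/usr/bin/env python3
>   # Pass/fail check for regression runs: after the stress workload,
>   # look for OOM-killer activity in dmesg. Exit code 1 means the
>   # kernel killed something, i.e. the run failed.
>   import subprocess
>   import sys
>
>   def oom_hits():
>       out = subprocess.run(["dmesg"], capture_output=True,
>                            text=True).stdout
>       return [l for l in out.splitlines()
>               if "out of memory" in l.lower() or "oom-kill" in l.lower()]
>
>   if __name__ == "__main__":
>       hits = oom_hits()
>       for line in hits:
>           print(line)
>       sys.exit(1 if hits else 0)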
>
> I'm not sure how much it will help, but I'll provide what details I can
> about the actual workload that really sets it off. It's a Python-based
> tool for downloading datasets. Datasets are split into N chunks and the
> tool downloads them in parallel, 100 at a time, until all N chunks are
> down. The compressed dataset is then unpacked and reassembled for
> use with workloads.
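>
> To give a feel for the I/O pattern, this is roughly what the tool does,
> as an untested sketch rather than the actual code; the mount point,
> chunk count and sizes below are placeholders I made up (chosen so the
> writes land unaligned), and random data stands in for the HTTP
> downloads:
>
>   #!/usr/bin/env python3
>   # Approximation of the dataset-download workload: N chunks written
>   # in parallel (100 at a time) with odd-sized buffers standing in for
>   # HTTP socket reads, then reassembled into one big file on CephFS.
>   import os
>   from concurrent.futures import ThreadPoolExecutor
>
>   MNT = "/mnt/cephfs/leaktest"           # placeholder CephFS mount
>   N_CHUNKS = 2000                        # made-up chunk count
>   CHUNK_SIZE = 8 * 1024 * 1024 + 4321    # deliberately not block-aligned
>   WRITE_SIZE = 61440 + 17                # odd-sized writes
>
>   def fetch_chunk(i):
>       # Stands in for the HTTP download of one chunk.
>       path = os.path.join(MNT, "chunk.%05d" % i)
>       left = CHUNK_SIZE
>       with open(path, "wb") as f:
>           while left > 0:
>               n = min(WRITE_SIZE, left)
>               f.write(os.urandom(n))
>               left -= n
>       return path
>
>   def main():
>       os.makedirs(MNT, exist_ok=True)
>       with ThreadPoolExecutor(max_workers=100) as pool:
>           paths = list(pool.map(fetch_chunk, range(N_CHUNKS)))
>       # "Reassemble" the dataset, like the unpack step in the real tool.
>       with open(os.path.join(MNT, "dataset.bin"), "wb") as out:
>           for p in paths:
>               with open(p, "rb") as f:
>                   while True:
>                       buf = f.read(WRITE_SIZE)
>                       if not buf:
>                           break
>                       out.write(buf)
>               os.remove(p)
>
>   if __name__ == "__main__":
>       main()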
>
> This is replicating a common home-folder use case in HPC. CephFS is very
> attractive for home folders due to its "NFS-like" utility and
> performance. And many tools use a similar method for fetching large
> datasets. Tools are frequently written in Python or Go.
>
> None of my customers have hit this yet, nor have any enterprise
> customers, as none have moved to a new enough kernel yet due to slow
> upgrade cycles. Even Proxmox have only just started testing on a kernel
> version > 6.14.
>
> I'm more than happy to help however I can with testing. I can run
> instrumented kernels or test patches or whatever you need. I am sorry I
> haven't been able to produce a super clean, fast reproducer (my test
> cluster at home is all spinners and only 500TB usable), but I figured I
> needed to get the word out ASAP, as distros and soon customers are
> going to be moving past 6.12-6.14 kernels as the 5-7 year update cycle
> marches on, especially those wanting to take full advantage of CacheFS
> and encryption functionality.
>
> Again, thanks for looking at this, and do reach out if I can help in
> any way. I am in the Ceph Slack if it's faster to reach out that way.
>
>

Thanks a lot for all of your efforts. I hope it will help a lot. Let me start
reproducing the issue. I'll let you know if I need additional details. I'll
share my progress and any potential troubles in the ticket that you've created.

Thanks,
Slava.