Message-ID: <80b153cd-8bba-4bcd-9b56-3b2ad3f295e1@redhat.com>
Date: Mon, 13 Oct 2025 20:55:34 +0200
From: David Hildenbrand <david@...hat.com>
To: 423de7a3-1c62-4e72-8e79-19a6413e420c@...hat.com
Cc: akpm@...ux-foundation.org, chengming.zhou@...ux.dev,
craftfever@...ena.io, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
regressions@...ts.linux.dev, xu.xin16@....com.cn,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Subject: Re: [Regression] [KSM] KSM CPU overhead in 6.16+ kernel compared to
<=6.15 versions ("folio_walk_start" kernel object overhead)
On 13.10.25 19:09, craftfever wrote:
> > Looking again, no, that's not the case. We do a cond_resched() after
> > every page we looked up.
> >
> > Also, b1d3e9bbccb4 was introduced in v6.12 already. Regarding
> > folio_walk_start(), also nothing major changed ever since v6.12.
> >
> > Looking at scan_get_next_rmap_item(). I guess we might hold the mmap
> > lock for quite a long time (if we're iterating large areas where there
> > are no suitable pages mapped -- very large sparse areas).
> >
> > That would explain why we end up calling folio_walk_start() that
> > frequently.
> >
> > But nothing really changed in that regard lately in KSM code.
> >
> > What we probably should be doing is give up the mmap lock after
> > scanning a certain size. Or better, switch to per-VMA locks if possible.
> >
> > Also, looking up each address is highly inefficient if we end up having
> > large empty areas. A range-walk function would be much better suited
> > for that, so we can just jump over holes completely.
> >
> > But anyhow, nothing seems to have changed ever since 6.15 AFAICT, so
> > I'm not really sure what's going on here. Likely it's unrelated to KSM
> > changes.
> >
> > -- Cheers
> >
> > David / dhildenb
> >
>
> I have to make a correction: folio_walk_start is present in "perf top"
> statistics on 6.12-6.15 as well, but there it only consumes 0.5-1% of kernel
> time, compared to 11-14% on 6.16+, where it drives ksmd to 100% CPU usage
> relative to <=6.15 kernels.
I'm currently looking at the diff from 6.15 -> 6.16.
In KSM code nothing changed, really.
In folio_walk_start() itself nothing changed.
In the functions it calls also nothing relevant should have changed.
So the only explanation would be that it is simply called much more frequently.
That could happen if we are now scanning much larger, mostly empty VMAs that
previously would not have been scanned at all.
I now recall that we had a fix from Lorenzo:
commit cf7e7a3503df0b71afd68ee84e9a09d4514cc2dd
Author: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Date: Thu May 29 18:15:47 2025 +0100
mm: prevent KSM from breaking VMA merging for new VMAs
If a user wishes to enable KSM mergeability for an entire process and all
fork/exec'd processes that come after it, they use the prctl()
PR_SET_MEMORY_MERGE operation.
That went into 6.17.
Assuming we merge more carefully now, we might no longer take the

	if (!vma->anon_vma)
		ksm_scan.address = vma->vm_end;

shortcut for the gigantic VMAs, and could end up scanning these gigantic,
mostly empty VMAs instead.
Just a thought:
A) Can you reproduce on 6.17?
B) Does the 6.16 you are testing with contain a backport of that commit?
In any case, scan_get_next_rmap_item() should be optimized to walk a sparse
page table more efficiently.
> I understand that something changed in the linked function that is
> affecting KSM behavior. Maybe you can reproduce it with the same settings;
> it especially happens with Chromium apps, where the V8 sandbox has a huge
> VM size. Maybe you could reproduce the problem with the same
> MemoryKSM=yes in user@...rvice, which enables KSM processing for all user
> processes, especially when Chromium is running. KSM CPU usage really
> differs between 6.12-6.15 and 6.16+. Maybe it's related to your
> explanation.
I'm afraid I don't currently have time to reproduce.
--
Cheers
David / dhildenb