Message-ID: <423de7a3-1c62-4e72-8e79-19a6413e420c@redhat.com>
Date: Mon, 13 Oct 2025 12:18:44 +0200
From: David Hildenbrand <david@...hat.com>
To: craftfever@...ena.io, akpm@...ux-foundation.org, xu.xin16@....com.cn,
 chengming.zhou@...ux.dev
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 regressions@...ts.linux.dev
Subject: Re: [Regerssion] [KSM] KSM CPU overhead in 6.16+ kernel compared to
 <=6.15 versions ("folio_walk_start" kernel object overhead)

On 13.10.25 11:52, David Hildenbrand wrote:
> On 13.10.25 11:22, craftfever@...ena.io wrote:
> 
> Hi,
> 
>> I've already posted about this problem on Bugzilla (#220599), but the maintainers asked me to report issues on the mailing list.
>> The freezes during KSM page scanning with certain processes that have a huge virtual memory size, like Chromium, were fixed in 6.17.1 compared to 6.16.x/6.17, but the huge CPU overhead is still present there. In Linux <=6.15 the overhead is much lighter and KSM scanning does not consume much CPU. Starting with Linux 6.16, a "folio_walk_start" kernel object (which I saw with the "perf top" command) shows up during KSM work; it is not present in versions <=6.15. This method is very resource-consuming compared to the algorithm used in <=6.15 versions. Is there a kernel parameter to disable it, or does it need more optimization?
> 
> I doubt that it has a lot to do with folio_walk_start(), that's just a
> simple page table walk replacing the previous walk based on follow_page().
> 
> So that's why you would suddenly spot it in perf top -- before commit
> b1d3e9bbccb4 ("mm/ksm: convert scan_get_next_rmap_item() from
> follow_page() to folio_walk") we would have used follow_page().
> 
> Do you see any kernel splats / soft-lockups?
> 
> I can see that in commit b1d3e9bbccb4 I removed a cond_resched(). Maybe
> that's why it's a problem with your kernel config.

Looking again, no, that's not the case. We do a cond_resched() after 
every page we looked up.

Also, b1d3e9bbccb4 was introduced in v6.12 already. Regarding 
folio_walk_start(), also nothing major changed ever since v6.12.

Looking at scan_get_next_rmap_item(), I guess we might hold the mmap 
lock for quite a long time (if we're iterating over large areas where there 
are no suitable pages mapped -- very large sparse areas).

That would explain why we end up calling folio_walk_start() that frequently.

But nothing really changed in that regard lately in KSM code.

What we probably should be doing is giving up the mmap lock after 
scanning a certain size. Or better, switch to per-VMA locks if possible.
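
To illustrate the "give up the lock after scanning a certain size" idea, here is
a toy userspace model in Python. It is not kernel code: threading.Lock merely
stands in for the mmap lock, and scan_in_batches(), visit and batch_bytes are
hypothetical names made up for this sketch. A real implementation would also
have to revalidate its position after re-taking the lock, since the address
space may have changed in the meantime.

```python
import threading

PAGE_SIZE = 4096


def scan_in_batches(start, end, visit, lock, batch_bytes=64 * PAGE_SIZE):
    """Visit every page address in [start, end), dropping `lock` between batches.

    Returns how many times the lock was acquired.
    """
    acquisitions = 0
    addr = start
    while addr < end:
        batch_end = min(addr + batch_bytes, end)
        with lock:  # stand-in for taking and releasing the mmap lock
            acquisitions += 1
            while addr < batch_end:
                visit(addr)
                addr += PAGE_SIZE
        # The lock is released here, so a waiting writer could make progress
        # before the next batch starts.
    return acquisitions
```

The hold time per acquisition is then bounded by batch_bytes instead of by the
size of the area being scanned.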

Also, looking up each address is highly inefficient if we end up having
large empty areas. A range-walk function would be much better suited for 
that, so we can just jump over holes completely.
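
The cost difference between looking up each address and walking ranges can be
sketched with another toy Python model. Again, this is not kernel code:
mapped_ranges is a hypothetical list of page-aligned, non-overlapping
(start, end) intervals standing in for the populated parts of an address
space, and "probes" simply counts how many addresses each strategy touches.

```python
PAGE_SIZE = 4096


def per_address_walk(start, end, mapped_ranges):
    """Probe every page address in [start, end); return (hits, probes)."""
    hits, probes = [], 0
    for addr in range(start, end, PAGE_SIZE):
        probes += 1  # one lookup per address, even inside holes
        if any(lo <= addr < hi for lo, hi in mapped_ranges):
            hits.append(addr)
    return hits, probes


def range_walk(start, end, mapped_ranges):
    """Visit only addresses inside populated ranges; return (hits, probes)."""
    hits, probes = [], 0
    for lo, hi in sorted(mapped_ranges):
        lo, hi = max(lo, start), min(hi, end)  # clamp to the scanned window
        for addr in range(lo, hi, PAGE_SIZE):
            probes += 1  # holes between ranges are never touched
            hits.append(addr)
    return hits, probes
```

Both functions find the same pages, but for a mostly empty area the range walk
does work proportional to what is mapped rather than to the size of the area.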

But anyhow, nothing seems to have changed ever since 6.15 AFAICT, so I'm 
not really sure what's going on here. Likely it's unrelated to KSM changes.

-- 
Cheers

David / dhildenb