Message-ID: <8e458538-69dc-4c0f-a25b-0c85ce1e866e@redhat.com>
Date: Mon, 13 Oct 2025 11:52:37 +0200
From: David Hildenbrand <david@...hat.com>
To: craftfever@...ena.io, akpm@...ux-foundation.org, xu.xin16@....com.cn,
 chengming.zhou@...ux.dev
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 regressions@...ts.linux.dev
Subject: Re: [Regerssion] [KSM] KSM CPU overhead in 6.16+ kernel compared to
 <=6.15 versions ("folio_walk_start" kernel object overhead)

On 13.10.25 11:22, craftfever@...ena.io wrote:

Hi,

> I've already posted about this problem on bugzilla (#220599), but the maintainers asked me to post issues to the mailing list.
> The freezes during KSM page scanning with certain processes that have a huge virtual memory size, like Chromium, were fixed in 6.17.1 compared to 6.16.x/6.17, but the huge CPU overhead is still present there. Compared to Linux <=6.15, where the overhead is much lighter and KSM scanning does not consume much CPU, a "folio_walk_start" kernel object now shows up (I observed it with the "perf top" command); it is not present in versions <=6.15 during KSM work and has been in use since Linux 6.16. This method is very resource-consuming compared to the algorithm used in <=6.15 versions. Is there a kernel parameter to disable it, or does it need more optimization?

I doubt that it has a lot to do with folio_walk_start(); that's just a
simple page table walk replacing the previous walk based on follow_page().

So that's why you would suddenly spot it in perf top -- before commit 
b1d3e9bbccb4 ("mm/ksm: convert scan_get_next_rmap_item() from 
follow_page() to folio_walk") we would have used follow_page().

Do you see any kernel splats / soft-lockups?
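
One way to check for such reports is to filter the kernel log for the usual
watchdog messages. The following is only a minimal sketch: the helper name
find_splats() is hypothetical, and the patterns are assumptions based on
commonly seen message wording, which can differ between kernel versions.

```python
import re

# Assumed message patterns (wording may vary between kernel versions):
# soft-lockup watchdog, RCU stall detector, and hung-task detector.
SPLAT_PATTERNS = [
    re.compile(r"BUG: soft lockup - CPU#\d+ stuck for \d+s"),
    re.compile(r"rcu.*detected stalls?", re.IGNORECASE),
    re.compile(r"INFO: task .+ blocked for more than \d+ seconds"),
]


def find_splats(log_text):
    """Return the log lines that match any of the patterns above."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in SPLAT_PATTERNS)
    ]
```

The function only filters text, so it can be fed the saved output of
"dmesg" or "journalctl -k" collected while ksmd is scanning.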

I can see that in commit b1d3e9bbccb4 I removed a cond_resched(). Maybe
that's why it's a problem with your kernel config.
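
Whether a missing cond_resched() matters likely depends on the preemption
model the kernel was built with, since explicit rescheduling points matter
most on kernels without full preemption. A minimal sketch for reading that
from a kernel config follows; the helper name preemption_models() is
hypothetical, and the option list is an assumption about which
CONFIG_PREEMPT* symbols select the model.

```python
# Assumed set of config symbols that describe the preemption model.
MODEL_OPTIONS = {
    "CONFIG_PREEMPT_NONE",
    "CONFIG_PREEMPT_VOLUNTARY",
    "CONFIG_PREEMPT",
    "CONFIG_PREEMPT_LAZY",
    "CONFIG_PREEMPT_RT",
    "CONFIG_PREEMPT_DYNAMIC",
}


def preemption_models(config_text):
    """Return the preemption-model options set to 'y', in file order."""
    enabled = []
    for line in config_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and value == "y" and name in MODEL_OPTIONS:
            enabled.append(name)
    return enabled
```

The input would typically be the contents of /boot/config-$(uname -r) or
the decompressed /proc/config.gz, where available.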

-- 
Cheers

David / dhildenb