linux-kernel - Re: [DISCUSSION] proposed mctl() API


Message-ID: <8c762435-f5d8-4366-84de-308c8280ff3d@gmail.com>
Date: Tue, 10 Jun 2025 17:00:47 +0100
From: Usama Arif <usamaarif642@...il.com>
To: Matthew Wilcox <willy@...radead.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 David Hildenbrand <david@...hat.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Shakeel Butt <shakeel.butt@...ux.dev>,
 "Liam R . Howlett" <Liam.Howlett@...cle.com>,
 Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
 Arnd Bergmann <arnd@...db.de>, Christian Brauner <brauner@...nel.org>,
 SeongJae Park <sj@...nel.org>, Mike Rapoport <rppt@...nel.org>,
 Johannes Weiner <hannes@...xchg.org>, Barry Song <21cnbao@...il.com>,
 linux-mm@...ck.org, linux-arch@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-api@...r.kernel.org,
 Pedro Falcato <pfalcato@...e.de>
Subject: Re: [DISCUSSION] proposed mctl() API

On 10/06/2025 16:46, Matthew Wilcox wrote:
> On Tue, Jun 10, 2025 at 04:30:43PM +0100, Usama Arif wrote:
>> If we have 2 workloads on the same server, e.g. one is a database where THPs
>> just don't do well, but the other one is AI where THPs do really well, how
>> will the kernel monitor that the database workload is performing worse
>> and the AI one isn't?
> 
> It can monitor the allocation/access patterns and see who's getting
> the benefit.  The two workloads are in competition for memory, and
> we can tell which pages are hot and which cold.
> 
> And I don't believe it's a binary anyway.  I bet there are some
> allocations where the database benefits from having THPs (I mean, I know
> a database which invented the entire hugetlbfs subsystem so it could
> use PMD entries and avoid one layer of TLB misses!)
> 

Sure, but this is just an example. Workload owners are not going to spend time
checking how each allocation behaves and putting the hot ones in hugetlbfs.
Of course, hugetlbfs has its own drawback of having to reserve pages, which is
one of the reasons we have THPs.

But they will try THPs, i.e. if they see a performance benefit from just turning
a knob, they will take it; otherwise they will leave it.

>> I added THP shrinker to hopefully try and do this automatically, and it does
>> really help. But unfortunately it is not a complete solution.
>> There are severely memory bound workloads where even a tiny increase
>> in memory will lead to an OOM. And if you colocate the container that's running
>> that workload with one in which we will benefit from THPs, we unfortunately
>> can't just rely on the system doing the right thing.
> 
> Then maybe THPs aren't for you.  If your workloads are this sensitive,
> perhaps you should be using a mechanism which gives you complete control
> like hugetlbfs.

Yes, completely agree, THPs aren't for workloads that are this sensitive.
But that's why we need this: to disable THPs for them if the global policy is
"always", or to enable THPs for other services that are not sensitive and
benefit from them if the global policy is "madvise". We have to keep in mind
that these workloads will be colocated on the same server.
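
To make the two cases concrete, here is a rough sketch of the decision a
colocated service would have to make. It only parses the real sysfs file
/sys/kernel/mm/transparent_hugepage/enabled (content such as
"always [madvise] never"); the helper names parse_thp_policy() and
needed_override() are hypothetical and exist only for illustration. The
comments name the mechanisms available today, prctl(PR_SET_THP_DISABLE) and
madvise(MADV_HUGEPAGE); nothing here is the proposed mctl() interface.

```python
import re
from pathlib import Path
from typing import Optional

# Real sysfs file; its content looks like "always [madvise] never".
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_policy(text: str) -> str:
    """Return the bracketed (active) value from the 'enabled' file content."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no active policy found in {text!r}")
    return match.group(1)


def needed_override(global_policy: str, wants_thp: bool) -> Optional[str]:
    """Hypothetical helper: which per-process override a workload needs.

    Returns "disable", "enable", or None when the global policy already
    matches (or, under "never", when no per-process opt-in is possible).
    """
    if global_policy == "always" and not wants_thp:
        # Memory-sensitive workload; today: prctl(PR_SET_THP_DISABLE, 1, ...).
        return "disable"
    if global_policy == "madvise" and wants_thp:
        # THP-friendly workload; today: madvise(MADV_HUGEPAGE) on each range.
        return "enable"
    return None


if __name__ == "__main__":
    try:
        policy = parse_thp_policy(THP_ENABLED.read_text())
    except (OSError, ValueError):
        policy = "unknown"  # THP not built in, or file not readable
    for name, wants in (("database", False), ("ai", True)):
        print(name, policy, needed_override(policy, wants))
```

Whichever way the global policy is set, one of the two colocated services
needs an override, which is the point being made above.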

And hugetlbfs isn't transparent enough. :)