linux-kernel - Re: [PATCH] mm/rmap: do not add fully unmapped large folio to deferred split list

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c76c78ab-7146-4963-97aa-980b767e84a5@redhat.com>
Date: Mon, 15 Apr 2024 21:19:25 +0200
From: David Hildenbrand <david@...hat.com>
To: Yang Shi <shy828301@...il.com>
Cc: Zi Yan <ziy@...dia.com>, linux-mm@...ck.org,
 Andrew Morton <akpm@...ux-foundation.org>,
 Matthew Wilcox <willy@...radead.org>, Ryan Roberts <ryan.roberts@....com>,
 Barry Song <21cnbao@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/rmap: do not add fully unmapped large folio to
 deferred split list

>>
>> We could have
>> * THP_DEFERRED_SPLIT_PAGE
>> * THP_UNDO_DEFERRED_SPLIT_PAGE
>> * THP_PERFORM_DEFERRED_SPLIT_PAGE
>>
>> Maybe that would catch more cases (not sure if all, though). Then, you
>> could tell how many are still on that list. THP_DEFERRED_SPLIT_PAGE -
>> THP_UNDO_DEFERRED_SPLIT_PAGE - THP_PERFORM_DEFERRED_SPLIT_PAGE.
>>
>> That could give one a clearer picture how deferred split interacts with
>> actual splitting (possibly under memory pressure), the whole reason why
>> deferred splitting was added after all.
> 
> I'm not quite sure whether there is a solid usecase or not. If we
> have, we could consider this. But a simpler counter may be more
> preferred.

Yes.

> 
>>
>>> It may be useful. However the counter is typically used to estimate
>>> how many THP are partially unmapped during a period of time.
>>
>> I'd say it's a bit of an abuse of that counter; well, or interpreting
>> something into the counter that that counter never reliably represented.
> 
> It was way more reliable than now.

Correct me if I am wrong: now that we only adjust the counter for 
PMD-sized THP, it is as (un)reliable as it always was.

Or was there another unintended change by some of my cleanups or 
previous patches?

> 
>>
>> I can easily write a program that keeps sending your counter to infinity
>> simply by triggering that behavior in a loop, so it's all a bit shaky.
> 
> I don't doubt that. But let's get back to reality. The counter used to
> stay reasonable and reliable with most real life workloads before
> mTHP. There may be over-counting, for example, when unmapping a
> PTE-mapped THP which was not on a deferred split queue before. But
> such a case is not common for real life workloads because the huge PMD
> has to be split by partial unmap for most cases. And the partial unmap
> will add the THP to deferred split queue.
> 
> But now a common workload, for example, just process exit, may
> probably send the counter to infinity.

Agreed, that's stupid.

> 
>>
>> Something like Ryans script makes more sense, where you get a clearer
>> picture of what's mapped where and how. Because that information can be
>> much more valuable than just knowing if it's mapped fully or partially
>> (again, relevant for handling with memory waste).
> 
> Ryan's script is very helpful. But the counter has been existing and
> used for years, and it is a quick indicator and much easier to monitor
> in a large-scale fleet.
> 
> If we think the reliability of the counter is not worth fixing, why
> don't we just remove it. No counter is better than a broken counter.

Again, is only counting the PMD-sized THPs "fixing" the old use cases? 
Then it should just stick around. And we can even optimize it for some 
more cases as proposed in this patch. But there is no easy way to "get 
it completely right" I'm afraid.

-- 
Cheers,

David / dhildenb