Message-ID: <a2e11353-fad8-475c-a4d1-dc1de22dde11@redhat.com>
Date: Wed, 25 Oct 2023 20:47:42 +0200
From: David Hildenbrand <david@...hat.com>
To: Ryan Roberts <ryan.roberts@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Matthew Wilcox <willy@...radead.org>,
Yin Fengwei <fengwei.yin@...el.com>,
Yu Zhao <yuzhao@...gle.com>,
Catalin Marinas <catalin.marinas@....com>,
Anshuman Khandual <anshuman.khandual@....com>,
Yang Shi <shy828301@...il.com>,
"Huang, Ying" <ying.huang@...el.com>, Zi Yan <ziy@...dia.com>,
Luis Chamberlain <mcgrof@...nel.org>,
Itaru Kitayama <itaru.kitayama@...il.com>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
John Hubbard <jhubbard@...dia.com>,
David Rientjes <rientjes@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Hugh Dickins <hughd@...gle.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory
On 25.10.23 18:24, Ryan Roberts wrote:
> On 20/10/2023 13:33, Ryan Roberts wrote:
>> On 06/10/2023 21:06, David Hildenbrand wrote:
>>> On 29.09.23 13:44, Ryan Roberts wrote:
>>>> Hi All,
>>>
>>
>> [...]
>>
>>>> NOTE: These changes should not be merged until the prerequisites are complete.
>>>> These are in progress and tracked at [7].
>>>
>>> We should probably list them here, and classify which ones we see as a strict
>>> requirement and which ones might be an optimization.
>>>
>>
>> Bringing back the discussion of prerequisites to this thread following the
>> discussion at the mm-alignment meeting on Wednesday.
>>
>> Slides, updated following discussion to reflect all the agreed items that are
>> prerequisites and enhancements, are at [1].
>>
>> I've taken a closer look at the situation with khugepaged, and can confirm that
>> it does correctly collapse anon small-sized THP into PMD-sized THP. I did notice
>> though, that one of the khugepaged selftests (collapse_max_ptes_none) fails when
>> small-sized THP is enabled+always. So I've fixed that test up and will add the
>> patch to the next version of my series.
>>
>> So I believe the khugepaged prerequisite can be marked as done.
>>
>> [1]
>> https://drive.google.com/file/d/1GnfYFpr7_c1kA41liRUW5YtCb8Cj18Ud/view?usp=sharing&resourcekey=0-U1Mj3-RhLD1JV6EThpyPyA
>
> Hi All,
Hi,
I wanted to remind people in the THP cabal meeting, but that either
didn't happen or Zoom decided to not let me join :)
>
> It's been a week since the mm alignment meeting discussion we had around
> prerequisites and the ABI. I haven't heard any further feedback on the ABI
> proposal, so I'm going to be optimistic and assume that nobody has found any
> fatal flaws in it :).
After saying in the call probably 10 times that people should comment
here if there are reasonable alternatives worth discussing, call me
"optimistic" as well; but it's only been a week and people might still
be thinking about this.
There were two things discussed in the call:
* Yu brought up "lists" so we can have priorities. As briefly discussed
in the call, this (a) might not be needed right now in an initial
version; (b) the kernel might be able to handle that (or many cases)
automatically, TBD. Adding lists now would kind-of set the semantics
of that interface in stone. As you describe below, the approach
discussed here could easily be extended to cover priorities, if need
be.
* Hugh raised the point that "bitmap of orders" could be replaced by
"added THP sizes", which really is "bitmap of orders" shifted to the
left. To configure 2 MiB + 64 KiB, one would get "2097152 + 65536" =
"2162688" or in KiB "2112". Hm.
Both approaches would require single-option files like "enable_always",
"enable_madvise" ... which I don't particularly like, but who am I to judge.
>
> Certainly, I think it held up to the potential future policies that Yu Zhao
> cited on the call - the possibility of preferring a smaller size over a bigger
> one, if the smaller size can be allocated without splitting a contiguous block.
> I think the suggestion of adding a per-size priority file would solve it. And in
> general because we have a per-size directory, that gives us lots of flexibility
> for growth.
Jup, same opinion here. But again, I'm very happy to hear other
alternatives and why they are better.
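To make the comparison concrete, here is a rough sketch (not kernel code) of
how the "bitmap of orders" and "added THP sizes" encodings relate, and of what
hugetlb-style per-size directory names could look like. It assumes 4 KiB base
pages (PAGE_SHIFT = 12). The helper names and the "hugepages-<N>kB" naming for
THP sizes are illustrative assumptions only; nothing here has been agreed on in
this thread.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB base pages


def orders_to_bitmap(orders):
    """Set bit N for each enabled folio order N."""
    bitmap = 0
    for order in orders:
        bitmap |= 1 << order
    return bitmap


def bitmap_to_size_sum(bitmap, page_shift=PAGE_SHIFT):
    """'Added THP sizes' in bytes: the order bitmap shifted left by page_shift."""
    return bitmap << page_shift


def size_dir_name(order, page_shift=PAGE_SHIFT):
    """Hypothetical per-size directory name in the style hugetlb uses."""
    size_kib = (1 << (order + page_shift)) >> 10
    return f"hugepages-{size_kib}kB"


orders = [4, 9]  # order 4 = 64 KiB, order 9 = 2 MiB with 4 KiB base pages
bitmap = orders_to_bitmap(orders)
size_sum = bitmap_to_size_sum(bitmap)

print(bitmap)            # order bitmap
print(size_sum)          # sum of sizes in bytes
print(size_sum >> 10)    # sum of sizes in KiB
print([size_dir_name(o) for o in orders])
```

With a per-size directory, each size gets its own set of files, so neither a
combined bitmap nor a summed size value has to be decoded by the user.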
>
> Anyway, given the lack of feedback, I'm proposing to spin a new version. I'm
> planning to do the following:
>
> - Drop the accounting patch (#3); we will continue to only account PMD-sized
> THP for now. We can add more counters in future if needed. Page cache large
> folios haven't needed any new counters yet.
>
> - Pivot to the ABI proposed by DavidH; per-size directories in a similar shape
> to that used by hugetlb
>
> - Drop the "recommend" keyword patch (#6); for now, users will need to
> understand implicitly which sizes are beneficial to their HW perf
>
> - Drop patch (#7); arch_wants_pte_order() is no longer needed due to dropping
> patch #6
>
> - Add patch for khugepaged selftest improvement (described in previous email
> above).
>
> - Ensure that PMD_ORDER is not assumed to be compile-time constant (current
> code is broken on powerpc)
>
> Please shout if you think this is the wrong approach.
I'll shout that this sounds good to me; I'd rather wait a bit longer for more
opinions. It probably makes sense to post something after the (upcoming)
merge window, if there are no further discussions here.
--
Cheers,
David / dhildenb