lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 25 Oct 2023 20:47:42 +0200
From:   David Hildenbrand <>
To:     Ryan Roberts <>,
        Andrew Morton <>,
        Matthew Wilcox <>,
        Yin Fengwei <>,
        Yu Zhao <>,
        Catalin Marinas <>,
        Anshuman Khandual <>,
        Yang Shi <>,
        "Huang, Ying" <>, Zi Yan <>,
        Luis Chamberlain <>,
        Itaru Kitayama <>,
        "Kirill A. Shutemov" <>,
        John Hubbard <>,
        David Rientjes <>,
        Vlastimil Babka <>,
        Hugh Dickins <>
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory

On 25.10.23 18:24, Ryan Roberts wrote:
> On 20/10/2023 13:33, Ryan Roberts wrote:
>> On 06/10/2023 21:06, David Hildenbrand wrote:
>>> On 29.09.23 13:44, Ryan Roberts wrote:
>>>> Hi All,
>> [...]
>>>> NOTE: These changes should not be merged until the prerequisites are complete.
>>>> These are in progress and tracked at [7].
>>> We should probably list them here, and classify which one we see as strict a
>>> requirement, which ones might be an optimization.
>> Bringing back the discussion of prerequistes to this thread following the
>> discussion at the mm-alignment meeting on Wednesday.
>> Slides, updated following discussion to reflect all the agreed items that are
>> prerequisites and enhancements, are at [1].
>> I've taken a closer look at the situation with khugepaged, and can confirm that
>> it does correctly collapse anon small-sized THP into PMD-sized THP. I did notice
>> though, that one of the khugepaged selftests (collapse_max_ptes_none) fails when
>> small-sized THP is enabled+always. So I've fixed that test up and will add the
>> patch to the next version of my series.
>> So I believe the khugepaged prerequisite can be marked as done.
>> [1]
> Hi All,


I wanted to remind people in the THP cabal meeting, but that either 
didn't happen or zoomed decided to not let me join :)

> It's been a week since the mm alignment meeting discussion we had around
> prerequisites and the ABI. I haven't heard any further feedback on the ABI
> proposal, so I'm going to be optimistic and assume that nobody has found any
> fatal flaws in it :).

After saying in the call probably 10 times that people should comment 
here if there are reasonable alternatives worth discussing, call me 
"optimistic" as well; but, it's only been a week and people might still 
be thinking about this/

There were two things discussed in the call:

* Yu brought up "lists" so we can have priorities. As briefly discussed
   in the  call, this (a) might not be needed right now in an initial
   version;  (b) the kernel might be able to handle that (or many cases)
   automatically, TBD. Adding lists now would kind-of set the semantics
   of that interface in stone. As you describe below, the approach
   discussed here could easily be extended to cover priorities, if need

* Hugh raised the point that "bitmap of orders" could be replaced by
   "added THP sizes", which really is "bitmap of orders" shifted to the
   left. To configure 2 MiB +  64Kib, one would get "2097152 + 65536" =
   "2162688" or in KiB "2112". Hm.

Both approaches would require single-option files like "enable_always", 
"enable_madvise" ... which I don't particularly like, but who am I to judge.

> Certainly, I think it held up to the potential future policies that Yu Zhou
> cited on the call - the possibility of preferring a smaller size over a bigger
> one, if the smaller size can be allocated without splitting a contiguous block.
> I think the suggestion of adding a per-size priority file would solve it. And in
> general because we have a per-size directory, that gives us lots of flexibility
> for growth.

Jup, same opinion here. But again, I'm very happy to hear other 
alternatives and why they are better.

> Anyway, given the lack of feedback, I'm proposing to spin a new version. I'm
> planning to do the following:
>    - Drop the accounting patch (#3); we will continue to only account PMD-sized
>      THP for now. We can add more counters in future if needed. page cache large
>      folios haven't needed any new counters yet.
>    - Pivot to the ABI proposed by DavidH; per-size directories in a similar shape
>      to that used by hugetlb
>    - Drop the "recommend" keyword patch (#6); For now, users will need to
>      understand implicitly which sizes are beneficial to their HW perf
>    - Drop patch (#7); arch_wants_pte_order() is no longer needed due to dropping
>      patch #6
>    - Add patch for khugepaged selftest improvement (described in previous email
>      above).
>    - Ensure that PMD_ORDER is not assumed to be compile-time constant (current
>      code is broken on powerpc)
> Please shout if you think this is the wrong approach.

I'll shout that this sounds good to me; rather wait a bit more for more 
opinions. It probably makes sense to post something after the (upcoming) 
merge window, if there are no further discussions here.


David / dhildenb

Powered by blists - more mailing lists