Message-ID: <a2e11353-fad8-475c-a4d1-dc1de22dde11@redhat.com>
Date:   Wed, 25 Oct 2023 20:47:42 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Ryan Roberts <ryan.roberts@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Matthew Wilcox <willy@...radead.org>,
        Yin Fengwei <fengwei.yin@...el.com>,
        Yu Zhao <yuzhao@...gle.com>,
        Catalin Marinas <catalin.marinas@....com>,
        Anshuman Khandual <anshuman.khandual@....com>,
        Yang Shi <shy828301@...il.com>,
        "Huang, Ying" <ying.huang@...el.com>, Zi Yan <ziy@...dia.com>,
        Luis Chamberlain <mcgrof@...nel.org>,
        Itaru Kitayama <itaru.kitayama@...il.com>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        John Hubbard <jhubbard@...dia.com>,
        David Rientjes <rientjes@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Hugh Dickins <hughd@...gle.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v6 0/9] variable-order, large folios for anonymous memory

On 25.10.23 18:24, Ryan Roberts wrote:
> On 20/10/2023 13:33, Ryan Roberts wrote:
>> On 06/10/2023 21:06, David Hildenbrand wrote:
>>> On 29.09.23 13:44, Ryan Roberts wrote:
>>>> Hi All,
>>>
>>
>> [...]
>>
>>>> NOTE: These changes should not be merged until the prerequisites are complete.
>>>> These are in progress and tracked at [7].
>>>
>>> We should probably list them here, and classify which ones we see as strict
>>> requirements and which ones might be optimizations.
>>>
>>
>> Bringing back the discussion of prerequisites to this thread following the
>> discussion at the mm-alignment meeting on Wednesday.
>>
>> Slides, updated following discussion to reflect all the agreed items that are
>> prerequisites and enhancements, are at [1].
>>
>> I've taken a closer look at the situation with khugepaged, and can confirm that
>> it does correctly collapse anon small-sized THP into PMD-sized THP. I did notice
>> though, that one of the khugepaged selftests (collapse_max_ptes_none) fails when
>> small-sized THP is enabled+always. So I've fixed that test up and will add the
>> patch to the next version of my series.
>>
>> So I believe the khugepaged prerequisite can be marked as done.
>>
>> [1]
>> https://drive.google.com/file/d/1GnfYFpr7_c1kA41liRUW5YtCb8Cj18Ud/view?usp=sharing&resourcekey=0-U1Mj3-RhLD1JV6EThpyPyA
> 
> Hi All,

Hi,

I wanted to remind people in the THP cabal meeting, but that either 
didn't happen or Zoom decided to not let me join :)

> 
> It's been a week since the mm alignment meeting discussion we had around
> prerequisites and the ABI. I haven't heard any further feedback on the ABI
> proposal, so I'm going to be optimistic and assume that nobody has found any
> fatal flaws in it :).

After saying in the call probably 10 times that people should comment 
here if there are reasonable alternatives worth discussing, call me 
"optimistic" as well; but it's only been a week and people might still 
be thinking about this.

There were two things discussed in the call:

* Yu brought up "lists" so we can have priorities. As briefly discussed
   in the call, this (a) might not be needed right now in an initial
   version; (b) the kernel might be able to handle that (or many cases)
   automatically, TBD. Adding lists now would kind-of set the semantics
   of that interface in stone. As you describe below, the approach
   discussed here could easily be extended to cover priorities, if need
   be.

* Hugh raised the point that "bitmap of orders" could be replaced by
   "added THP sizes", which really is "bitmap of orders" shifted to the
   left. To configure 2 MiB + 64 KiB, one would get "2097152 + 65536" =
   "2162688" or in KiB "2112". Hm.

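To make the arithmetic in the second point concrete, here is a minimal
sketch. It assumes 4 KiB base pages (PAGE_SHIFT = 12), so that order 4 is
64 KiB and order 9 is 2 MiB; the function names are hypothetical and only
illustrate the relationship, they are not part of any proposed interface.

```python
# Sketch only: assumes 4 KiB base pages; all names are illustrative.
PAGE_SHIFT = 12


def orders_to_bitmap(orders):
    """Set one bit per enabled order ("bitmap of orders")."""
    bitmap = 0
    for order in orders:
        bitmap |= 1 << order
    return bitmap


def bitmap_to_size_sum(bitmap, page_shift=PAGE_SHIFT):
    """"Added THP sizes" in bytes: the order bitmap shifted left."""
    return bitmap << page_shift


def size_sum_to_orders(total_bytes, page_shift=PAGE_SHIFT):
    """Recover the enabled orders from the summed sizes."""
    bitmap = total_bytes >> page_shift
    return [o for o in range(bitmap.bit_length()) if bitmap & (1 << o)]


orders = [4, 9]                      # 64 KiB and 2 MiB with 4 KiB pages
bitmap = orders_to_bitmap(orders)    # 0x210
total = bitmap_to_size_sum(bitmap)   # 2097152 + 65536
print(total, total // 1024, size_sum_to_orders(total))
```

Because each enabled size is a distinct power of two, the sum carries
exactly the same information as the bitmap; it is just harder to read.
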
Both approaches would require single-option files like "enable_always", 
"enable_madvise" ... which I don't particularly like, but who am I to judge.


> 
> Certainly, I think it held up to the potential future policies that Yu Zhao
> cited on the call - the possibility of preferring a smaller size over a bigger
> one, if the smaller size can be allocated without splitting a contiguous block.
> I think the suggestion of adding a per-size priority file would solve it. And in
> general because we have a per-size directory, that gives us lots of flexibility
> for growth.

Jup, same opinion here. But again, I'm very happy to hear other 
alternatives and why they are better.
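
For illustration, hugetlb names its per-size sysfs directories
"hugepages-<size>kB". A rough sketch of how per-size directory names could
be derived from an order follows. It assumes 4 KiB base pages, and it is
only an assumption that an anon THP interface would reuse this exact
naming; the helper name is hypothetical.

```python
# Sketch only: assumes 4 KiB base pages and hugetlb-style directory names.
PAGE_SHIFT = 12


def per_size_dir_name(order, page_shift=PAGE_SHIFT):
    """Return a hugetlb-style directory name for a folio order."""
    size_kb = (1 << (order + page_shift)) // 1024
    return f"hugepages-{size_kb}kB"


# One directory per supported size; further per-size files (for example a
# priority file, as suggested above) could live next to each other there.
names = [per_size_dir_name(order) for order in (4, 9)]
print(names)
```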

> 
> Anyway, given the lack of feedback, I'm proposing to spin a new version. I'm
> planning to do the following:
> 
>    - Drop the accounting patch (#3); we will continue to only account PMD-sized
>      THP for now. We can add more counters in future if needed. page cache large
>      folios haven't needed any new counters yet.
> 
>    - Pivot to the ABI proposed by DavidH; per-size directories in a similar shape
>      to that used by hugetlb
> 
>    - Drop the "recommend" keyword patch (#6); For now, users will need to
>      understand implicitly which sizes are beneficial to their HW perf
> 
>    - Drop patch (#7); arch_wants_pte_order() is no longer needed due to dropping
>      patch #6
> 
>    - Add patch for khugepaged selftest improvement (described in previous email
>      above).
> 
>    - Ensure that PMD_ORDER is not assumed to be compile-time constant (current
>      code is broken on powerpc)
> 
> Please shout if you think this is the wrong approach.

I'll shout that this sounds good to me, but I'd rather wait a bit longer 
for more opinions. It probably makes sense to post something after the 
(upcoming) merge window, if there are no further discussions here.

-- 
Cheers,

David / dhildenb
