linux-kernel - Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

Message-ID: <14089E95-251E-43A4-AF32-C9773723C810@nvidia.com>
Date:   Mon, 09 Oct 2023 09:43:38 -0400
From:   Zi Yan <ziy@...dia.com>
To:     "Huang, Ying" <ying.huang@...el.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Ryan Roberts <ryan.roberts@....com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>,
        David Hildenbrand <david@...hat.com>,
        "Yin, Fengwei" <fengwei.yin@...el.com>,
        Yu Zhao <yuzhao@...gle.com>, Vlastimil Babka <vbabka@...e.cz>,
        Johannes Weiner <hannes@...xchg.org>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Kemeng Shi <shikemeng@...weicloud.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Rohan Puri <rohan.puri15@...il.com>,
        Mcgrof Chamberlain <mcgrof@...nel.org>,
        Adam Manzanares <a.manzanares@...sung.com>,
        John Hubbard <jhubbard@...dia.com>
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

On 9 Oct 2023, at 3:12, Huang, Ying wrote:

> Hi, Zi,
>
> Thanks for your patch!
>
> Zi Yan <zi.yan@...t.com> writes:
>
>> From: Zi Yan <ziy@...dia.com>
>>
>> Hi all,
>>
>> This patchset enables >0 order folio memory compaction, which is one of
>> the prerequisites for large folio support[1]. It is on top of
>> mm-everything-2023-09-11-22-56.
>>
>> Overview
>> ===
>>
>> To support >0 order folio compaction, the patchset changes how free pages used
>> for migration are kept during compaction.
>
> migrate_pages() can split the large folio for allocation failure.  So
> the minimal implementation could be
>
> - allow to migrate large folios in compaction
> - return -ENOMEM for order > 0 in compaction_alloc()
>
> The performance may be not desirable.  But that may be a baseline for
> further optimization.

I would imagine it might cause a regression since compaction might gradually
split high order folios in the system. But I can move Patch 4 first to make this
the baseline and see how system performance changes.
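
To make the concern concrete, here is a minimal user-space sketch of that
baseline. It is not kernel code: toy_compaction_alloc(), toy_migrate() and
struct toy_stats are hypothetical names, and the model simply assumes that the
allocation callback refuses every order > 0 request and that the caller reacts
by splitting the source folio and migrating its base pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters for the toy model (not kernel structures). */
struct toy_stats {
    unsigned splits;              /* large folios that had to be split */
    unsigned base_pages_migrated; /* order-0 pages moved to free targets */
};

/*
 * Toy stand-in for the baseline allocation callback: any order > 0 request
 * is refused, order-0 requests succeed while free base pages remain.
 * Returns 0 on success, -1 on failure.
 */
static int toy_compaction_alloc(unsigned order, unsigned *free_base_pages)
{
    if (order > 0)
        return -1;
    if (*free_base_pages == 0)
        return -1;
    (*free_base_pages)--;
    return 0;
}

/*
 * Toy migration loop: when the allocation for a large folio fails, split it
 * into 1 << order base pages and migrate those one by one.
 */
static struct toy_stats toy_migrate(const unsigned *orders, size_t n,
                                    unsigned free_base_pages)
{
    struct toy_stats st = { 0, 0 };

    for (size_t i = 0; i < n; i++) {
        unsigned order = orders[i];

        assert(order < 32); /* keep the shift below well defined */

        if (toy_compaction_alloc(order, &free_base_pages) == 0) {
            st.base_pages_migrated++;
            continue;
        }
        if (order == 0)
            break; /* genuinely out of free pages */

        st.splits++; /* the large folio is lost here */
        for (unsigned j = 0; j < (1u << order); j++) {
            if (toy_compaction_alloc(0, &free_base_pages) != 0)
                return st;
            st.base_pages_migrated++;
        }
    }
    return st;
}
```

With source orders {0, 2, 0} and eight free base pages, the model reports one
split and six migrated base pages: everything moves, but the order-2 folio no
longer exists as a large folio afterwards.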

>
> And, if we can measure the performance for each step of optimization,
> that will be even better.

Do you have any benchmark in mind for the performance tests? vm-scalability?

>
>> Free pages used to be split into
>> order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared,
>> page order stored in page->private is zeroed, and page reference is set to 1).
>> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
>> on their order without post allocation process. When migrate_pages() asks for
>> a new page, one of the free pages, based on the requested page order, is
>> then processed and given out.
>>
>>
>> Optimizations
>> ===
>>
>> 1. Free page split is added to increase migration success rate in case
>> a source page does not have a matched free page in the free page lists.
>> Free page merge is possible but not implemented, since existing
>> PFN-based buddy page merge algorithm requires the identification of
>> buddy pages, but free pages kept for memory compaction cannot have
>> PageBuddy set to avoid confusing other PFN scanners.
>>
>> 2. Sort source pages in ascending order before migration is added to
>
> Trivial.
>
> s/ascending/descending/
>
>> reduce free page split. Otherwise, high order free pages might be
>> prematurely split, causing undesired high order folio migration failures.
>>
>>
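
The interaction between free page split and source ordering can be sketched
in user space as below. This is only an illustration under assumptions, not
the patchset's code: TOY_NR_ORDERS, toy_alloc() and toy_pages_migrated() are
hypothetical, free pages are modeled as per-order counters, and a failed
allocation just skips that source folio.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_NR_ORDERS 4 /* hypothetical: orders 0..3 only */

/*
 * Take one free page of the requested order. If none is available, split
 * the smallest larger free page; each split level returns one half to the
 * next lower list. Returns 0 on success, -1 if nothing fits.
 */
static int toy_alloc(unsigned free_count[TOY_NR_ORDERS], unsigned order)
{
    unsigned o = order;

    while (o < TOY_NR_ORDERS && free_count[o] == 0)
        o++;
    if (o == TOY_NR_ORDERS)
        return -1;

    free_count[o]--;
    while (o > order) {
        o--;
        free_count[o]++; /* unused half stays on the lower-order list */
    }
    return 0;
}

/* qsort() comparator: larger orders first. */
static int toy_cmp_desc(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a;
    unsigned y = *(const unsigned *)b;

    return (x < y) - (x > y);
}

/*
 * Migrate the source folios in the given order, or largest first when
 * sort_desc is non-zero. Returns the number of base pages migrated.
 */
static unsigned toy_pages_migrated(unsigned *src, size_t n,
                                   unsigned free_count[TOY_NR_ORDERS],
                                   int sort_desc)
{
    unsigned pages = 0;

    if (sort_desc)
        qsort(src, n, sizeof(*src), toy_cmp_desc);

    for (size_t i = 0; i < n; i++) {
        assert(src[i] < TOY_NR_ORDERS);
        if (toy_alloc(free_count, src[i]) == 0)
            pages += 1u << src[i];
    }
    return pages;
}
```

With one free order-2 page, one free order-0 page and source folios of orders
{0, 0, 2}, processing in the given order splits the order-2 page for the
second base page and migrates 2 pages in total, while largest-first processing
migrates 5 pages and the order-2 folio succeeds.
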
>> TODOs
>> ===
>>
>> 1. Refactor free page post allocation and free page preparation code so
>> that compaction_alloc() and compaction_free() can call functions instead
>> of hard coding.
>>
>> 2. One possible optimization is to allow migrate_pages() to continue
>> even if get_new_folio() returns a NULL. In general, that means there is
>> not enough memory. But in >0 order folio compaction case, that means
>> there is no suitable free page at source page order. It might be better
>> to skip that page and finish the rest of migration to achieve a better
>> compaction result.
>
> We can split the source folio if get_new_folio() returns NULL.  So, do
> we really need this?

It depends. This can help when the system needs a high order free page and
triggers compaction: it may be possible to get that page by migrating a bunch
of base pages instead of splitting an existing high order folio.
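
As a rough illustration of the difference, here is a small user-space model
of the two policies. The names (toy_migrate_list(), TOY_NR_ORDERS) are
hypothetical, and the model assumes exact-order matching only, with no free
page split and no source folio split.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_ORDERS 4 /* hypothetical: orders 0..3 only */

/*
 * Walk the source folios in order. A folio migrates only if a free page of
 * exactly its order is available. On a miss, either stop (today's "NULL
 * means out of memory" reading) or skip the folio and keep going.
 * Returns the number of folios migrated.
 */
static size_t toy_migrate_list(const unsigned *src_orders, size_t n,
                               unsigned free_count[TOY_NR_ORDERS],
                               bool skip_on_failure)
{
    size_t migrated = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned order = src_orders[i];

        assert(order < TOY_NR_ORDERS);
        if (free_count[order] == 0) {
            if (skip_on_failure)
                continue; /* leave this folio in place, try the rest */
            break;        /* treat the miss as a hard failure */
        }
        free_count[order]--;
        migrated++;
    }
    return migrated;
}
```

With sources of orders {2, 0, 0, 0} and three free order-0 pages, stopping at
the first miss migrates nothing, while skipping migrates the three base pages
and leaves the order-2 folio intact.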

>
> In general, we may reconsider all further optimizations given splitting
> is available already.

In my mind, split should be avoided as much as possible. But it really depends
on the actual situation, e.g., how much effort and cost the compaction is willing
to pay to get memory defragmented. If the system really wants to get a high
order free page at any cost, split can be used without any issue. But applications
might lose performance because existing large folios are split just to form a
new one.

Like I said in the email, there are tons of optimizations and policies for us
to explore. We can start with the bare minimum support (if no performance
regression is observed, we can even start with splitting all high order folios
like you suggested) and add optimizations one by one.

>
>> 3. Another possible optimization is to enable free page merge. It is
>> possible that a to-be-migrated page causes free page split then fails to
>> migrate eventually. We would lose a high order free page without free
>> page merge function. But a way of identifying free pages for memory
>> compaction is needed to reuse existing PFN-based buddy page merge.
>>
>> 4. The implemented >0 order folio compaction algorithm is quite naive
>> and does not consider all possible situations. A better algorithm can
>> improve compaction success rate.
>>
>>
>> Feel free to give comments and ask questions.
>>
>> Thanks.
>>
>>
>> [1] https://lore.kernel.org/linux-mm/f8d47176-03a8-99bf-a813-b5942830fd73@arm.com/
>>
>> Zi Yan (4):
>>   mm/compaction: add support for >0 order folio memory compaction.
>>   mm/compaction: optimize >0 order folio compaction with free page
>>     split.
>>   mm/compaction: optimize >0 order folio compaction by sorting source
>>     pages.
>>   mm/compaction: enable compacting >0 order folios.
>>
>>  mm/compaction.c | 205 +++++++++++++++++++++++++++++++++++++++---------
>>  mm/internal.h   |   7 +-
>>  2 files changed, 176 insertions(+), 36 deletions(-)
>
> --
> Best Regards,
> Huang, Ying


--
Best Regards,
Yan, Zi
