[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <019EB6CA-0F4B-496A-B2AE-A3A553585281@nvidia.com>
Date: Fri, 07 Feb 2025 09:11:39 -0500
From: Zi Yan <ziy@...dia.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
"Matthew Wilcox (Oracle)" <willy@...radead.org>
Cc: linux-mm@...ck.org,
"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
Ryan Roberts <ryan.roberts@....com>, Hugh Dickins <hughd@...gle.com>,
David Hildenbrand <david@...hat.com>, Yang Shi <yang@...amperecomputing.com>,
Miaohe Lin <linmiaohe@...wei.com>, Kefeng Wang <wangkefeng.wang@...wei.com>,
Yu Zhao <yuzhao@...gle.com>, John Hubbard <jhubbard@...dia.com>,
Baolin Wang <baolin.wang@...ux.alibaba.com>, linux-kselftest@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 0/7] Buddy allocator like (or non-uniform) folio split
On 6 Feb 2025, at 3:01, Andrew Morton wrote:
> On Tue, 4 Feb 2025 22:14:10 -0500 Zi Yan <ziy@...dia.com> wrote:
>
>> This patchset adds a new buddy allocator like (or non-uniform) large folio
>> split to reduce the total number of after-split folios, the amount of memory
>> needed for multi-index xarray split, and keep more large folios after a split.
>
> It would be useful (vital, really) to provide some measurements which
> help others understand the magnitude of these resource savings, please.
Hi Andrew,
Can you please drop this series for now? I find that, after your above request,
I misunderstood how xas_split_alloc() and xas_split() works in xarray, thus,
my current implementation allocates more than enough xa_node during non-uniform
split, although the excessive ones are freed at the end. It defeats the purpose
of reducing memory consumption of multi-index xarray split, even if
folio_split() has no function issue AFAICT. I am working on a better
implementation that might require new xarray operations. I will post it as v7
later. I really appreciate that you asked about more info above. :)
More details on memory saving for multi-index xarray split during non-uniform
split compared to existing uniform split (I will add this to commit log in the
next version):
Existing uniform split requires 2^(order % XA_CHUNK_SHIFT) xa_node allocations
during split, when the folio needs to be split to order-0. But non-uniform split
only requires at most 1 xa_node allocation. For example, to split an order-9
folio, 8 xa_nodes are needed for uniform split, since the folio takes 8
multi-index slots in the xarray. But for non-uniform split, only the slot
containing the given struct page needs a xa_node after the split. There will be
a 7 xa_node saving.
Hi Matthew,
Do you mind checking my statement above on xarray memory saving? And correct me
if I miss anything. Thanks.
Best Regards,
Yan, Zi
Powered by blists - more mailing lists