linux-kernel - Re: [PATCH v6 0/7] Buddy allocator like (or non-uniform) folio split

Message-ID: <Z6YX3RznGLUD07Ao@casper.infradead.org>
Date: Fri, 7 Feb 2025 14:25:33 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Zi Yan <ziy@...dia.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	"Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
	Ryan Roberts <ryan.roberts@....com>,
	Hugh Dickins <hughd@...gle.com>,
	David Hildenbrand <david@...hat.com>,
	Yang Shi <yang@...amperecomputing.com>,
	Miaohe Lin <linmiaohe@...wei.com>,
	Kefeng Wang <wangkefeng.wang@...wei.com>,
	Yu Zhao <yuzhao@...gle.com>, John Hubbard <jhubbard@...dia.com>,
	Baolin Wang <baolin.wang@...ux.alibaba.com>,
	linux-kselftest@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6 0/7] Buddy allocator like (or non-uniform) folio split

On Fri, Feb 07, 2025 at 09:11:39AM -0500, Zi Yan wrote:
> Existing uniform split requires 2^(order % XA_CHUNK_SHIFT) xa_node allocations
> during split, when the folio needs to be split to order-0. But non-uniform split
> only requires at most 1 xa_node allocation. For example, to split an order-9
> folio, 8 xa_nodes are needed for uniform split, since the folio takes 8
> multi-index slots in the xarray. But for non-uniform split, only the slot
> containing the given struct page needs an xa_node after the split, a saving
> of 7 xa_nodes.
> 
> Hi Matthew,
> 
> Do you mind checking my statement above on xarray memory saving, and
> correcting me if I missed anything? Thanks.

We currently have a bug where we can't split order-12 (or above) to order-0
(or anything in the range 0-5), as we'd need to allocate two layers of
nodes, and the preallocation can't do that.

As part of your series, I'd like to remove that limitation, so we'd need
to allocate log_64(n - m) nodes [ok, more complex than that, but ykwim].
So it's not quite "only allocate one node"; it's allocating O(log(current
number of nodes needed to be allocated)).
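A back-of-envelope derivation of the logarithmic bound, under the same simplified model (XA_CHUNK_SHIFT = 6, an order-k entry stored at node level L(k)). Here n and m are taken to be the source and target folio orders, which is an assumption about the notation; U and N are illustrative symbols for the uniform and non-uniform node counts.

```latex
% Assumptions: XA_CHUNK_SHIFT = 6; an order-k entry lives at node level L(k).
L(k) = \left\lfloor k / 6 \right\rfloor

% Uniform split of an order-n folio to order m: at level \ell the
% 2^n indices span 2^{n-6\ell-6} nodes.
U(n,m) = \sum_{\ell = L(m)}^{L(n)-1} 2^{\,n - 6\ell - 6}

% Non-uniform split: one node per level on the path to the target page.
N(n,m) = L(n) - L(m)

% U is at least its \ell = L(m) term, and n \ge 6L(n), so for N \ge 1:
\log_{64} U(n,m) \ge \frac{n - 6L(m) - 6}{6} \ge L(n) - L(m) - 1 = N(n,m) - 1

% Hence N(n,m) \le \log_{64} U(n,m) + 1 = O(\log U).
% Examples: U(9,0) = 8, N(9,0) = 1; U(12,0) = 64 + 1 = 65, N(12,0) = 2.
```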

Makes sense?