Message-ID: <7f89c0cf-0601-4f61-a665-72364629681a@kernel.org>
Date: Tue, 18 Nov 2025 09:51:39 +0100
From: "David Hildenbrand (Red Hat)" <david@...nel.org>
To: Ackerley Tng <ackerleytng@...gle.com>,
Matthew Wilcox <willy@...radead.org>
Cc: akpm@...ux-foundation.org, linux-fsdevel@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org, michael.roth@....com,
vannapurve@...gle.com
Subject: Re: [RFC PATCH 0/4] Extend xas_split* to support splitting
arbitrarily large entries
On 18.11.25 00:43, Ackerley Tng wrote:
> Matthew Wilcox <willy@...radead.org> writes:
>
>> On Mon, Nov 17, 2025 at 02:46:57PM -0800, Ackerley Tng wrote:
>>> guest_memfd is planning to store huge pages in the filemap, and
>>> guest_memfd's use of huge pages involves splitting of huge pages into
>>> individual pages. Splitting of huge pages also involves splitting of
>>> the filemap entries for the pages being split.
>
>>
>> Hm, I'm not most concerned about the number of nodes you're allocating.
>
> Thanks for reminding me, I left this out of the original message.
>
> Splitting the xarray entry for a 1G folio (in a shift-18 node for
> order=18 on x86), assuming XA_CHUNK_SHIFT is 6, would involve
>
> + shift-18 node (the original node will be reused - no new allocations)
> + shift-12 node: 1 node allocated
> + shift-6 node : 64 nodes allocated
> + shift-0 node : 64 * 64 = 4096 nodes allocated
>
> This brings the total to 4161 allocated nodes. struct xa_node is 576
> bytes, so that's 2396736 bytes, i.e. splitting a 1G folio to 4K pages
> costs ~2.3 MB just in filemap (XArray) entry splitting. The other large
> memory cost would be from undoing HVO for the HugeTLB folio.
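
FWIW, a quick back-of-envelope check of those numbers. This is just a
throwaway userspace sketch (not kernel code), assuming XA_CHUNK_SHIFT == 6
and sizeof(struct xa_node) == 576:

#include <stdio.h>

int main(void)
{
	/*
	 * New node levels below the reused shift-18 node when fully
	 * splitting one order-18 entry down to order 0: shift-12,
	 * shift-6 and shift-0, i.e. 1 + 64 + 64 * 64 nodes.
	 */
	unsigned long nodes = 0, per_level = 1;
	int levels = 3;			/* (18 - 0) / XA_CHUNK_SHIFT */

	for (int i = 0; i < levels; i++) {
		nodes += per_level;
		per_level *= 64;
	}
	printf("nodes: %lu, bytes: %lu\n", nodes, nodes * 576);
	/* prints: nodes: 4161, bytes: 2396736 (~2.3 MB) */
	return 0;
}
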
>
>> I'm most concerned that, once we have memdescs, splitting a 1GB page
>> into 512 * 512 4kB pages is going to involve allocating about 20MB
>> of memory (80 bytes * 512 * 512).
>
> I definitely need to catch up on memdescs. What's the best place for me
> to learn/get an overview of how memdescs will describe memory/replace
> struct folios?
>
> I think there might be a better way to solve the original problem of
> usage tracking with memdesc support, but this was intended to make
> progress before memdescs.
>
>> Is this necessary to do all at once?
>
> The plan for guest_memfd was to first split from 1G to 4K, then optimize
> on that by splitting in stages, from 1G to 2M as much as possible, then
> to 4K only for the page ranges that the guest shared with the host.

Right, we also discussed the non-uniform split as a future optimization.
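
Just to sketch what the staged variant could look like at the call site,
using the xas_split_alloc()/xas_split() interfaces, assuming they accept
these orders (which is what this series is extending them to do). Rough
and untested; the function names are made up, and folio handling, error
cleanup and the exact locking/GFP context are omitted:

#include <linux/pagemap.h>
#include <linux/xarray.h>

/* Stage 1: split one order-18 (1G) entry into order-9 (2M) entries. */
static int demo_split_1g_to_2m(struct address_space *mapping, pgoff_t index,
			       struct folio *folio)
{
	XA_STATE_ORDER(xas, &mapping->i_pages, index, 9);

	xas_split_alloc(&xas, folio, 18, GFP_KERNEL);
	if (xas_error(&xas))
		return xas_error(&xas);

	xas_lock_irq(&xas);
	/* The same entry is stored back into every new slot. */
	xas_split(&xas, folio, 18);
	xas_unlock_irq(&xas);

	return xas_error(&xas);
}

/* Stage 2: split only the order-9 entry covering @index down to order 0. */
static int demo_split_2m_to_4k(struct address_space *mapping, pgoff_t index,
			       struct folio *folio)
{
	XA_STATE_ORDER(xas, &mapping->i_pages, index, 0);

	xas_split_alloc(&xas, folio, 9, GFP_KERNEL);
	if (xas_error(&xas))
		return xas_error(&xas);

	xas_lock_irq(&xas);
	xas_split(&xas, folio, 9);
	xas_unlock_irq(&xas);

	return xas_error(&xas);
}

That way the shift-0 nodes are only paid for in the ranges that actually
get shared with the host.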
--
Cheers
David