Message-ID: <86qzsd7zmu.fsf@kernel.org>
Date: Mon, 29 Dec 2025 22:21:29 +0100
From: Pratyush Yadav <pratyush@...nel.org>
To: Pasha Tatashin <pasha.tatashin@...een.com>
Cc: Pratyush Yadav <pratyush@...nel.org>, Mike Rapoport <rppt@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>, David Hildenbrand
<david@...nel.org>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, "Liam
R. Howlett" <Liam.Howlett@...cle.com>, Vlastimil Babka <vbabka@...e.cz>,
Suren Baghdasaryan <surenb@...gle.com>, Michal Hocko <mhocko@...e.com>,
Jonathan Corbet <corbet@....net>, Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Dave
Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org, "H. Peter Anvin"
<hpa@...or.com>, Muchun Song <muchun.song@...ux.dev>, Oscar Salvador
<osalvador@...e.de>, Alexander Graf <graf@...zon.com>, David Matlack
<dmatlack@...gle.com>, David Rientjes <rientjes@...gle.com>, Jason
Gunthorpe <jgg@...dia.com>, Samiullah Khawaja <skhawaja@...gle.com>,
Vipin Sharma <vipinsh@...gle.com>, Zhu Yanjun <yanjun.zhu@...ux.dev>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-doc@...r.kernel.org, kexec@...ts.infradead.org
Subject: Re: [RFC PATCH 06/10] liveupdate: hugetlb subsystem FLB state
preservation
On Tue, Dec 23 2025, Pasha Tatashin wrote:
> On Sat, Dec 6, 2025 at 6:03 PM Pratyush Yadav <pratyush@...nel.org> wrote:
>>
>> HugeTLB manages its own pages. It allocates them on boot and uses those
>> to fulfill hugepage requests.
>>
>> To support live update for a hugetlb-backed memfd, it is necessary to
>> track how many pages of each hstate are coming from live update. This is
>> needed to ensure the boot time allocations don't over-allocate huge
>> pages, causing the rest of the system unexpected memory pressure.
>>
>> For example, say the system has 100G memory and it uses 90 1G huge
>> pages, with 10G put aside for other processes. Now say 5 of those pages
>> are preserved via KHO for live updating a huge memfd.
>>
>> But during boot, the system will still see that it needs 90 huge pages,
>> so it will attempt to allocate those. When the file is later retrieved,
>> those 5 pages also get added to the huge page pool, resulting in 95
>> total huge pages. This exceeds the original expectation of 90 pages, and
>> ends up wasting memory.
>>
>> LUO has file-lifecycle-bound (FLB) data to keep track of global state of
>> a subsystem. Use it to track how many huge pages are used up for each
>> hstate. When a file is preserved, it will increment the counter, and
>> when it is unpreserved, it will decrement it. During boot time
>> allocations, this data can be used to calculate how many hugepages
>> actually need to be allocated.
>>
>> Design note: another way of doing this would be to preserve the entire
>> set of hugepages using the FLB, skip boot time allocation, and restore
>> them all on FLB retrieve. The main problem with that approach is that it
>> would need to freeze all hstates after serializing them. This will need
>> a lot more invasive changes in hugetlb since there are many ways folios
>> can be added to or removed from a hstate. Doing it this way is simpler
>> and less invasive.
>>
>> Signed-off-by: Pratyush Yadav <pratyush@...nel.org>
>> ---
>> Documentation/mm/memfd_preservation.rst | 9 ++
>> MAINTAINERS | 1 +
>> include/linux/kho/abi/hugetlb.h | 66 +++++++++
>> kernel/liveupdate/Kconfig | 12 ++
>> mm/Makefile | 1 +
>> mm/hugetlb.c | 1 +
>> mm/hugetlb_internal.h | 15 ++
>> mm/hugetlb_luo.c | 179 ++++++++++++++++++++++++
>> 8 files changed, 284 insertions(+)
>> create mode 100644 include/linux/kho/abi/hugetlb.h
>> create mode 100644 mm/hugetlb_luo.c
>>
[...]
>> +static int hugetlb_flb_retrieve(struct liveupdate_flb_op_args *args)
>> +{
>> + /*
>> + * The FLB is only needed for boot-time calculation of how many
>> + * hugepages are needed. This is done by early boot handlers already.
>> + * Free the serialized state now.
>> + */
>
> It should be done in this function.
The calculations can't be done in retrieve. Retrieve happens only once
and for the whole FLB. They will need to come from
hugetlb_hstate_alloc_pages().
Maybe you mean getting rid of liveupdate_flb_incoming_early()? Yeah,
that I can do. It will make this function a no-op once we move the
kho_restore_free() to finish().
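For illustration only, here is a minimal userspace sketch of the accounting discussed in this thread. It is not the code from the patch: the struct `hstate_counts` and the helpers `preserved_inc()`, `preserved_dec()` and `boot_pages_to_alloc()` are hypothetical names, and the real calculation would sit in the hugetlb boot path (hugetlb_hstate_alloc_pages() is the place named above). It assumes one counter per hstate and that boot should allocate only the pages that will not come back through live update.

```c
#include <assert.h>

/* Hypothetical per-hstate accounting; not the ABI layout used by the patch. */
struct hstate_counts {
	unsigned long requested; /* huge pages requested for this hstate at boot */
	unsigned long preserved; /* huge pages carried across the live update */
};

/* A file backed by this hstate is preserved: count its pages. */
static void preserved_inc(struct hstate_counts *h, unsigned long nr_pages)
{
	h->preserved += nr_pages;
}

/* A file is unpreserved: give its pages back to the count. */
static void preserved_dec(struct hstate_counts *h, unsigned long nr_pages)
{
	/* The counter must never go below zero. */
	assert(nr_pages <= h->preserved);
	h->preserved -= nr_pages;
}

/*
 * Number of pages boot-time allocation should provide so that the pool
 * ends up at 'requested' once the preserved pages are added back.
 */
static unsigned long boot_pages_to_alloc(const struct hstate_counts *h)
{
	if (h->preserved >= h->requested)
		return 0;
	return h->requested - h->preserved;
}
```

With the numbers from the commit message (90 requested 1G pages, 5 of them preserved), boot_pages_to_alloc() returns 85, so the pool is back at 90 after the file is retrieved instead of growing to 95.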
>
>> + kho_restore_free(phys_to_virt(args->data));
>
> This should be moved to finish() after blackout.
Sure.
>
>> +
>> + /*
>> + * HACK: But since LUO FLB still needs an obj, use ZERO_SIZE_PTR to
>> + * satisfy it.
>> + */
>> + args->obj = ZERO_SIZE_PTR;
>
> Hopefully this is not needed any more with the updated FLB, please check :-)
Yep. IIRC when I sent this series the older version of FLB was in
mm-nonmm-unstable.
>
>> + return 0;
>> +}
>> +
[...]
--
Regards,
Pratyush Yadav