Message-ID: <aW_G66HeWLbyiPHs@gourry-fedora-PF4VCD3F>
Date: Tue, 20 Jan 2026 13:18:19 -0500
From: Gregory Price <gourry@...rry.net>
To: Li Zhe <lizhe.67@...edance.com>
Cc: david.laight.linux@...il.com, akpm@...ux-foundation.org,
ankur.a.arora@...cle.com, dan.j.williams@...el.com,
dave@...olabs.net, david@...nel.org, fvdl@...gle.com,
joao.m.martins@...cle.com, jonathan.cameron@...wei.com,
linux-cxl@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, mhocko@...e.com, mjguzik@...il.com,
muchun.song@...ux.dev, osalvador@...e.de, raghavendra.kt@....com,
wangzhou1@...ilicon.com, zhanjie9@...ilicon.com
Subject: Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism
On Tue, Jan 20, 2026 at 06:39:48PM +0800, Li Zhe wrote:
> On Tue, 20 Jan 2026 09:47:44 +0000, david.laight.linux@...il.com wrote:
>
> > On Tue, 20 Jan 2026 14:27:06 +0800
> > "Li Zhe" <lizhe.67@...edance.com> wrote:
> >
> > > In light of the preceding discussion, we appear to have reached the
> > > following understanding:
> > >
> > > (1) At present we prefer to mitigate slow application startup (e.g.,
> > > VM creation) by zeroing huge pages at the moment they are freed
> > > (init_on_free). The principal benefit is that user space gains the
> > > performance improvement without deploying any additional user space
> > > daemon.
> >
> > Am I missing something?
> > If userspace does:
> > $ program_a; program_b
> > and pages used by program_a are zeroed when it exits you get the delay
> > for zeroing all the pages it used before program_b starts.
> > OTOH if the zeroing is deferred program_b only needs to zero the pages
> > it needs to start (and there may be some lurking).
>
> Under the init_on-free approach, improving the speed of zeroing may
> indeed prove necessary.
>
> However, I believe we should first reach consensus on adopting
> “init_on_free” as the solution to slow application startup before
> turning to performance tuning.
>
His point was that init_on_free may not reduce any delays for
applications run serially, and can actually introduce additional ones.
Example
-------
program_a: alloc_hugepages(10);
           exit();
program_b: alloc_hugepages(5);
           exit();

/* Run programs in serial */
sh: program_a && program_b

in zero_on_alloc():
    program_a eats zero(10) cost on startup
    program_b eats zero(5) cost on startup
    Overall zero(15) cost to start program_b

in zero_on_free():
    program_a eats zero(10) cost on startup
    program_a eats zero(10) cost on exit
    program_b eats zero(0) cost on startup
    Overall zero(20) cost to start program_b

zero_on_free is worse by zero(5)
-------
This is a trivial example, but it's unclear zero_on_free actually
provides a benefit. You have to know ahead of time what the runtime
behavior, pre-zeroed count, and allocation pattern (0->10->5->...) would
be to determine whether there's an actual reduction in startup time.
But trivially, starting from the base case of no pages being zeroed,
you're injecting an additional zero(X) cost whenever program_a consumes
more hugepages than program_b.
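
To make the accounting above easy to replay for other allocation
patterns, here is a rough sketch of the same cost model. It is only a toy:
the function names and the `prezeroed` parameter are made up for
illustration, cost is counted in "pages zeroed before the last program is
up", and it assumes zeroing is synchronous, freed pages return to a shared
pool, and pre-zeroed pages are handed out first. None of this is kernel
code.

```python
def zero_on_alloc_cost(allocs):
    """Pages zeroed before the last program is running, zeroing at alloc time."""
    # Every program zeroes exactly what it allocates, at startup.
    return sum(allocs)


def zero_on_free_cost(allocs, prezeroed=0):
    """Pages zeroed before the last program is running, zeroing at free time.

    allocs    -- hugepage counts of programs run strictly one after another
    prezeroed -- pages already zeroed in the pool at the start (assumption)
    """
    cost = 0
    for i, want in enumerate(allocs):
        # Startup: take pre-zeroed pages first, zero the remainder.
        reused = min(want, prezeroed)
        prezeroed -= reused
        cost += want - reused
        # Exit: all but the last program must finish zeroing its pages
        # before the next program can start.
        if i < len(allocs) - 1:
            cost += want
            prezeroed += want
    return cost


for pattern in ([10, 5], [5, 10], [10, 10]):
    print(pattern, zero_on_alloc_cost(pattern), zero_on_free_cost(pattern))
```

With the 10-then-5 pattern this gives 15 vs 20, matching the example. With
5-then-10 or 10-then-10 the two come out equal, so in this toy model
zero_on_free never wins from a cold start; it only breaks even.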
Long way of saying the shift from alloc to free seems heuristic-y and
you need stronger analysis / better data to show this change is actually
beneficial in the general case.
~Gregory