linux-kernel - Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism


Message-Id: <20260115093641.44404-1-lizhe.67@bytedance.com>
Date: Thu, 15 Jan 2026 17:36:41 +0800
From: "Li Zhe" <lizhe.67@...edance.com>
To: <david@...nel.org>
Cc: <akpm@...ux-foundation.org>, <ankur.a.arora@...cle.com>, 
	<fvdl@...gle.com>, <joao.m.martins@...cle.com>, 
	<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>, 
	<lizhe.67@...edance.com>, <mhocko@...e.com>, <mjguzik@...il.com>, 
	<muchun.song@...ux.dev>, <osalvador@...e.de>, <raghavendra.kt@....com>
Subject: Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism

On Wed, 14 Jan 2026 18:21:08 +0100, david@...nel.org wrote:
  
> >> But again, I think the main motivation here is "increase application
> >> startup", not optimize that the zeroing happens at specific points in
> >> time during system operation (e.g., when idle etc).
> >>
> > 
> > Framing this as "increase application startup" and merely shifting the
> > overhead to shutdown seems like gaming the problem statement to me.
> > The real problem is total real time spent on it while pages are
> > needed.
> > 
> > Support for background zeroing can give you more usable pages provided
> > it has the cpu + ram to do it. If it does not, you are in the worst
> > case in the same spot as with zeroing on free.
> > 
> > Let's take a look at some examples.
> > 
> > Say there are no free huge pages and you kill a vm + start a new one.
> > On top of that all CPUs are pegged as is. In this case total time is
> > the same for "zero on free" as it is for background zeroing.
> 
> Right. If the pages get freed to immediately get allocated again, it 
> doesn't really matter who does the freeing. There might be some details, 
> of course.
> 
> > 
> > Say the system is freshly booted and you start up a vm. There are no
> > pre-zeroed pages available so it suffers at start time no matter what.
> > However, with some support for background zeroing, the machinery could
> > respond to demand and do it in parallel in some capacity, shortening
> > the real time needed.
> 
> Just like for init_on_free, I would start with zeroing these pages 
> during boot.
> 
> init_on_free assures that all pages in the buddy were zeroed out. Which 
> greatly simplifies the implementation, because there is no need to track 
> what was initialized and what was not.
> 
> It's a good question if initialization during that should be done in 
> parallel, possibly asynchronously during boot. Reminds me a bit of 
> deferred page initialization during boot. But that is rather an 
> extension that could be added somewhat transparently on top later.
> 
> If ever required we could dynamically enable this setting for a running 
> system. Whoever would enable it (flips the magic toggle) would zero out 
> all hugetlb pages that are already in the hugetlb allocator as free, but 
> not initialized yet.
> 
> But again, these are extensions on top of the basic design of having all 
> free hugetlb folios be zeroed.
> 
> > 
> > Say a little bit of real time passes and you start another vm. With
> > merely zeroing on free there are still no pre-zeroed pages available
> > so it again suffers the overhead. With background zeroing some of
> > that memory would already be sorted out, speeding up said startup.
> 
> The moment they end up in the hugetlb allocator as free folios they 
> would have to get initialized.
> 
> Now, I am sure there are downsides to this approach (how to speed up 
> process exit by parallelizing zeroing, if ever required)? But it sounds 
> like being a bit ... simpler without user space changes required. In 
> theory :)

I strongly agree that the init_on_free strategy effectively eliminates
the latency incurred during VM creation. However, it appears to
introduce two new issues.

First, the process that later allocates a page may not be the one that
freed it, raising the question of which process should bear the cost
of zeroing.

Second, put_page() may be called from atomic context, so invoking
clear_page() on a huge page there is inappropriate; off-loading the
zeroing to a workqueue merely reopens the same accounting problem.
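
To make the accounting question concrete, here is a minimal userspace
sketch, not kernel code. Every toy_* name, the 4 KiB page size, and the
4-page pool are assumptions made up for illustration. The free path only
marks a page dirty (standing in for a context that must stay cheap), a
deferred worker zeroes pages later, and the allocator zeroes a page itself
only if no pre-zeroed page is left. The per-payer counters show who ends
up bearing the zeroing cost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096 /* toy size; real hugetlb pages are far larger */
#define POOL_PAGES 4    /* hypothetical pool size */

/* Who paid for a zeroing operation (illustrative accounting only). */
enum payer { PAYER_ALLOCATOR, PAYER_WORKER, PAYER_COUNT };

struct toy_page {
    unsigned char data[PAGE_BYTES];
    bool free;   /* page sits in the free pool */
    bool zeroed; /* contents are known to be zero */
};

struct toy_pool {
    struct toy_page pages[POOL_PAGES];
    size_t bytes_zeroed_by[PAYER_COUNT];
};

/* Start with every page free but dirty, as after a fresh boot. */
static void toy_pool_init(struct toy_pool *pool)
{
    memset(pool->bytes_zeroed_by, 0, sizeof(pool->bytes_zeroed_by));
    for (size_t i = 0; i < POOL_PAGES; i++) {
        memset(pool->pages[i].data, 0xAA, PAGE_BYTES);
        pool->pages[i].free = true;
        pool->pages[i].zeroed = false;
    }
}

/* Zero one free page and charge the cost to 'who'. */
static void toy_zero(struct toy_pool *pool, struct toy_page *pg, enum payer who)
{
    assert(pg->free);
    memset(pg->data, 0, PAGE_BYTES);
    pg->zeroed = true;
    pool->bytes_zeroed_by[who] += PAGE_BYTES;
}

/* Free path: does no zeroing, only records that the page is dirty. */
static void toy_free(struct toy_page *pg)
{
    pg->free = true;
    pg->zeroed = false;
}

/* Deferred worker: zero at most 'budget' dirty free pages; return count. */
static size_t toy_worker_run(struct toy_pool *pool, size_t budget)
{
    size_t done = 0;

    for (size_t i = 0; i < POOL_PAGES && done < budget; i++) {
        struct toy_page *pg = &pool->pages[i];

        if (pg->free && !pg->zeroed) {
            toy_zero(pool, pg, PAYER_WORKER);
            done++;
        }
    }
    return done;
}

/* Allocation: prefer a pre-zeroed page, else zero one at the caller's cost. */
static struct toy_page *toy_alloc(struct toy_pool *pool)
{
    struct toy_page *fallback = NULL;

    for (size_t i = 0; i < POOL_PAGES; i++) {
        struct toy_page *pg = &pool->pages[i];

        if (!pg->free)
            continue;
        if (pg->zeroed) {
            pg->free = false;
            return pg;
        }
        if (!fallback)
            fallback = pg;
    }
    if (fallback) {
        toy_zero(pool, fallback, PAYER_ALLOCATOR);
        fallback->free = false;
    }
    return fallback; /* NULL when the pool is exhausted */
}
```

Calling toy_worker_run() between a toy_free() and the next toy_alloc()
moves the cost from the allocator's counter to the worker's counter. The
model deliberately leaves open which task or cgroup that worker time
should be charged to, which is the question above.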

Do you have any recommendations regarding these issues?

Thanks,
Zhe