Message-Id: <20260120062706.91078-1-lizhe.67@bytedance.com>
Date: Tue, 20 Jan 2026 14:27:06 +0800
From: "Li Zhe" <lizhe.67@...edance.com>
To: <david@...nel.org>
Cc: <akpm@...ux-foundation.org>, <dan.j.williams@...el.com>,
<dave@...olabs.net>, <ankur.a.arora@...cle.com>, <fvdl@...gle.com>,
<gourry@...rry.net>, <joao.m.martins@...cle.com>,
<jonathan.cameron@...wei.com>, <linux-cxl@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<lizhe.67@...edance.com>, <mhocko@...e.com>, <mjguzik@...il.com>,
<muchun.song@...ux.dev>, <osalvador@...e.de>, <raghavendra.kt@....com>,
<wangzhou1@...ilicon.com>, <zhanjie9@...ilicon.com>
Subject: Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism
In light of the preceding discussion, we appear to have reached the
following understanding:
(1) At present we prefer to mitigate slow application startup (e.g.,
VM creation) by zeroing huge pages at the moment they are freed
(init_on_free). The principal benefit is that user space gains the
performance improvement without deploying any additional user space
daemon.
(2) Moving the zeroing from allocation to release means that, in the
occasional case where the thread that frees the page differs from the
one that originally allocated it, the clearing cost is not charged to
the allocating thread. Because this situation is rare and the existing
init_on_free mechanism in the kernel already exhibits the same
behavior, we deem the consequence acceptable.
(3) The function __unmap_hugepage_range() employs the MMU-gather
mechanism, which refrains from dropping the page reference while
holding the PTL (spinlock). This allows huge-page zeroing to be
performed in a non-atomic context.
(4) Given that, in the vast majority of cases, the same thread that
allocates a huge page also frees it, and that the exceptions highlighted
by David are genuinely rare[1], we can achieve faster application
startup by implementing an init_on_free-style mechanism.
(5) Going forward we can further optimize the zeroing process by
leveraging a DMA engine.
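
To make the cost shift in points (1), (2) and (4) concrete, here is a
minimal user-space sketch of the zero-on-free idea. It is only a toy
model, not kernel code and not the proposed patch: the names toy_pool,
toy_alloc and toy_free are hypothetical, and TOY_PAGE_SIZE is a small
stand-in for a real huge-page size. The invariant it illustrates is
that every free page is already zero, so the allocation path never has
to clear anything.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy sizes; a real huge page is far larger. */
#define TOY_PAGE_SIZE  4096
#define TOY_POOL_PAGES 4

struct toy_pool {
	unsigned char pages[TOY_POOL_PAGES][TOY_PAGE_SIZE];
	bool in_use[TOY_POOL_PAGES];
	size_t clears_on_free;	/* number of pages zeroed by the free path */
};

/* Start with every page free and zeroed, establishing the invariant. */
static void toy_pool_init(struct toy_pool *pool)
{
	memset(pool, 0, sizeof(*pool));
}

/*
 * Allocation path: hand out the first free page. No clearing is done
 * here, because every free page was zeroed when it was released.
 */
static unsigned char *toy_alloc(struct toy_pool *pool)
{
	for (size_t i = 0; i < TOY_POOL_PAGES; i++) {
		if (!pool->in_use[i]) {
			pool->in_use[i] = true;
			return pool->pages[i];
		}
	}
	return NULL;	/* pool exhausted */
}

/*
 * Free path: whoever releases the page pays for the zeroing, which is
 * the accounting shift described in point (2).
 */
static void toy_free(struct toy_pool *pool, unsigned char *page)
{
	for (size_t i = 0; i < TOY_POOL_PAGES; i++) {
		if (pool->pages[i] == page && pool->in_use[i]) {
			memset(page, 0, TOY_PAGE_SIZE);
			pool->in_use[i] = false;
			pool->clears_on_free++;
			return;
		}
	}
}
```

In this model a caller that dirties a page, frees it and allocates
again gets back an all-zero page without any memset on the allocation
side; the only clear was counted in clears_on_free.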
If the foregoing is accurate, I propose we add a new hugetlbfs mount
option to achieve the init-on-free behavior.
Thanks,
Zhe
[1]: https://lore.kernel.org/all/83798495-915b-4a5d-9638-f5b3de913b71@kernel.org/#t