Message-Id: <20260113064155.29900-1-lizhe.67@bytedance.com>
Date: Tue, 13 Jan 2026 14:41:54 +0800
From: "Li Zhe" <lizhe.67@...edance.com>
To: <ankur.a.arora@...cle.com>
Cc: <akpm@...ux-foundation.org>, <david@...nel.org>, <fvdl@...gle.com>,
<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
<lizhe.67@...edance.com>, <muchun.song@...ux.dev>, <osalvador@...e.de>
Subject: Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism
On Mon, 12 Jan 2026 14:01:29 -0800, ankur.a.arora@...cle.com wrote:
> > In user space, we can use system calls such as epoll and write to zero
> > huge folios as they become available, and sleep when none are ready. The
> > following pseudocode illustrates this approach. The pseudocode spawns
> > eight threads (each running thread_fun()) that wait for huge pages on
> > node 0 to become eligible for zeroing; whenever such pages are available,
> > the threads clear them in parallel.
> >
> > static void thread_fun(void)
> > {
> >     epoll_create();
> >     epoll_ctl();
> >     while (1) {
> >         val = read("/sys/devices/system/node/node0/hugepages/hugepages-1048576kB/zeroable_hugepages");
> >         if (val > 0)
> >             system("echo max > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/zeroable_hugepages");
> >         epoll_wait();
> >     }
> > }
>
> Given that zeroable_hugepages is per node, anybody who writes to
> it would need to know how much the aggregate demand would be.
>
> Seems to me that the only value that might make sense would be "max".
> And at that point this approach seems a little bit like init_on_free.

Yes, writing "max" suffices for the vast majority of workloads.
However, once multiple mutually independent application processes each
need huge pages, the ability to specify an exact value becomes
essential, because the CPU time each process spends on zeroing can
then be charged to its own cgroup. If we currently consider "max"
sufficient, we can implement support for that parameter alone and
extend it later when necessary.

Although "max" resembles init_on_free at first glance, it leaves the
decision of when and on which CPU to zero entirely to user space,
which addresses the concern raised earlier.
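
To make the user-space side concrete, here is a minimal sketch in C of a
single zeroing pass. It assumes the zeroable_hugepages attribute proposed
in this series reads back as a decimal count and accepts either "max" or
an exact number on write. The helper names (read_count, request_zeroing,
zero_once) are hypothetical and not part of the patch set.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Read a non-negative decimal count from a sysfs-style attribute.
 * Returns the count, or -1 if the file cannot be read or parsed.
 */
static long read_count(const char *path)
{
    char buf[32];
    char *end;
    long val;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    n = read(fd, buf, sizeof(buf) - 1);
    close(fd);
    if (n <= 0)
        return -1;
    buf[n] = '\0';
    val = strtol(buf, &end, 10);
    if (end == buf || val < 0)
        return -1;
    return val;
}

/*
 * Write a zeroing request to the attribute. "amount" is either "max"
 * or an exact decimal count (assumed semantics of the proposed file).
 * The write runs in the caller's context, so the caller chooses which
 * process, and therefore which cgroup, performs it.
 * Returns 0 on success, -1 on failure.
 */
static int request_zeroing(const char *path, const char *amount)
{
    size_t len;
    ssize_t n;
    int fd;

    assert(path != NULL && amount != NULL);
    len = strlen(amount);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    n = write(fd, amount, len);
    close(fd);
    return n == (ssize_t)len ? 0 : -1;
}

/*
 * One pass: if any pages are reported as zeroable, request zeroing.
 * Returns the count that was read, or -1 on error.
 */
static long zero_once(const char *path, const char *amount)
{
    long avail = read_count(path);

    if (avail > 0 && request_zeroing(path, amount) != 0)
        return -1;
    return avail;
}
```

A caller would invoke zero_once() with the node's zeroable_hugepages path
and either "max" or an exact count such as "4", from whichever process
should bear the cost. CPU placement stays with the caller too, for
example by starting the program under taskset or by calling
sched_setaffinity() before the pass.
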
Thanks,
Zhe