linux-kernel - Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism


Message-ID: <87wm1ih5kb.fsf@oracle.com>
Date: Thu, 15 Jan 2026 14:30:44 -0800
From: Ankur Arora <ankur.a.arora@...cle.com>
To: "David Hildenbrand (Red Hat)" <david@...nel.org>
Cc: dan.j.williams@...el.com,
        Jonathan Cameron <jonathan.cameron@...wei.com>,
        Li Zhe <lizhe.67@...edance.com>, akpm@...ux-foundation.org,
        ankur.a.arora@...cle.com, fvdl@...gle.com, joao.m.martins@...cle.com,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhocko@...e.com,
        mjguzik@...il.com, muchun.song@...ux.dev, osalvador@...e.de,
        raghavendra.kt@....com, linux-cxl@...r.kernel.org,
        Davidlohr Bueso <dave@...olabs.net>,
        Gregory Price <gourry@...rry.net>, zhanjie9@...ilicon.com,
        wangzhou1@...ilicon.com
Subject: Re: [PATCH v2 0/8] Introduce a huge-page pre-zeroing mechanism


David Hildenbrand (Red Hat) <david@...nel.org> writes:

> On 1/15/26 21:16, dan.j.williams@...el.com wrote:
>> David Hildenbrand (Red Hat) wrote:
>> [..]
>>>> Give me a list of 1Gig pages and this stuff becomes much more efficient
>>>> than anything the CPU can do.
>>>
>>> Right, and ideally we'd implement any such mechanisms in a way that more
>>> parts of the kernel can benefit, and not just an unloved in-memory
>>> file-system that most people just want to get rid of as soon as we can :)
>> CPUs have tended to eat the value of simple DMA offload operations like
>> copy/zero over time.
>> In the case of this patch there is no async-offload benefit because
>> userspace is already charged with spawning more threads if it wants more
>> parallelism.
>
> In this subthread we're discussing handling that in the kernel like
> init_on_free. So when user space frees a hugetlb folio (or in the
> future, other similarly gigantic folios from another allocator), we'd
> be zeroing it.
>
> If it would be freeing multiple such folios, we could pack them and
> send them to a DMA engine to zero them for us (concurrently?
> asynchronously? I don't know :) )

I've been thinking about using non-temporal instructions (movnt/clzero)
for zeroing in that path.

Both the DMA engine and non-temporal zeroing would also help because
we wouldn't be pulling free buffers into the cache while zeroing them.
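
To make the non-temporal idea concrete, here is a rough userspace sketch.
It is not kernel code and not taken from the patch series:
zero_nontemporal() is a hypothetical helper, and it assumes a 16-byte
aligned buffer whose length is a multiple of 16. On x86 with SSE2 it
issues MOVNTDQ via the _mm_stream_si128() intrinsic; elsewhere it falls
back to a plain memset(), which does go through the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_setzero_si128, _mm_stream_si128, _mm_sfence */
#endif

/*
 * Hypothetical helper, for illustration only: zero `len` bytes at `buf`
 * using non-temporal stores so the zeroed lines are not allocated in
 * the cache. Assumes `buf` is 16-byte aligned and `len` is a multiple
 * of 16.
 */
static void zero_nontemporal(void *buf, size_t len)
{
    assert(((uintptr_t)buf % 16) == 0);
    assert(len % 16 == 0);

#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)buf;

    for (size_t i = 0; i < len / 16; i++)
        _mm_stream_si128(p + i, zero);  /* MOVNTDQ: store bypassing the cache */

    /* Non-temporal stores are weakly ordered; fence before handing off. */
    _mm_sfence();
#else
    /* Fallback for non-SSE2 builds: ordinary cached stores. */
    memset(buf, 0, len);
#endif
}
```

A caller would hand it a suitably aligned region, e.g. one obtained from
aligned_alloc(64, len). clzero is not shown here: it is an AMD-specific
instruction that zeroes a whole cache line, and it would need inline
assembly and a CPU feature check.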

-- 
ankur