Message-ID: <d1f224fb-c0fb-47f9-bea8-3c33137be161@oracle.com>
Date: Tue, 3 Dec 2024 14:26:04 +0000
From: Joao Martins <joao.m.martins@...cle.com>
To: Michal Hocko <mhocko@...e.com>, Frank van der Linden <fvdl@...gle.com>
Cc: Mateusz Guzik <mjguzik@...il.com>, linux-mm@...ck.org,
akpm@...ux-foundation.org, Muchun Song <muchun.song@...ux.dev>,
Miaohe Lin <linmiaohe@...wei.com>, Oscar Salvador <osalvador@...e.de>,
David Hildenbrand <david@...hat.com>, Peter Xu <peterx@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/hugetlb: optionally pre-zero hugetlb pages
On 03/12/2024 12:06, Michal Hocko wrote:
> On Mon 02-12-24 14:50:49, Frank van der Linden wrote:
>> On Mon, Dec 2, 2024 at 1:58 PM Mateusz Guzik <mjguzik@...il.com> wrote:
>>> Any games with "background zeroing" are notoriously crappy and I would
>>> argue one should exhaust other avenues before going there -- at the end
>>> of the day the cost of zeroing will have to get paid.
>>
>> I understand that the concept of background prezeroing has been, and
>> will be, met with some resistance. But, do you have any specific
>> concerns with the patch I posted? It's pretty well isolated from the
>> rest of the code, and optional.
>
> The biggest concern I have is that the overhead is paid by everybody on
> the system - it is considered to be a system overhead even though only
> part of the workload benefits from hugetlb pages. In other words, the
> workload using those pages is not fully accounted for that use.
>
> If the startup latency is a real problem, is there a way to work around
> that in userspace by preallocating hugetlb pages ahead of time, before
> those VMs are launched, and handing over the already pre-allocated pages?

It should be relatively simple to actually do this. Mike and I experimented
with it ourselves a couple of years back, but we never had the chance to send
it out. IIRC, if we:
- add the PageZeroed tracking bit when a page is zeroed
- clear it in the write (fixup/non-fixup) fault-path
[somewhat similar to this series I suspect]
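The two bookkeeping rules in the list above can be modelled in a few lines of plain C. This is a userspace toy under stated assumptions, not kernel code: struct hpage, hpage_clear(), hpage_fault() and the tiny HPAGE_BYTES are hypothetical stand-ins, and a real PageZeroed bit would live in the folio flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a huge page; a small HPAGE_BYTES keeps
 * the model cheap to run. */
#define HPAGE_BYTES 4096

struct hpage {
	unsigned char data[HPAGE_BYTES];
	bool zeroed;            /* models the proposed PageZeroed bit */
};

/* Zero the page and record that its contents are known to be zero. */
static void hpage_clear(struct hpage *p)
{
	memset(p->data, 0, sizeof(p->data));
	p->zeroed = true;
}

/*
 * Model of the fault path: clearing is skipped when the bit says the
 * page is already zero, and a write fault drops the bit because the
 * contents may stop being zero once the mapping is writable.
 * Returns true if this fault had to zero the page.
 */
static bool hpage_fault(struct hpage *p, bool write)
{
	bool cleared = false;

	if (p->zeroed) {
		/* The bit must never claim that a dirty page is clean. */
		assert(p->data[0] == 0 && p->data[HPAGE_BYTES - 1] == 0);
	} else {
		hpage_clear(p);
		cleared = true;
	}
	if (write)
		p->zeroed = false;
	return cleared;
}
```

A read-only touch leaves the bit set, so the next user of that page can skip the memset; any write fault forces the page back onto the "needs zeroing" side.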

Then what's left is to change the lookup of free hugetlb pages
(dequeue_hugetlb_folio_node_exact(), I think) to search for non-zeroed pages
first. Provided we don't track its 'cleared' state, there's no UAPI change in
behaviour. A daemon can simply allocate the pages (mmap plus a read-only touch,
etc.) and free them back 'as zeroed' to implement a userspace scrubber. In
principle, existing apps should see no difference. The amount of change is
consequently significantly smaller (or so it looked in a quick PoC years back).
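The userspace half of such a scrubber could be about this small. A minimal sketch, assuming 2 MiB huge pages and a pool already populated via /proc/sys/vm/nr_hugepages; scrub_pass() is a hypothetical name. It only uses the existing mmap(2) MAP_HUGETLB interface, so on today's kernels it merely cycles freshly zeroed pages through the pool; any saving depends on the proposed PageZeroed tracking and lookup change.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2 MiB huge pages (the x86-64 default). */
#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)

/*
 * One pass of a hypothetical userspace scrubber: map npages huge pages
 * read-only, read one byte from each so it gets faulted in (hugetlb
 * faults hand out zero-filled pages), then unmap so the pages go back
 * to the free pool. No write fault happens, so under the proposal the
 * PageZeroed bit would survive the round trip.
 * Returns the number of pages touched, or -1 if the mapping failed
 * (e.g. not enough free huge pages in the pool).
 */
static long scrub_pass(size_t npages)
{
	if (npages == 0)
		return 0;

	size_t len = npages * HUGE_PAGE_SIZE;
	void *m = mmap(NULL, len, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (m == MAP_FAILED)
		return -1;

	volatile const unsigned char *p = m;
	for (size_t i = 0; i < npages; i++) {
		unsigned char v = p[i * HUGE_PAGE_SIZE]; /* read fault only */
		assert(v == 0);
		(void)v;
	}

	munmap(m, len);
	return (long)npages;
}
```

Because the mapping is PROT_READ, the daemon can never dirty the pages it is scrubbing, which is what lets it hand them back 'as zeroed'.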
Something extra on top would perhaps be the ability to select a lookup
heuristic, i.e. to choose between non-zeroed-first, only-non-zeroed, or zeroed
pages via an ioctl() (or a better generic UAPI), so that a scrubber can easily
coexist with a hugepage user (e.g. a VMM) without too much of a dance.
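A toy model of that selectable lookup, under stated assumptions: the enum values, struct hpage, find_free() and dequeue() are hypothetical, the third heuristic is read here as "zeroed first", and a real version would sit in or around dequeue_hugetlb_folio_node_exact() behind whatever ioctl() or UAPI ends up being chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lookup policies; none of these exist in the kernel. */
enum hpage_lookup {
	LOOKUP_NONZERO_FIRST,   /* default: prefer pages still needing zeroing */
	LOOKUP_ONLY_NONZERO,    /* scrubber: never take an already-zeroed page */
	LOOKUP_ZEROED_FIRST,    /* latency-sensitive user, e.g. a VMM */
};

struct hpage {
	int id;
	bool zeroed;            /* models the proposed PageZeroed bit */
	bool free;
};

/* Return the first free page whose zeroed bit equals want, or NULL. */
static struct hpage *find_free(struct hpage *pool, size_t n, bool want)
{
	for (size_t i = 0; i < n; i++)
		if (pool[i].free && pool[i].zeroed == want)
			return &pool[i];
	return NULL;
}

/* Take one page off the free pool using the requested search order. */
static struct hpage *dequeue(struct hpage *pool, size_t n,
			     enum hpage_lookup policy)
{
	struct hpage *p;

	switch (policy) {
	case LOOKUP_NONZERO_FIRST:
		p = find_free(pool, n, false);
		if (!p)
			p = find_free(pool, n, true);
		break;
	case LOOKUP_ONLY_NONZERO:
		p = find_free(pool, n, false);
		break;
	case LOOKUP_ZEROED_FIRST:
		p = find_free(pool, n, true);
		if (!p)
			p = find_free(pool, n, false);
		break;
	default:
		assert(!"unknown lookup policy");
		return NULL;
	}
	if (p)
		p->free = false;
	return p;
}
```

With LOOKUP_ONLY_NONZERO the scrubber never spends cycles on pages that are already clean, while a VMM using LOOKUP_ZEROED_FIRST gets the scrubbed pages first and only falls back to non-zeroed ones, which its fault path then has to clear.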
Joao