Message-ID: <7ace0c7b-6a3d-48c4-94ee-de5c91af02c7@kernel.org>
Date: Wed, 11 Feb 2026 21:22:05 +0100
From: "David Hildenbrand (Arm)" <david@...nel.org>
To: "Thomson, Jack" <jackabt.amazon@...il.com>, mst@...hat.com,
 jasowang@...hat.com
Cc: xuanzhuo@...ux.alibaba.com, eperezma@...hat.com,
 virtualization@...ts.linux.dev, linux-kernel@...r.kernel.org,
 kalyazin@...zon.co.uk, xmarcalx@...zon.co.uk, jackabt@...zon.com
Subject: Re: [RFC PATCH] virtio_balloon: Support wait on ACK for hinting

On 1/19/26 17:30, Thomson, Jack wrote:
> 
> 
> On 19/01/2026 3:50 pm, David Hildenbrand (Red Hat) wrote:
>> On 1/19/26 16:42, Jack Thomson wrote:
>>> From: Jack Thomson <jackabt@...zon.com>
>>>
>>> This RFC patch adds a new virtio feature for the virtio-balloon driver
>>> during free page hinting, which will wait on device ack before
>>> committing the range to the free_page_list. The reason for the change is
>>> it allows the device to modify this range without it being reclaimed
>>> from the free_page_list before the ack is sent. As expected, testing
>>> shows this adds overhead to the hinting run duration, increasing it by
>>> ~30% with our Firecracker setup. Currently free page hinting is used
>>> mainly for live migration, but this would open it up for a new use-case.
>>>
>>> We would like to leverage this with MADV_DONTNEED to reduce RSS of a
>>> guest. We'd like to use hinting because of the flexibility of control it
>>> brings compared to reporting, allowing memory to be reclaimed in
>>> deterministic periods. 
>>
>> Can you elaborate in more detail why you don't simply use reporting, 
>> like QEMU?
> 
> Ideally we'd like to use hinting as the API allows us to control when
> this reclamation takes place so as not to impact active VMs. For example
> if we know a VM is idle we can reclaim memory but also cancel the
> reclamation quickly if the VM receives new work (something we can't do
> quickly with the traditional balloon.)
> 
>> Could you instead see optimizations being done to reporting that could 
>> make it fly for your use case?
> 
> One thing that I considered was having reporting running but skip
> reported ranges during active times. But this may lead to missing
> reclamation opportunities.

We could implement a pause+continue option for free-page-reporting, so
the device could tell the VM to pause reporting and later to resume it.

> 
>>
>> Hinting is a rather special case thing only used for reducing VM 
>> migration time in QEMU AFAIR.
>>
> 
> Yeah, its API allowing direct control was what interested us. With this
> extension it made a great pairing just needed the synchronisation to
> make it safe.

Sorry for getting back to you only now.

So, right now the thing is that hinted pages can get reused by the VM
at any time. The hypervisor must detect if that happened and not discard
the pages in that case.

While that works for live migration with bitmap dirty tracking (and is a
bit confusing ...), it doesn't work when you want to MADV_DONTNEED that
memory, because the hypervisor could issue MADV_DONTNEED just after the
VM reused the memory.

So you are proposing to let the VM wait for the ack before possibly 
reusing the pages.



One thing to note is that free page hinting in Linux allocates memory 
through

	alloc_pages(VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG,
		    VIRTIO_BALLOON_HINT_BLOCK_ORDER);

Meaning

a) Limited to MAX_ORDER chunks (e.g., 4MB on x86). This is even bigger
    than the free-page-reporting granularity (pageblock order, 2MB on
    x86)
b) Cannot free memory on ZONE_MOVABLE or CMA in the VM (as these are
    unmovable allocations)

So it's a bit suboptimal.


Also, in contrast to free-page-reporting, these pages are not going to 
get reused unless we run into the shrinker, which is a bit suboptimal as 
well. Free-page-reporting is a lot more optimized for that, as it just 
returns reported pages back to the buddy immediately.


So if possible, I would suggest extending free-page-reporting instead.

-- 
Cheers,

David
