Message-ID: <aN1ZNNe66f0SA-Cs@kernel.org>
Date: Wed, 1 Oct 2025 18:39:16 +0200
From: Mike Rapoport <rppt@...nel.org>
To: Wei Liu <wei.liu@...nel.org>
Cc: Mukesh R <mrathor@...ux.microsoft.com>,
Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>,
kys@...rosoft.com, haiyangz@...rosoft.com, decui@...rosoft.com,
linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] Introduce movable pages for Hyper-V guests
On Wed, Oct 01, 2025 at 04:18:30AM +0000, Wei Liu wrote:
> +Mike Rapoport, our resident memory management expert.
>
> On Fri, Sep 26, 2025 at 07:02:02PM -0700, Mukesh R wrote:
> > On 9/24/25 14:30, Stanislav Kinsburskii wrote:
> > > From the start, the root-partition driver allocates, pins, and maps all
> > > guest memory into the hypervisor at guest creation. This is simple: Linux
> > > cannot move the pages, so the guest's view in Linux and in Microsoft
> > > Hypervisor never diverges.
> > >
> > > However, this approach has major drawbacks:
> > > - NUMA: affinity can't be changed at runtime, so you can't migrate
> > >   guest memory closer to the CPUs running it, a performance hit.
> > > - Memory management: unused guest memory can't be swapped out,
> > >   compacted, or merged.
> > > - Provisioning time: upfront allocation/pinning slows guest
> > >   create/destroy.
> > > - Overcommit: no memory overcommit on hosts with pinned-guest memory.
> > >
> > > This series adds movable memory pages for Hyper-V child partitions. Guest
> > > pages are no longer allocated upfront; they're allocated and mapped into
> > > the hypervisor on demand (i.e., when the guest touches a GFN that isn't yet
> > > backed by a host PFN).
> > > When a page is moved, Linux no longer holds it and it is unmapped
> > > from the hypervisor. As a result, Hyper-V guests behave like regular
> > > Linux processes, enabling standard Linux memory features to apply to
> > > guests.
> > >
> > > Exceptions (still pinned):
> > > 1. Encrypted guests (explicitly pinned).
> > > 2. Guests with passthrough devices (implicitly pinned by the VFIO framework).
> >
> >
> > As I commented internally, I am not fully comfortable with the
> > approach here, especially around the use of HMM and the correctness of
> > locking for shared memory regions, but my knowledge is from 4.15 and
> > may be outdated, and I don't have time right now. So I won't object to
> > it if other hard-core mmu developers think there are no issues.
> >
>
> Mike, I seem to remember you had a discussion with Stanislav about this?
> Can you confirm that this is a reasonable approach?
>
> Better yet, if you have time to review the code, that would be great.
> Note that there is a v2 on linux-hyperv. But I would like to close
> Mukesh's question first.
I only had time to skim through the patches and yes, this is a reasonable
approach. I also confirmed privately with the HMM maintainer a while ago that
the use of HMM and MMU notifiers is correct.
I don't know enough about mshv to see if there are corner cases that these
patches don't cover, but conceptually they make the memory model follow KVM
best practices.
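
For reference, the pattern in question is the standard one documented in
Documentation/mm/hmm.rst: fault pages in with hmm_range_fault() under an
mmu_interval_notifier, and tear down the hypervisor mapping from the
notifier's invalidate callback. A minimal sketch of that shape, where
hv_map_gpa() and hv_unmap_gpa() are hypothetical stand-ins for the
driver's hypercalls rather than real mshv APIs:

#include <linux/hmm.h>
#include <linux/mm.h>
#include <linux/mmu_notifier.h>
#include <linux/sched.h>

/* Linux is about to move or reclaim pages in this range: drop the
 * hypervisor mapping before the pages go away. */
static bool gpa_range_invalidate(struct mmu_interval_notifier *mni,
				 const struct mmu_notifier_range *range,
				 unsigned long cur_seq)
{
	mmu_interval_set_seq(mni, cur_seq);
	hv_unmap_gpa(range->start, range->end);		/* hypothetical */
	return true;
}

static const struct mmu_interval_notifier_ops gpa_notifier_ops = {
	.invalidate = gpa_range_invalidate,
};

/* Called when the guest touches a GFN that has no host PFN behind it. */
static int map_gpa_on_fault(struct mmu_interval_notifier *mni,
			    unsigned long addr)
{
	unsigned long pfns[1];
	struct hmm_range range = {
		.notifier	= mni,
		.start		= addr,
		.end		= addr + PAGE_SIZE,
		.hmm_pfns	= pfns,
		.default_flags	= HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
	};
	int ret;

again:
	range.notifier_seq = mmu_interval_read_begin(mni);
	mmap_read_lock(current->mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(current->mm);
	if (ret) {
		if (ret == -EBUSY)
			goto again;
		return ret;
	}

	/* A real driver holds its own lock across this check so it is
	 * serialized against gpa_range_invalidate(). */
	if (mmu_interval_read_retry(mni, range.notifier_seq))
		goto again;

	return hv_map_gpa(addr, pfns[0]);		/* hypothetical */
}

The notifier itself would be registered over the guest memory range with
mmu_interval_notifier_insert(), passing gpa_notifier_ops.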
> > However, we won't be using this for minkernel, so we would like a
> > driver boot option to disable it that we can just set in the minkernel
> > init path. This option could also be used to disable the feature if
> > problems are observed in the field. The minkernel design is still
> > being worked on, so I cannot provide many details on it yet.
The usual way we do things in the kernel is to add functionality when it
has users, so a boot option can be added later, once the minkernel design
is more mature and ready for upstream.
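
When that time comes it should be a one-liner; something like this
hypothetical module parameter (the name is made up for illustration,
and I'm assuming the driver is built as mshv_root, so it would be set
as mshv_root.movable_pages=0 on the kernel command line):

#include <linux/module.h>

static bool movable_pages = true;
module_param(movable_pages, bool, 0444);
MODULE_PARM_DESC(movable_pages,
		 "Back guest memory with movable, on-demand pages (default: on)");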
--
Sincerely yours,
Mike.