Message-ID: 
 <176412196000.447063.4256335030026363827.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
Date: Wed, 26 Nov 2025 02:08:46 +0000
From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
To: kys@...rosoft.com, haiyangz@...rosoft.com, wei.liu@...nel.org,
 decui@...rosoft.com
Cc: linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: [PATCH v7 0/7] Introduce movable pages for Hyper-V guests

From the start, the root-partition driver allocates, pins, and maps all
guest memory into the hypervisor at guest creation. This is simple: Linux
cannot move the pages, so the guest’s view in Linux and in Microsoft
Hypervisor never diverges.

However, this approach has major drawbacks:
 - NUMA: affinity can’t be changed at runtime, so guest memory can’t be
   migrated closer to the CPUs running it, which hurts performance.
 - Memory management: unused guest memory can’t be swapped out, compacted,
   or merged.
 - Provisioning time: upfront allocation and pinning slow down guest
   creation and destruction.
 - Overcommit: memory can’t be overcommitted on hosts whose guest memory
   is pinned.

This series adds movable memory pages for Hyper-V child partitions. Guest
pages are no longer allocated upfront; they’re allocated and mapped into
the hypervisor on demand (i.e., when the guest touches a GFN that isn’t yet
backed by a host PFN).
When a page is moved, Linux drops its hold on it and the page is unmapped
from the hypervisor. As a result, Hyper-V guests behave like regular Linux
processes, so standard Linux memory management features apply to them.
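
To illustrate the on-demand model described above, here is a minimal
user-space sketch, not the driver code. All names (toy_region,
toy_handle_fault, toy_invalidate) and the fake PFN allocator are
hypothetical; the real driver works on struct page and issues map/unmap
hypercalls instead of updating an array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_GFNS	8
#define TOY_PFN_NONE	UINT64_MAX

/* Hypothetical toy region: one host PFN slot per guest frame number. */
struct toy_region {
	uint64_t pfn[TOY_NR_GFNS];	/* TOY_PFN_NONE means "not backed" */
	uint64_t next_free_pfn;		/* stand-in for the host page allocator */
	unsigned int nr_mapped;
};

static void toy_region_init(struct toy_region *r)
{
	for (size_t i = 0; i < TOY_NR_GFNS; i++)
		r->pfn[i] = TOY_PFN_NONE;
	r->next_free_pfn = 0x1000;
	r->nr_mapped = 0;
}

/*
 * The guest touched @gfn: back it on first access and reuse the existing
 * mapping afterwards. Returns false if @gfn is outside the region.
 */
static bool toy_handle_fault(struct toy_region *r, uint64_t gfn)
{
	if (gfn >= TOY_NR_GFNS)
		return false;
	if (r->pfn[gfn] == TOY_PFN_NONE) {
		r->pfn[gfn] = r->next_free_pfn++;
		r->nr_mapped++;
	}
	return true;
}

/* The host moves or reclaims the page behind @gfn: drop the mapping. */
static void toy_invalidate(struct toy_region *r, uint64_t gfn)
{
	if (gfn < TOY_NR_GFNS && r->pfn[gfn] != TOY_PFN_NONE) {
		r->pfn[gfn] = TOY_PFN_NONE;
		r->nr_mapped--;
	}
}
```

In this sketch, a fault after toy_invalidate() backs the same GFN with a
different PFN, which is the property that lets the host migrate or reclaim
guest pages.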

Exceptions (still pinned):
 1. Encrypted guests (explicit).
 2. Guests with passthrough devices (implicitly pinned by the VFIO framework).

v7:
 - Only the first two patches remain unchanged from v6.
 - Introduced reference counting for memory regions to resolve a race
   condition between region servicing (faulting and invalidation) and region
   destruction.
 - Corrected the assumption that regions starting with a huge page contain
   only huge pages; the code now properly handles regions with mixed page
   size segments.
 - Consolidated region management logic into a dedicated file.
 - Updated the driver to select MMU_NOTIFIER, removing support for
   configurations without this option.
 - Cleaned up and refactored the region management code.
 - Fixed a build issue reported by the kernel test robot for configurations
   where HPAGE_PMD_NR is defined to trigger a build bug.
 - Replaced VALUE_PMD_ALIGNED with the generic IS_ALIGNED macro.
 - Simplified region flags by introducing a region type for clarity.
 - Improved commit messages.
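
The reference counting item above can be pictured with the following
user-space sketch. It is an assumption about the general pattern only (take
a reference before servicing a region, let the last put do the teardown),
written with C11 atomics; the names are hypothetical and the actual patch
may use different kernel primitives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a memory region with a reference count. */
struct toy_region_ref {
	atomic_int refs;	/* starts at 1: the reference held by the owner */
	bool destroyed;		/* set by the final put */
};

static void toy_ref_init(struct toy_region_ref *r)
{
	atomic_init(&r->refs, 1);
	r->destroyed = false;
}

/*
 * Take a reference unless the count has already dropped to zero, so a
 * fault or invalidation never revives a region that is being torn down.
 */
static bool toy_ref_get(struct toy_region_ref *r)
{
	int old = atomic_load(&r->refs);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever drops the last one performs the teardown. */
static void toy_ref_put(struct toy_region_ref *r)
{
	if (atomic_fetch_sub(&r->refs, 1) == 1)
		r->destroyed = true;
}
```

With this pattern, destruction only drops the owner's reference; a servicing
path that still holds a reference delays the teardown until its own put.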

v6:
 - Fix a bug in large page remapping where setting the large map flag based
   on the PFN offset's large page alignment within the region implicitly
   assumed that the region's start offset was also large page aligned,
   which could cause map hypercall failures.
 - Fix a bug in large page unmapping where setting the large unmap flag for
   an unaligned guest PFN range could result in unmap hypercall failures.
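
The alignment pitfall in the first v6 item can be sketched as follows. This
is an illustration under assumptions, not the driver code: the helper names
are hypothetical, a large page is assumed to span 512 base pages (2M with
4K pages), and only the guest frame number is checked here.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGES_PER_LARGE	512ULL	/* assumed: 2M / 4K */
#define TOY_IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

/*
 * Buggy variant: only the offset inside the region is checked, which is
 * correct only if the region itself starts on a large page boundary.
 */
static bool toy_large_ok_offset_only(uint64_t start_gfn, uint64_t offset,
				     uint64_t count)
{
	(void)start_gfn;
	return TOY_IS_ALIGNED(offset, TOY_PAGES_PER_LARGE) &&
	       count >= TOY_PAGES_PER_LARGE;
}

/* Checked variant: the absolute guest frame number must be aligned. */
static bool toy_large_ok(uint64_t start_gfn, uint64_t offset, uint64_t count)
{
	return TOY_IS_ALIGNED(start_gfn + offset, TOY_PAGES_PER_LARGE) &&
	       count >= TOY_PAGES_PER_LARGE;
}
```

For a region starting at GFN 0x100, offset 0 passes the offset-only check
even though GFN 0x100 is not large page aligned; the checked variant rejects
it and accepts offset 0x100 (GFN 0x200) instead.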

v5:
 - Fix a bug in MMU notifier handling where an uninitialized 'ret' variable
   could cause the warning about failed page invalidation to be skipped.
 - Improve comment grammar regarding skipping the unmapping of non-mapped pages.

v4:
 - Fix a bug where batch unmapping could skip mapped pages when selecting a
   new batch due to a wrong offset calculation.
 - Fix an error message in case of failed memory region pinning.

v3:
 - Region is invalidated even if the mm has no users.
 - Page remapping logic is updated to support 2M-unaligned remappings for
   regions that are PMD-aligned, which can occur during both faults and
   invalidations.

v2:
 - Split unmap batching into a separate patch.
 - Fixed commit messages from v1 review.
 - Renamed a few functions for clarity.

---

Stanislav Kinsburskii (7):
      Drivers: hv: Refactor and rename memory region handling functions
      Drivers: hv: Centralize guest memory region destruction
      Drivers: hv: Move region management to mshv_regions.c
      Drivers: hv: Fix huge page handling in memory region traversal
      Drivers: hv: Improve region overlap detection in partition create
      Drivers: hv: Add refcount and locking to mem regions
      Drivers: hv: Add support for movable memory regions


 drivers/hv/Kconfig          |    2 
 drivers/hv/Makefile         |    2 
 drivers/hv/mshv_regions.c   |  548 +++++++++++++++++++++++++++++++++++++++++++
 drivers/hv/mshv_root.h      |   32 ++-
 drivers/hv/mshv_root_main.c |  382 +++++++++++++-----------------
 5 files changed, 745 insertions(+), 221 deletions(-)
 create mode 100644 drivers/hv/mshv_regions.c

