Message-ID:
<176412196000.447063.4256335030026363827.stgit@skinsburskii-cloud-desktop.internal.cloudapp.net>
Date: Wed, 26 Nov 2025 02:08:46 +0000
From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
To: kys@...rosoft.com, haiyangz@...rosoft.com, wei.liu@...nel.org,
decui@...rosoft.com
Cc: linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: [PATCH v7 0/7] Introduce movable pages for Hyper-V guests
From the start, the root-partition driver allocates, pins, and maps all
guest memory into the hypervisor at guest creation. This is simple: Linux
cannot move the pages, so the guest’s view in Linux and in Microsoft
Hypervisor never diverges.
However, this approach has major drawbacks:
- NUMA: affinity can’t be changed at runtime, so guest memory can’t be migrated closer to the CPUs running it, which hurts performance.
- Memory management: unused guest memory can’t be swapped out, compacted, or merged.
- Provisioning time: upfront allocation/pinning slows guest create/destroy.
- Overcommit: no memory overcommit on hosts with pinned-guest memory.
This series adds movable memory pages for Hyper-V child partitions. Guest
pages are no longer allocated upfront; they’re allocated and mapped into
the hypervisor on demand (i.e., when the guest touches a GFN that isn’t yet
backed by a host PFN).
When a page is moved, Linux no longer holds it and it is unmapped from the hypervisor.
As a result, Hyper-V guests behave like regular Linux processes, enabling standard Linux memory features to apply to guests.
Exceptions (still pinned):
1. Encrypted guests (explicit).
2. Guests with passthrough devices (implicitly pinned by the VFIO framework).
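The on-demand scheme described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
driver's code: struct toy_region, toy_fault() and toy_invalidate() are
hypothetical names, and host PFNs are simulated with a counter instead of
real page allocation and map/unmap hypercalls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_REGION_PAGES 8u
#define TOY_NO_PFN UINT64_MAX	/* marks a GFN with no host page behind it */

/* Hypothetical model of a movable region: one host PFN slot per guest page. */
struct toy_region {
	uint64_t pfn[TOY_REGION_PAGES];
	uint64_t next_free_pfn;	/* stands in for the host page allocator */
};

static void toy_region_init(struct toy_region *r)
{
	assert(r);
	for (unsigned int i = 0; i < TOY_REGION_PAGES; i++)
		r->pfn[i] = TOY_NO_PFN;	/* nothing is allocated upfront */
	r->next_free_pfn = 1000;
}

/* Guest touched a GFN: back it only if it is not backed yet. */
static bool toy_fault(struct toy_region *r, unsigned int gfn_offset)
{
	if (gfn_offset >= TOY_REGION_PAGES)
		return false;
	if (r->pfn[gfn_offset] != TOY_NO_PFN)
		return false;	/* already mapped, nothing to do */
	r->pfn[gfn_offset] = r->next_free_pfn++;
	return true;
}

/* Host wants the pages back: drop the mappings and report how many went. */
static unsigned int toy_invalidate(struct toy_region *r, unsigned int start,
				   unsigned int count)
{
	unsigned int unmapped = 0;

	for (unsigned int i = start; i < TOY_REGION_PAGES && count--; i++) {
		if (r->pfn[i] == TOY_NO_PFN)
			continue;	/* skip pages that were never mapped */
		r->pfn[i] = TOY_NO_PFN;
		unmapped++;
	}
	return unmapped;
}
```

In this model a page that was invalidated is simply faulted in again, with
a different host PFN, the next time the guest touches it.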
v7:
- Only the first two patches remain unchanged from v6.
- Introduced reference counting for memory regions to resolve a race
condition between region servicing (faulting and invalidation) and region
destruction.
- Corrected the assumption that regions starting with a huge page contain
only huge pages; the code now properly handles regions with mixed page
size segments.
- Consolidated region management logic into a dedicated file.
- Updated the driver to select MMU_NOTIFIER, removing support for
configurations without this option.
- Cleaned up and refactored the region management code.
- Fixed a build issue reported by the kernel test robot for configurations
where HPAGE_PMD_NR is defined to trigger a build bug.
- Replaced VALUE_PMD_ALIGNED with the generic IS_ALIGNED macro.
- Simplified region flags by introducing a region type for clarity.
- Improved commit messages.
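The region reference counting mentioned in the v7 notes follows a common
pattern, sketched below in user-space C11. This is an assumption-laden
illustration rather than the code in the series: struct toy_region,
toy_region_tryget() and toy_region_put() are hypothetical names, and the
"destroyed" flag stands in for the real teardown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical region object; only the lifetime-related fields are shown. */
struct toy_region {
	atomic_uint refs;
	bool destroyed;		/* stands in for unmapping and freeing */
};

static void toy_region_init(struct toy_region *r)
{
	atomic_init(&r->refs, 1);	/* the creator holds the first reference */
	r->destroyed = false;
}

/*
 * Used by servicing paths (fault, invalidation): take a reference only if
 * the region is still alive, so a region being destroyed is never revived.
 */
static bool toy_region_tryget(struct toy_region *r)
{
	unsigned int old = atomic_load(&r->refs);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever drops the last one performs the teardown. */
static void toy_region_put(struct toy_region *r)
{
	unsigned int old = atomic_fetch_sub(&r->refs, 1);

	assert(old != 0);	/* a put without a matching get is a bug */
	if (old == 1)
		r->destroyed = true;
}
```

The sketch leaves out how a servicing path finds the region in the first
place; in real code the lookup itself has to be protected (for example by a
lock or RCU) so that the object is still valid when the try-get runs.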
v6:
- Fix a bug in large page remapping where setting the large map flag based
on the PFN offset's large page alignment within the region implicitly
assumed that the region's start offset was also large page aligned,
which could cause map hypercall failures.
- Fix a bug in large page unmapping where setting the large unmap flag for
an unaligned guest PFN range could result in unmap hypercall failures.
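The alignment problem behind both v6 fixes can be shown with a small
stand-alone check. This is a sketch under assumptions, not the driver's
logic: toy_can_use_large_flag() is a hypothetical helper, 4 KiB base pages
and 2 MiB large pages are assumed, and only the guest-side alignment is
modelled (host PFN alignment is ignored).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages and 2 MiB large pages, i.e. 512 pages. */
#define TOY_PAGES_PER_LARGE_PAGE 512u

#define TOY_IS_ALIGNED(x, a) (((x) & ((uint64_t)(a) - 1)) == 0)

/*
 * Decide whether a range may be mapped or unmapped with the large-page
 * flag. The check uses the absolute guest frame number, not the offset
 * within the region, so an unaligned region start is handled correctly.
 */
static bool toy_can_use_large_flag(uint64_t region_start_gfn,
				   uint64_t page_offset, uint64_t page_count)
{
	uint64_t gfn = region_start_gfn + page_offset;

	/* The mask trick in TOY_IS_ALIGNED needs a power-of-two alignment. */
	assert((TOY_PAGES_PER_LARGE_PAGE & (TOY_PAGES_PER_LARGE_PAGE - 1)) == 0);

	return page_count > 0 &&
	       TOY_IS_ALIGNED(gfn, TOY_PAGES_PER_LARGE_PAGE) &&
	       TOY_IS_ALIGNED(page_count, TOY_PAGES_PER_LARGE_PAGE);
}
```

For a region starting at GFN 256, an offset of 512 pages is aligned within
the region but lands on GFN 768, which is not large-page aligned; checking
only the offset would wrongly select the large flag in that case.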
v5:
- Fix a bug in MMU notifier handling where an uninitialized 'ret' variable
could cause the warning about failed page invalidation to be skipped.
- Improve comment grammar regarding skipping the unmapping of non-mapped pages.
v4:
- Fix a bug where batch unmapping could skip mapped pages when selecting a
new batch due to a wrong offset calculation.
- Fix an error message in case of failed memory region pinning.
v3:
- Region is invalidated even if the mm has no users.
- Page remapping logic is updated to support 2M-unaligned remappings for
regions that are PMD-aligned, which can occur during both faults and
invalidations.
v2:
- Split unmap batching into a separate patch.
- Fixed commit messages from v1 review.
- Renamed a few functions for clarity.
---
Stanislav Kinsburskii (7):
Drivers: hv: Refactor and rename memory region handling functions
Drivers: hv: Centralize guest memory region destruction
Drivers: hv: Move region management to mshv_regions.c
Drivers: hv: Fix huge page handling in memory region traversal
Drivers: hv: Improve region overlap detection in partition create
Drivers: hv: Add refcount and locking to mem regions
Drivers: hv: Add support for movable memory regions
drivers/hv/Kconfig | 2
drivers/hv/Makefile | 2
drivers/hv/mshv_regions.c | 548 +++++++++++++++++++++++++++++++++++++++++++
drivers/hv/mshv_root.h | 32 ++-
drivers/hv/mshv_root_main.c | 382 +++++++++++++-----------------
5 files changed, 745 insertions(+), 221 deletions(-)
create mode 100644 drivers/hv/mshv_regions.c