Message-ID: <aTH7v2XbEpps68WH@skinsburskii.localdomain>
Date: Thu, 4 Dec 2025 13:23:11 -0800
From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
To: Michael Kelley <mhklinux@...look.com>
Cc: "kys@...rosoft.com" <kys@...rosoft.com>,
"haiyangz@...rosoft.com" <haiyangz@...rosoft.com>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
"decui@...rosoft.com" <decui@...rosoft.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v7 6/7] Drivers: hv: Add refcount and locking to mem
regions
On Thu, Dec 04, 2025 at 04:48:01PM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Tuesday, November 25, 2025 6:09 PM
> >
> > Introduce kref-based reference counting and spinlock protection for
> > memory regions in Hyper-V partition management. This change improves
> > memory region lifecycle management and ensures thread-safe access to the
> > region list.
> >
> > Also improves the check for overlapped memory regions during region
> > creation, preventing duplicate or conflicting mappings.
>
> This paragraph seems spurious. I think it applies to what's in Patch 5 of this
> series.
>
Indeed, this chunk escaped cleanup after refactoring.
> >
> > Previously, the regions list was protected by the partition mutex.
> > However, this approach is too heavy for frequent fault and invalidation
> > operations. Finer grained locking is now used to improve efficiency and
> > concurrency.
> >
> > This is a precursor to supporting movable memory regions. Fault and
> > invalidation handling for movable regions will require safe traversal of
> > the region list and holding a region reference while performing
> > invalidation or fault operations.
>
> The commit message discussion about the need for the refcounting and
> locking seemed a bit vague to me. It wasn't entirely clear whether these
> changes are bug fixing existing race conditions, or whether they are new
> functionality to support movable regions.
>
> In looking at the existing code, it seems that the main serialization mechanisms
> are that partition ioctls are serialized on pt_mutex, and VP ioctls are serialized
> on vp_mutex (though multiple VP ioctls can be in progress simultaneously
> against different VPs). The serialization of partition ioctls ensures that region
> manipulation is serialized, and that, for example, two region creations can't
> both verify that there's no overlap, but then overlap with each other. And
> region creation and deletion are serialized. In current code, the VP ioctls don't
> look at the region data structures, so there can't be any races between
> partition and VP ioctls (which are not serialized with each other). The only
> question I had about existing code is the mshv_partition_release() function,
> which proceeds without serializing against any partition ioctls, but maybe
> higher-level file system code ensures that no ioctls are in progress before
> the .release callback is made.
>
> The new requirement is movable regions, where the VP ioctl MSHV_RUN_VP
> needs to look at region data structures. You've said that in the last paragraph
> of your commit message. So I'm reading this as that the new locking is
> needed specifically because multiple MSHV_RUN_VP ioctls will likely be
> in flight simultaneously, and they are not currently serialized with the
> region operations initiated by partition ioctls. And then there are the
> "invalidate" callbacks that are running on some other kernel thread and
> which also need synchronization to do region manipulation.
>
> Maybe I'm just looking for a little bit of a written "road map" somewhere
> that describes the intended locking scheme at a high level. :-)
>
> Michael
>
You understand this correctly.
In short, there were only two concurrent operations on regions before
movable pages were introduced: addition and removal. Both could happen
only via the partition ioctl, which is serialized by the partition
mutex, so everything was simple.
With the introduction of movable pages, regions (both the region list
and the region contents themselves) are accessed by the partition's VP
threads, which do not hold the partition mutex. While access to region
contents is protected by a per-region mutex, nothing prevents the VMM
from removing and destroying a region out from under a VP thread that is
currently servicing a page fault or invalidation, which in turn leads to
a general protection fault.
This commit solves the issue by making the region a reference-counted
object so it persists while being serviced, and by adding a spinlock to
protect list traversal.
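
For the record, the intended pattern can be sketched in plain userspace
C (pthreads and C11 atomics standing in for the kernel spinlock and
struct kref; the names only mirror the patch, this is an illustration,
not the driver code):

```c
/* Userspace sketch of the scheme: a lookup takes the list lock only
 * long enough to find the region and bump its refcount; the servicing
 * path then works on the region without the lock, and the final put
 * frees it. region_get() fails if the refcount already hit zero, so a
 * dying region can never be resurrected. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct region {
	struct region *next;
	atomic_int refcount;		/* stands in for struct kref */
	unsigned long start_gfn, nr_pages;
};

static pthread_mutex_t regions_lock = PTHREAD_MUTEX_INITIALIZER;
static struct region *regions;

/* kref_get_unless_zero() analogue: refuse a region already being torn down */
static int region_get(struct region *r)
{
	int old = atomic_load(&r->refcount);

	while (old > 0)
		if (atomic_compare_exchange_weak(&r->refcount, &old, old + 1))
			return 1;
	return 0;
}

/* kref_put() analogue: the last reference frees the region */
static void region_put(struct region *r)
{
	if (atomic_fetch_sub(&r->refcount, 1) == 1)
		free(r);
}

/* Fault path: hold the lock only around lookup + get, never while servicing */
static struct region *region_find_get(unsigned long gfn)
{
	struct region *r;

	pthread_mutex_lock(&regions_lock);
	for (r = regions; r; r = r->next)
		if (gfn >= r->start_gfn && gfn < r->start_gfn + r->nr_pages &&
		    region_get(r))
			break;
	pthread_mutex_unlock(&regions_lock);
	return r;
}
```

The unmap path mirrors this: unlink the region from the list under the
lock, then drop the list's reference outside it, so a concurrent fault
handler holding its own reference keeps the region alive until it is
done.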
Thanks,
Stanislav
> >
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
> > ---
> > drivers/hv/mshv_regions.c | 19 ++++++++++++++++---
> > drivers/hv/mshv_root.h | 6 +++++-
> > drivers/hv/mshv_root_main.c | 34 ++++++++++++++++++++++++++--------
> > 3 files changed, 47 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> > index d535d2e3e811..6450a7ed8493 100644
> > --- a/drivers/hv/mshv_regions.c
> > +++ b/drivers/hv/mshv_regions.c
> > @@ -7,6 +7,7 @@
> > * Authors: Microsoft Linux virtualization team
> > */
> >
> > +#include <linux/kref.h>
> > #include <linux/mm.h>
> > #include <linux/vmalloc.h>
> >
> > @@ -154,6 +155,8 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
> > if (!is_mmio)
> > region->flags.range_pinned = true;
> >
> > + kref_init(&region->refcount);
> > +
> > return region;
> > }
> >
> > @@ -303,13 +306,13 @@ static int mshv_region_unmap(struct mshv_mem_region *region)
> > mshv_region_chunk_unmap);
> > }
> >
> > -void mshv_region_destroy(struct mshv_mem_region *region)
> > +static void mshv_region_destroy(struct kref *ref)
> > {
> > + struct mshv_mem_region *region =
> > + container_of(ref, struct mshv_mem_region, refcount);
> > struct mshv_partition *partition = region->partition;
> > int ret;
> >
> > - hlist_del(&region->hnode);
> > -
> > if (mshv_partition_encrypted(partition)) {
> > ret = mshv_region_share(region);
> > if (ret) {
> > @@ -326,3 +329,13 @@ void mshv_region_destroy(struct mshv_mem_region *region)
> >
> > vfree(region);
> > }
> > +
> > +void mshv_region_put(struct mshv_mem_region *region)
> > +{
> > + kref_put(&region->refcount, mshv_region_destroy);
> > +}
> > +
> > +int mshv_region_get(struct mshv_mem_region *region)
> > +{
> > + return kref_get_unless_zero(&region->refcount);
> > +}
> > diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> > index ff3374f13691..4249534ba900 100644
> > --- a/drivers/hv/mshv_root.h
> > +++ b/drivers/hv/mshv_root.h
> > @@ -72,6 +72,7 @@ do { \
> >
> > struct mshv_mem_region {
> > struct hlist_node hnode;
> > + struct kref refcount;
> > u64 nr_pages;
> > u64 start_gfn;
> > u64 start_uaddr;
> > @@ -97,6 +98,8 @@ struct mshv_partition {
> > u64 pt_id;
> > refcount_t pt_ref_count;
> > struct mutex pt_mutex;
> > +
> > + spinlock_t pt_mem_regions_lock;
> > struct hlist_head pt_mem_regions; // not ordered
> >
> > u32 pt_vp_count;
> > @@ -319,6 +322,7 @@ int mshv_region_unshare(struct mshv_mem_region *region);
> > int mshv_region_map(struct mshv_mem_region *region);
> > void mshv_region_invalidate(struct mshv_mem_region *region);
> > int mshv_region_pin(struct mshv_mem_region *region);
> > -void mshv_region_destroy(struct mshv_mem_region *region);
> > +void mshv_region_put(struct mshv_mem_region *region);
> > +int mshv_region_get(struct mshv_mem_region *region);
> >
> > #endif /* _MSHV_ROOT_H_ */
> > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > index ae600b927f49..1ef2a28beb17 100644
> > --- a/drivers/hv/mshv_root_main.c
> > +++ b/drivers/hv/mshv_root_main.c
> > @@ -1086,9 +1086,13 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
> > u64 nr_pages = HVPFN_DOWN(mem->size);
> >
> > /* Reject overlapping regions */
> > + spin_lock(&partition->pt_mem_regions_lock);
> > if (mshv_partition_region_by_gfn(partition, mem->guest_pfn) ||
> > - mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1))
> > + mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1)) {
> > + spin_unlock(&partition->pt_mem_regions_lock);
> > return -EEXIST;
> > + }
> > + spin_unlock(&partition->pt_mem_regions_lock);
> >
> > rg = mshv_region_create(mem->guest_pfn, nr_pages,
> > mem->userspace_addr, mem->flags,
> > @@ -1220,8 +1224,9 @@ mshv_map_user_memory(struct mshv_partition *partition,
> > if (ret)
> > goto errout;
> >
> > - /* Install the new region */
> > + spin_lock(&partition->pt_mem_regions_lock);
> > hlist_add_head(&region->hnode, &partition->pt_mem_regions);
> > + spin_unlock(&partition->pt_mem_regions_lock);
> >
> > return 0;
> >
> > @@ -1240,17 +1245,27 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
> > if (!(mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
> > return -EINVAL;
> >
> > + spin_lock(&partition->pt_mem_regions_lock);
> > +
> > region = mshv_partition_region_by_gfn(partition, mem.guest_pfn);
> > - if (!region)
> > - return -EINVAL;
> > + if (!region) {
> > + spin_unlock(&partition->pt_mem_regions_lock);
> > + return -ENOENT;
> > + }
> >
> > /* Paranoia check */
> > if (region->start_uaddr != mem.userspace_addr ||
> > region->start_gfn != mem.guest_pfn ||
> > - region->nr_pages != HVPFN_DOWN(mem.size))
> > + region->nr_pages != HVPFN_DOWN(mem.size)) {
> > + spin_unlock(&partition->pt_mem_regions_lock);
> > return -EINVAL;
> > + }
> > +
> > + hlist_del(&region->hnode);
> >
> > - mshv_region_destroy(region);
> > + spin_unlock(&partition->pt_mem_regions_lock);
> > +
> > + mshv_region_put(region);
> >
> > return 0;
> > }
> > @@ -1653,8 +1668,10 @@ static void destroy_partition(struct mshv_partition *partition)
> > remove_partition(partition);
> >
> > hlist_for_each_entry_safe(region, n, &partition->pt_mem_regions,
> > - hnode)
> > - mshv_region_destroy(region);
> > + hnode) {
> > + hlist_del(&region->hnode);
> > + mshv_region_put(region);
> > + }
> >
> > /* Withdraw and free all pages we deposited */
> > hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);
> > @@ -1852,6 +1869,7 @@ mshv_ioctl_create_partition(void __user *user_arg, struct device *module_dev)
> >
> > INIT_HLIST_HEAD(&partition->pt_devices);
> >
> > + spin_lock_init(&partition->pt_mem_regions_lock);
> > INIT_HLIST_HEAD(&partition->pt_mem_regions);
> >
> > mshv_eventfd_init(partition);
> >
> >
>