Message-ID:
 <SN6PR02MB4157553798D947A4B205AEF7D4A6A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Thu, 4 Dec 2025 16:48:01 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>,
	"kys@...rosoft.com" <kys@...rosoft.com>, "haiyangz@...rosoft.com"
	<haiyangz@...rosoft.com>, "wei.liu@...nel.org" <wei.liu@...nel.org>,
	"decui@...rosoft.com" <decui@...rosoft.com>
CC: "linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v7 6/7] Drivers: hv: Add refcount and locking to mem
 regions

From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Tuesday, November 25, 2025 6:09 PM
> 
> Introduce kref-based reference counting and spinlock protection for
> memory regions in Hyper-V partition management. This change improves
> memory region lifecycle management and ensures thread-safe access to the
> region list.
> 
> Also improves the check for overlapped memory regions during region
> creation, preventing duplicate or conflicting mappings.

This paragraph seems spurious. I think it applies to what's in Patch 5 of this
series.

> 
> Previously, the regions list was protected by the partition mutex.
> However, this approach is too heavy for frequent fault and invalidation
> operations. Finer grained locking is now used to improve efficiency and
> concurrency.
> 
> This is a precursor to supporting movable memory regions. Fault and
> invalidation handling for movable regions will require safe traversal of
> the region list and holding a region reference while performing
> invalidation or fault operations.

The commit message discussion about the need for the refcounting and
locking seemed a bit vague to me. It wasn't entirely clear whether these
changes fix existing race conditions, or whether they are new
functionality to support movable regions.

In looking at the existing code, it seems that the main serialization mechanisms
are that partition ioctls are serialized on pt_mutex, and VP ioctls are serialized
on vp_mutex (though multiple VP ioctls can be in progress simultaneously
against different VPs). The serialization of partition ioctls ensures that region
manipulation is serialized, and that, for example, two region creations can't
both verify that there's no overlap, but then overlap with each other. And
region creation and deletion are serialized. In current code, the VP ioctls don't
look at the region data structures, so there can't be any races between
partition and VP ioctls (which are not serialized with each other). The only
question I had about existing code is the mshv_partition_release() function,
which proceeds without serializing against any partition ioctls, but maybe
higher-level file system code ensures that no ioctls are in progress before
the .release callback is made.

The new requirement is movable regions, where the VP ioctl MSHV_RUN_VP
needs to look at region data structures. You've said that in the last paragraph
of your commit message. So I'm reading this to mean that the new locking is
needed specifically because multiple MSHV_RUN_VP ioctls will likely be
in flight simultaneously, and they are not currently serialized with the
region operations initiated by partition ioctls. And then there are the
"invalidate" callbacks that run on some other kernel thread and
which also need synchronization to do region manipulation.

Maybe I'm just looking for a little bit of a written "road map" somewhere
that describes the intended locking scheme at a high level. :-)

Michael

> 
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
> ---
>  drivers/hv/mshv_regions.c   |   19 ++++++++++++++++---
>  drivers/hv/mshv_root.h      |    6 +++++-
>  drivers/hv/mshv_root_main.c |   34 ++++++++++++++++++++++++++--------
>  3 files changed, 47 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/hv/mshv_regions.c b/drivers/hv/mshv_regions.c
> index d535d2e3e811..6450a7ed8493 100644
> --- a/drivers/hv/mshv_regions.c
> +++ b/drivers/hv/mshv_regions.c
> @@ -7,6 +7,7 @@
>   * Authors: Microsoft Linux virtualization team
>   */
> 
> +#include <linux/kref.h>
>  #include <linux/mm.h>
>  #include <linux/vmalloc.h>
> 
> @@ -154,6 +155,8 @@ struct mshv_mem_region *mshv_region_create(u64 guest_pfn, u64 nr_pages,
>  	if (!is_mmio)
>  		region->flags.range_pinned = true;
> 
> +	kref_init(&region->refcount);
> +
>  	return region;
>  }
> 
> @@ -303,13 +306,13 @@ static int mshv_region_unmap(struct mshv_mem_region *region)
>  					 mshv_region_chunk_unmap);
>  }
> 
> -void mshv_region_destroy(struct mshv_mem_region *region)
> +static void mshv_region_destroy(struct kref *ref)
>  {
> +	struct mshv_mem_region *region =
> +		container_of(ref, struct mshv_mem_region, refcount);
>  	struct mshv_partition *partition = region->partition;
>  	int ret;
> 
> -	hlist_del(&region->hnode);
> -
>  	if (mshv_partition_encrypted(partition)) {
>  		ret = mshv_region_share(region);
>  		if (ret) {
> @@ -326,3 +329,13 @@ void mshv_region_destroy(struct mshv_mem_region *region)
> 
>  	vfree(region);
>  }
> +
> +void mshv_region_put(struct mshv_mem_region *region)
> +{
> +	kref_put(&region->refcount, mshv_region_destroy);
> +}
> +
> +int mshv_region_get(struct mshv_mem_region *region)
> +{
> +	return kref_get_unless_zero(&region->refcount);
> +}
> diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> index ff3374f13691..4249534ba900 100644
> --- a/drivers/hv/mshv_root.h
> +++ b/drivers/hv/mshv_root.h
> @@ -72,6 +72,7 @@ do { \
> 
>  struct mshv_mem_region {
>  	struct hlist_node hnode;
> +	struct kref refcount;
>  	u64 nr_pages;
>  	u64 start_gfn;
>  	u64 start_uaddr;
> @@ -97,6 +98,8 @@ struct mshv_partition {
>  	u64 pt_id;
>  	refcount_t pt_ref_count;
>  	struct mutex pt_mutex;
> +
> +	spinlock_t pt_mem_regions_lock;
>  	struct hlist_head pt_mem_regions; // not ordered
> 
>  	u32 pt_vp_count;
> @@ -319,6 +322,7 @@ int mshv_region_unshare(struct mshv_mem_region *region);
>  int mshv_region_map(struct mshv_mem_region *region);
>  void mshv_region_invalidate(struct mshv_mem_region *region);
>  int mshv_region_pin(struct mshv_mem_region *region);
> -void mshv_region_destroy(struct mshv_mem_region *region);
> +void mshv_region_put(struct mshv_mem_region *region);
> +int mshv_region_get(struct mshv_mem_region *region);
> 
>  #endif /* _MSHV_ROOT_H_ */
> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> index ae600b927f49..1ef2a28beb17 100644
> --- a/drivers/hv/mshv_root_main.c
> +++ b/drivers/hv/mshv_root_main.c
> @@ -1086,9 +1086,13 @@ static int mshv_partition_create_region(struct mshv_partition *partition,
>  	u64 nr_pages = HVPFN_DOWN(mem->size);
> 
>  	/* Reject overlapping regions */
> +	spin_lock(&partition->pt_mem_regions_lock);
>  	if (mshv_partition_region_by_gfn(partition, mem->guest_pfn) ||
> -	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1))
> +	    mshv_partition_region_by_gfn(partition, mem->guest_pfn + nr_pages - 1)) {
> +		spin_unlock(&partition->pt_mem_regions_lock);
>  		return -EEXIST;
> +	}
> +	spin_unlock(&partition->pt_mem_regions_lock);
> 
>  	rg = mshv_region_create(mem->guest_pfn, nr_pages,
>  				mem->userspace_addr, mem->flags,
> @@ -1220,8 +1224,9 @@ mshv_map_user_memory(struct mshv_partition *partition,
>  	if (ret)
>  		goto errout;
> 
> -	/* Install the new region */
> +	spin_lock(&partition->pt_mem_regions_lock);
>  	hlist_add_head(&region->hnode, &partition->pt_mem_regions);
> +	spin_unlock(&partition->pt_mem_regions_lock);
> 
>  	return 0;
> 
> @@ -1240,17 +1245,27 @@ mshv_unmap_user_memory(struct mshv_partition *partition,
>  	if (!(mem.flags & BIT(MSHV_SET_MEM_BIT_UNMAP)))
>  		return -EINVAL;
> 
> +	spin_lock(&partition->pt_mem_regions_lock);
> +
>  	region = mshv_partition_region_by_gfn(partition, mem.guest_pfn);
> -	if (!region)
> -		return -EINVAL;
> +	if (!region) {
> +		spin_unlock(&partition->pt_mem_regions_lock);
> +		return -ENOENT;
> +	}
> 
>  	/* Paranoia check */
>  	if (region->start_uaddr != mem.userspace_addr ||
>  	    region->start_gfn != mem.guest_pfn ||
> -	    region->nr_pages != HVPFN_DOWN(mem.size))
> +	    region->nr_pages != HVPFN_DOWN(mem.size)) {
> +		spin_unlock(&partition->pt_mem_regions_lock);
>  		return -EINVAL;
> +	}
> +
> +	hlist_del(&region->hnode);
> 
> -	mshv_region_destroy(region);
> +	spin_unlock(&partition->pt_mem_regions_lock);
> +
> +	mshv_region_put(region);
> 
>  	return 0;
>  }
> @@ -1653,8 +1668,10 @@ static void destroy_partition(struct mshv_partition *partition)
>  	remove_partition(partition);
> 
>  	hlist_for_each_entry_safe(region, n, &partition->pt_mem_regions,
> -				  hnode)
> -		mshv_region_destroy(region);
> +				  hnode) {
> +		hlist_del(&region->hnode);
> +		mshv_region_put(region);
> +	}
> 
>  	/* Withdraw and free all pages we deposited */
>  	hv_call_withdraw_memory(U64_MAX, NUMA_NO_NODE, partition->pt_id);
> @@ -1852,6 +1869,7 @@ mshv_ioctl_create_partition(void __user *user_arg, struct device *module_dev)
> 
>  	INIT_HLIST_HEAD(&partition->pt_devices);
> 
> +	spin_lock_init(&partition->pt_mem_regions_lock);
>  	INIT_HLIST_HEAD(&partition->pt_mem_regions);
> 
>  	mshv_eventfd_init(partition);
> 
> 
