Message-ID:
<SN6PR02MB415777A957EF622587C54A35D4E4A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Fri, 3 Oct 2025 21:41:32 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
CC: "kys@...rosoft.com" <kys@...rosoft.com>, "haiyangz@...rosoft.com"
<haiyangz@...rosoft.com>, "wei.liu@...nel.org" <wei.liu@...nel.org>,
"decui@...rosoft.com" <decui@...rosoft.com>, "linux-hyperv@...r.kernel.org"
<linux-hyperv@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v3 3/5] Drivers: hv: Batch GPA unmap operations to improve
large region performance
From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Friday, October 3, 2025 9:52 AM
>
> On Fri, Oct 03, 2025 at 12:27:13AM +0000, Michael Kelley wrote:
> > From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Thursday, October 2, 2025 9:36 AM
> > >
> > > Reduce overhead when unmapping large memory regions by batching GPA unmap
> > > operations in 2MB-aligned chunks.
> > >
> > > Use a dedicated constant for batch size to improve code clarity and
> > > maintainability.
> > >
> > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
> > > ---
> > > drivers/hv/mshv_root.h | 2 ++
> > > drivers/hv/mshv_root_hv_call.c | 2 +-
> > > drivers/hv/mshv_root_main.c | 15 ++++++++++++---
> > > 3 files changed, 15 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/hv/mshv_root.h b/drivers/hv/mshv_root.h
> > > index e3931b0f12693..97e64d5341b6e 100644
> > > --- a/drivers/hv/mshv_root.h
> > > +++ b/drivers/hv/mshv_root.h
> > > @@ -32,6 +32,8 @@ static_assert(HV_HYP_PAGE_SIZE == MSHV_HV_PAGE_SIZE);
> > >
> > > #define MSHV_PIN_PAGES_BATCH_SIZE (0x10000000ULL / HV_HYP_PAGE_SIZE)
> > >
> > > +#define MSHV_MAX_UNMAP_GPA_PAGES 512
> > > +
> > > struct mshv_vp {
> > > u32 vp_index;
> > > struct mshv_partition *vp_partition;
> > > diff --git a/drivers/hv/mshv_root_hv_call.c b/drivers/hv/mshv_root_hv_call.c
> > > index c9c274f29c3c6..0696024ccfe31 100644
> > > --- a/drivers/hv/mshv_root_hv_call.c
> > > +++ b/drivers/hv/mshv_root_hv_call.c
> > > @@ -17,7 +17,7 @@
> > > /* Determined empirically */
> > > #define HV_INIT_PARTITION_DEPOSIT_PAGES 208
> > > #define HV_MAP_GPA_DEPOSIT_PAGES 256
> > > -#define HV_UMAP_GPA_PAGES 512
> > > +#define HV_UMAP_GPA_PAGES MSHV_MAX_UNMAP_GPA_PAGES
> > >
> > > #define HV_PAGE_COUNT_2M_ALIGNED(pg_count) (!((pg_count) & (0x200 - 1)))
> > >
> > > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > > index 7ef66c43e1515..8bf0b5af25802 100644
> > > --- a/drivers/hv/mshv_root_main.c
> > > +++ b/drivers/hv/mshv_root_main.c
> > > @@ -1378,6 +1378,7 @@ mshv_map_user_memory(struct mshv_partition *partition,
> > > static void mshv_partition_destroy_region(struct mshv_mem_region *region)
> > > {
> > > struct mshv_partition *partition = region->partition;
> > > + u64 page_offset;
> > > u32 unmap_flags = 0;
> > > int ret;
> > >
> > > @@ -1396,9 +1397,17 @@ static void mshv_partition_destroy_region(struct mshv_mem_region *region)
> > > if (region->flags.large_pages)
> > > unmap_flags |= HV_UNMAP_GPA_LARGE_PAGE;
> > >
> > > - /* ignore unmap failures and continue as process may be exiting */
> > > - hv_call_unmap_gpa_pages(partition->pt_id, region->start_gfn,
> > > - region->nr_pages, unmap_flags);
> > > + for (page_offset = 0; page_offset < region->nr_pages; page_offset++) {
> > > + if (!region->pages[page_offset])
> > > + continue;
> > > +
> > > + hv_call_unmap_gpa_pages(partition->pt_id,
> > > + ALIGN(region->start_gfn + page_offset,
> > > + MSHV_MAX_UNMAP_GPA_PAGES),
> >
> > Seems like this should be ALIGN_DOWN() instead of ALIGN(). ALIGN() rounds
> > up to get the requested alignment, which could skip over some non-zero
> > entries in region->pages[].
> >
>
> Indeed, the goal is to unmap in 2MB chunks as it's the max payload that
> can be passed to the hypervisor.
> I'll replace it with ALIGN_DOWN in the next revision.
>
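
To make the difference concrete: with region->start_gfn equal to 0 and
page_offset equal to 4 (the same numbers as in my example further down),
the two macros give:

	ALIGN(4, MSHV_MAX_UNMAP_GPA_PAGES)	/* 512 -- rounds up */
	ALIGN_DOWN(4, MSHV_MAX_UNMAP_GPA_PAGES)	/* 0 -- rounds down */

With ALIGN(), the batch starts at gfn 512 and the non-zero entries at
offsets 4 thru 511 are never unmapped.
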
> > And I'm assuming the unmap hypercall is idempotent in that if the specified
> > partition PFN range includes some pages that aren't mapped, the hypercall
> > just skips them and keeps going without returning an error.
> >
>
> It might be the case, but it's not reliable.
> If the region size isn’t aligned to MSHV_MAX_UNMAP_GPA_PAGES (i.e., not
> aligned to 2M), the hypervisor can return an error for the trailing
> invalid (non-zero) PFNs.
> I’ll fix this in the next revision.
>
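
FWIW, one way to handle the trailing partial batch (untested sketch,
reusing the names from the patch, with "gfn" being the ALIGN_DOWN'ed
batch start) is to clamp the page count so the hypercall never extends
past the end of the region:

	count = min_t(u64, MSHV_MAX_UNMAP_GPA_PAGES,
		      region->start_gfn + region->nr_pages - gfn);
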
> > > + MSHV_MAX_UNMAP_GPA_PAGES, unmap_flags);
> > > +
> > > + page_offset += MSHV_MAX_UNMAP_GPA_PAGES - 1;
> >
> > This update to the page_offset doesn't take into account the effect of the
> > ALIGN (or ALIGN_DOWN) call. With a change to ALIGN_DOWN(), it may
> > increment too far and perhaps cause the "for" loop to be exited prematurely,
> > which would fail to unmap some of the pages.
> >
>
> I’m not sure I see the problem here. If we align the offset by
> MSHV_MAX_UNMAP_GPA_PAGES and unmap the same number of pages, then we
> should increment the offset by that very same number, shouldn’t we?

Here's an example showing the problem I see (assuming ALIGN_DOWN
instead of ALIGN):

1) For simplicity in the example, assume region->start_gfn is zero.
2) Entries 0 thru 3 (i.e., 4 entries) in region->pages[] are zero.
3) Entries 4 thru 515 (the next 512 entries) are non-zero.
4) Entries 516 thru 1023 (the next 508 entries) are zero.
5) Entries 1024 thru 1535 (the last 512 entries) are non-zero.

Upon entering the "for" loop for the first time, page_offset gets
incremented to 4 because of skipping entries 0 thru 3 that are zero.
On the next iteration where page_offset is 4, the hypercall is made,
passing 0 for the gfn (because of ALIGN_DOWN), with a count of
512, so entries 0 thru 511 are unmapped. Entries 0 thru 3 are valid
entries, and the fact that they aren't mapped is presumably ignored
by the hypercall, so everything works.

Then page_offset is incremented by 511, so it will be 515. Continuing
the "for" loop increments page_offset to 516. The zero entries 516
thru 1023 increment page_offset to 1024. Finally the hypercall is made
again covering entries 1024 thru 1535, none of which are zero.

But notice that entries 512 thru 515 (which are non-zero) got skipped.
That's because the first invocation of the hypercall covered only through
entry 511, while page_offset was incremented to 515. page_offset
should have been set to 511, since that's the last entry processed by
the first invocation of the hypercall.
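
A corrected version of the loop might look like this (untested sketch,
using ALIGN_DOWN and the clamped count suggested above; I'm assuming
region->start_gfn is 2 Mbyte aligned so the batch never starts below
the region):

	for (page_offset = 0; page_offset < region->nr_pages; page_offset++) {
		u64 gfn, count;

		if (!region->pages[page_offset])
			continue;

		gfn = ALIGN_DOWN(region->start_gfn + page_offset,
				 MSHV_MAX_UNMAP_GPA_PAGES);
		count = min_t(u64, MSHV_MAX_UNMAP_GPA_PAGES,
			      region->start_gfn + region->nr_pages - gfn);

		/* ignore unmap failures and continue as process may be exiting */
		hv_call_unmap_gpa_pages(partition->pt_id, gfn, count,
					unmap_flags);

		/*
		 * Set page_offset to the last entry the hypercall covered;
		 * the "for" loop increment moves to the entry after it.
		 */
		page_offset = gfn + count - region->start_gfn - 1;
	}
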
Michael