[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aSTN4nlRiXOXmKlA@skinsburskii.localdomain>
Date: Mon, 24 Nov 2025 13:28:02 -0800
From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
To: Michael Kelley <mhklinux@...look.com>
Cc: "kys@...rosoft.com" <kys@...rosoft.com>,
"haiyangz@...rosoft.com" <haiyangz@...rosoft.com>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
"decui@...rosoft.com" <decui@...rosoft.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v6 5/5] Drivers: hv: Add support for movable memory
regions
On Fri, Nov 21, 2025 at 05:45:20AM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Wednesday, November 19, 2025 4:45 PM
> >
> > On Tue, Nov 18, 2025 at 04:29:56PM +0000, Michael Kelley wrote:
> > > From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Monday, November 17, 2025 8:53 AM
>
> [snip]
>
> > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > index 0b8c391a0342..5f1637cbb6e3 100644
> > > > --- a/drivers/hv/Kconfig
> > > > +++ b/drivers/hv/Kconfig
> > > > @@ -75,6 +75,7 @@ config MSHV_ROOT
> > > > depends on PAGE_SIZE_4KB
> > > > select EVENTFD
> > > > select VIRT_XFER_TO_GUEST_WORK
> > > > + select HMM_MIRROR
> > >
> > > Couldn't you also do "select MMU_NOTIFIER" to avoid the #ifdef's
> > > and stubs for when it isn't selected? There are other Linux kernel
> > > drivers that select it. Or is the intent to allow building an image that
> > > doesn't support unpinned memory, and the #ifdef's save space?
> > >
> >
> > That's an interesting question. This driver can function without MMU notifiers
> > by pinning all memory, which might be advantageous for certain real-time
> > applications.
> > However, since most other virtualization solutions use MMU_NOTIFIER, there
> > doesn't appear to be a strong reason for this driver to deviate.
>
> I'm not clear on your last sentence. Could you elaborate?
>
I meant I'll select MMU_NOTIFIER.
>
> Right. But even for pinned regions as coded today, is that assumption
> correct? Due to memory being fragmented at the time of region creation,
> it would be possible that some 2Meg ranges in a region are backed by a large
> page, while other 2Meg ranges are not. In that case, a single per-region flag
> isn't enough information. Or does the hypercall work OK if the "map huge
> page" flag is passed when the range isn't a huge page? I'm not clear on what
> the hypercall requires as input.
>
It is not with THP enabled and missing MAP_HUGETLB mmap flag.
I'll fix it in the next revision.
>
> OK. So what is the impact? Losing the perf benefit of mapping guest
> memory in the SLAT as a 2 Meg large page vs. a bunch of individual 4K
> pages? Anything else?
>
I decided to rework it in scope of these the series.
> >
> > This is possible, if hypervisor returns some invalid GFN.
> > But there is also a possibility, that this code can race with region removal from a guest.
> > I'll address it in the next revision.
>
> In either of these cases, what happens next? The MSHV_RUN_VP ioctl
> will return to user space with the unhandled HVMSG_GPA_INTERCEPT
> message. Is there anything user space can do to enable the VP to make
> progress past the fault? Or does user space just have to terminate the
> guest VM?
>
I don't think there is much to be done here in kernel.
Control will be returned to VMM, which will likely kill the guest.
> > >
> > > I'm pretty sure region->start_uaddr is always page aligned. But what
> > > about range->start and range->end? The code here and below assumes
> > > they are page aligned. It also assumes that range->end is greater than
> > > range->start so the computation of page_count doesn't wrap and so
> > > page_count is >= 1. I don't know whether checks for these assumptions
> > > are appropriate.
> > >
> >
> > There is a check for memory region size to be non-zero and page aligned
> > in mshv_partition_ioct_set_memory function, which is the only caller for
> > memory region creation. And region start is defined in PFNs.
> >
>
> Right -- no disagreement that the region start and size are page aligned
> and non-zero. But what about the range that is being invalidated?
> (i.e., range->start and range->end) The values in that range are coming
> from the mm subsystem, and aren't governed by how a region is created.
> If that range is a subset of the MSHV region, then
> mshv_region_internal_invalidate() will be operating on whatever subset
> was provided in the 'range' argument.
>
> mshv_region_interval_invalidate() is ultimately called from
> mmu_notifier_invalidate_range_start(), which has about 30 different
> callers in the kernel, mostly in the mm subsystem. It wasn't
> clear to me what rules, if any, those 30 callers are following when they
> set up the range to be invalidated.
>
> Michael
I see what you mean. Yes, these values are guarantee to be page aligned.
And it must be as anything else can't be invalidated.
Thanks,
Stanislav
Powered by blists - more mailing lists