Message-ID: <aX0TCpXFxI8zVlQ1@anirudh-surface.localdomain>
Date: Fri, 30 Jan 2026 20:22:34 +0000
From: Anirudh Rayabharam <anirudh@...rudhrb.com>
To: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
Cc: Michael Kelley <mhklinux@...look.com>,
"kys@...rosoft.com" <kys@...rosoft.com>,
"haiyangz@...rosoft.com" <haiyangz@...rosoft.com>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
"decui@...rosoft.com" <decui@...rosoft.com>,
"longli@...rosoft.com" <longli@...rosoft.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] mshv: Add support for integrated scheduler
On Fri, Jan 30, 2026 at 10:51:10AM -0800, Stanislav Kinsburskii wrote:
> On Fri, Jan 30, 2026 at 06:43:09PM +0000, Anirudh Rayabharam wrote:
> > On Fri, Jan 30, 2026 at 10:37:38AM -0800, Stanislav Kinsburskii wrote:
> > > On Fri, Jan 30, 2026 at 05:30:25PM +0000, Anirudh Rayabharam wrote:
> > > > On Thu, Jan 29, 2026 at 11:09:46AM -0800, Stanislav Kinsburskii wrote:
> > > > > On Thu, Jan 29, 2026 at 05:47:02PM +0000, Michael Kelley wrote:
> > > > > > From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Wednesday, January 21, 2026 2:36 PM
> > > > > > >
> > > > > > > From: Andreea Pintilie <anpintil@...rosoft.com>
> > > > > > >
> > > > > > > Query the hypervisor for integrated scheduler support and use it if
> > > > > > > configured.
> > > > > > >
> > > > > > > Microsoft Hypervisor originally provided two schedulers: root and core. The
> > > > > > > root scheduler allows the root partition to schedule guest vCPUs across
> > > > > > > physical cores, supporting both time slicing and CPU affinity (e.g., via
> > > > > > > cgroups). In contrast, the core scheduler delegates vCPU-to-physical-core
> > > > > > > scheduling entirely to the hypervisor.
> > > > > > >
> > > > > > > Direct virtualization introduces a new privileged guest partition type - L1
> > > > > > > Virtual Host (L1VH) — which can create child partitions from its own
> > > > > > > resources. These child partitions are effectively siblings, scheduled by
> > > > > > > the hypervisor's core scheduler. This prevents the L1VH parent from setting
> > > > > > > affinity or time slicing for its own processes or guest VPs. While cgroups,
> > > > > > > CFS, and cpuset controllers can still be used, their effectiveness is
> > > > > > > unpredictable, as the core scheduler swaps vCPUs according to its own logic
> > > > > > > (typically round-robin across all allocated physical CPUs). As a result,
> > > > > > > the system may appear to "steal" time from the L1VH and its children.
> > > > > > >
> > > > > > > To address this, Microsoft Hypervisor introduces the integrated scheduler.
> > > > > > > > This allows an L1VH partition to schedule its own vCPUs and those of its
> > > > > > > guests across its "physical" cores, effectively emulating root scheduler
> > > > > > > behavior within the L1VH, while retaining core scheduler behavior for the
> > > > > > > rest of the system.
> > > > > > >
> > > > > > > The integrated scheduler is controlled by the root partition and gated by
> > > > > > > the vmm_enable_integrated_scheduler capability bit. If set, the hypervisor
> > > > > > > supports the integrated scheduler. The L1VH partition must then check if it
> > > > > > > is enabled by querying the corresponding extended partition property. If
> > > > > > > this property is true, the L1VH partition must use the root scheduler
> > > > > > > logic; otherwise, it must use the core scheduler.
> > > > > > >
> > > > > > > Signed-off-by: Andreea Pintilie <anpintil@...rosoft.com>
> > > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
> > > > > > > ---
> > > > > > > drivers/hv/mshv_root_main.c | 79 +++++++++++++++++++++++++++++--------------
> > > > > > > include/hyperv/hvhdk_mini.h | 6 +++
> > > > > > > 2 files changed, 58 insertions(+), 27 deletions(-)
> > > > > > >
> > >
> > > <snip>
> > >
> > > > > > > -root_sched_deinit:
> > > > > > > - root_scheduler_deinit();
> > > > > > > - return err;
> > > > > > > }
> > > > > > >
> > > > > > > -static void mshv_init_vmm_caps(struct device *dev)
> > > > > > > +static int mshv_init_vmm_caps(struct device *dev)
> > > > > > > {
> > > > > > > - /*
> > > > > > > - * This can only fail here if HVCALL_GET_PARTITION_PROPERTY_EX or
> > > > > > > - * HV_PARTITION_PROPERTY_VMM_CAPABILITIES are not supported. In that
> > > > > > > - * case it's valid to proceed as if all vmm_caps are disabled (zero).
> > > > > > > - */
> > > > > > > - if (hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > > > > - HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > > > > - 0, &mshv_root.vmm_caps,
> > > > > > > - sizeof(mshv_root.vmm_caps)))
> > > > > > > - dev_warn(dev, "Unable to get VMM capabilities\n");
> > > > > > > + int ret;
> > > > > > > +
> > > > > > > + ret = hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > > > > + HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > > > > + 0, &mshv_root.vmm_caps,
> > > > > > > + sizeof(mshv_root.vmm_caps));
> > > > > > > + if (ret) {
> > > > > > > + dev_err(dev, "Failed to get VMM capabilities: %d\n", ret);
> > > > > > > + return ret;
> > > > > > > + }
> > > > > >
> > > > > > This is a functional change that isn't mentioned in the commit message.
> > > > > > Why is it now appropriate to fail instead of treating the VMM capabilities
> > > > > > as all disabled? Presumably there are older versions of the hypervisor that
> > > > > > don't support the requirements described in the original comment, but
> > > > > > perhaps they are no longer relevant?
> > > > > >
> > > > >
> > > > > To fail is now the only option for the L1VH partition. It must discover
> > > > > the scheduler type. Without this information, the partition cannot
> > > > > operate. The core scheduler logic will not work with an integrated
> > > > > scheduler, and vice versa.
> > > >
> > > > I don't think we need to fail here. If we don't find vmm caps, that
> > > > means we are on an older hypervisor that supports l1vh but not
> > > > integrated scheduler (yes, such a version exists). In this case since
> > > > integrated scheduler is not supported by the hypervisor, the core
> > > > scheduler logic will work.
> > > >
> > >
> > > The older hypervisor version won't have the integrated scheduler
> > > capability bit.
> > > And we can't operate in core scheduler mode if the integrated scheduler
> > > is enabled underneath us.
> >
> > The older hypervisor won't have the integrated scheduler capability bit.
> > This means that the older hypervisor doesn't support integrated
> > scheduler (this is how vmm caps work: if the bit doesn't exist or
> > vmm caps themselves don't exist the feature should be assumed as not
> > available). If the hypervisor doesn't support integrated scheduler in the
> > first place, it can't be enabled underneath us. So, it is safe to
> > operate in core scheduler mode.
> >
>
> We can’t tell whether the hypervisor is older and simply doesn’t have
> the VMM caps bit, or whether we just failed to fetch the VMM caps.

If we failed to fetch the VMM caps, i.e., the hypervisor doesn't support
the VMM caps property, we must assume that all the bits in VMM caps are
0 (i.e., no features are available). This is how VMM capabilities are
supposed to be interpreted. This is something I checked with the
hypervisor team some time back.
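
To illustrate that interpretation, here is a rough userspace sketch. It
is not the driver code: struct vmm_caps_sketch, effective_caps() and
pick_sched_logic() are hypothetical names made up for this example, and
only the one capability bit discussed in this thread is modeled. It
assumes a failed fetch is treated the same as "all capability bits are
zero".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the VMM capabilities; one bit only. */
struct vmm_caps_sketch {
	bool integrated_scheduler;
};

enum sched_logic_sketch { SCHED_CORE_LOGIC, SCHED_ROOT_LOGIC };

/*
 * fetch_status is 0 on success and nonzero on failure. On failure the
 * result is all zeroes, i.e. no features are assumed to be available.
 */
static struct vmm_caps_sketch
effective_caps(int fetch_status, const struct vmm_caps_sketch *fetched)
{
	struct vmm_caps_sketch caps;

	memset(&caps, 0, sizeof(caps));
	if (fetch_status == 0) {
		assert(fetched != NULL);
		caps = *fetched;
	}
	return caps;
}

/*
 * Root scheduler logic is used only if the capability bit is set AND
 * the (separately queried) partition property says the integrated
 * scheduler is enabled. Everything else falls back to core logic.
 */
static enum sched_logic_sketch
pick_sched_logic(struct vmm_caps_sketch caps, bool integrated_enabled_prop)
{
	if (caps.integrated_scheduler && integrated_enabled_prop)
		return SCHED_ROOT_LOGIC;
	return SCHED_CORE_LOGIC;
}
```

With this shape, a hypervisor that lacks the VMM caps property ends up
on the core scheduler path without failing initialization.
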
>
> In other words, we can’t distinguish between “an older hypervisor
> without integrated scheduler support” and “a newer hypervisor with an
> integrated scheduler, but we failed to fetch the VMM caps”.
>
> But for completeness: are you saying there is an older hypervisor
> version that supports L1VH, but does not support VMM caps?

I don't know how much of the Azure fleet still runs it, but yes, such a
hypervisor version exists.

Thanks,
Anirudh

>
> Thanks, Stanislav
>
> > Thanks,
> > Anirudh.