[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aXzS8VPmfInQVort@skinsburskii.localdomain>
Date: Fri, 30 Jan 2026 07:49:05 -0800
From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
To: Michael Kelley <mhklinux@...look.com>
Cc: "kys@...rosoft.com" <kys@...rosoft.com>,
"haiyangz@...rosoft.com" <haiyangz@...rosoft.com>,
"wei.liu@...nel.org" <wei.liu@...nel.org>,
"decui@...rosoft.com" <decui@...rosoft.com>,
"longli@...rosoft.com" <longli@...rosoft.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] mshv: Add support for integrated scheduler
On Fri, Jan 30, 2026 at 01:24:34AM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Thursday, January 29, 2026 11:10 AM
> >
> > On Thu, Jan 29, 2026 at 05:47:02PM +0000, Michael Kelley wrote:
> > > From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Wednesday, January 21, 2026 2:36 PM
> >
> > <snip>
> >
> > > > static int __init mshv_root_partition_init(struct device *dev)
> > > > {
> > > > int err;
> > > >
> > > > - err = root_scheduler_init(dev);
> > > > - if (err)
> > > > - return err;
> > > > -
> > > > err = register_reboot_notifier(&mshv_reboot_nb);
> > > > if (err)
> > > > - goto root_sched_deinit;
> > > > + return err;
> > > >
> > > > return 0;
> > >
> > > This code is now:
> > >
> > > if (err)
> > > return err;
> > > return 0;
> > >
> > > which can be simplified to just:
> > >
> > > return err;
> > >
> > > Or drop the local variable 'err' and simplify the entire function to:
> > >
> > > return register_reboot_notifier(&mshv_reboot_nb);
> > >
> > > There's a tangential question here: Why is this reboot notifier
> > > needed in the first place? All it does is remove the cpuhp state
> > > that allocates/frees the per-cpu root_scheduler_input and
> > > root_scheduler_output pages. Removing the state will free
> > > the pages, but if Linux is rebooting, why bother?
> > >
> >
> > This was originally done to support kexec.
> > Here is the original commit message:
> >
> > mshv: perform synic cleanup during kexec
> >
> > Register a reboot notifier that performs synic cleanup when a kexec
> > is in progress.
> >
> > One notable issue this commit fixes is one where after a kexec, virtio
> > devices are not functional. Linux root partition receives MMIO doorbell
> > events in the ring buffer in the SIRB synic page. The hypervisor maintains
> > a head pointer where it writes new events into the ring buffer. The root
> > partition maintains a tail pointer to read events from the buffer.
> >
> > Upon kexec reboot, all root data structures are re-initialized and thus the
> > tail pointer gets reset to zero. The hypervisor on the other hand still
> > retains the pre-kexec head pointer which could be non-zero. This means that
> > when the hypervisor writes new events to the ring buffer, the root
> > partition looks at the wrong place and doesn't find any events. So, future
> > doorbell events never get delivered. As a result, virtqueue kicks never get
> > delivered to the host.
> >
> > When the SIRB page is disabled the hypervisor resets the head pointer.
>
> FWIW, I don't see that commit message anywhere in a public source code
> tree. The calls to register/unregister_reboot_notifier() were in the original
> introduction of mshv_root_main.c in upstream commit 621191d709b14.
> Evidently the code described by that commit message was not submitted
> upstream. And of course, the kexec() topic is now being revisited ....
>
> So to clarify: Do you expect that in the future the reboot notifier will be
> used for something that really is required for resetting hypervisor state
> in the case of a kexec reboot?
>
Yes, for now it's the best we have.
This code can be dropped later if we get a better way to handle kexec.
> >
> > > > -root_sched_deinit:
> > > > - root_scheduler_deinit();
> > > > - return err;
> > > > }
> > > >
> > > > -static void mshv_init_vmm_caps(struct device *dev)
> > > > +static int mshv_init_vmm_caps(struct device *dev)
> > > > {
> > > > - /*
> > > > - * This can only fail here if HVCALL_GET_PARTITION_PROPERTY_EX or
> > > > - * HV_PARTITION_PROPERTY_VMM_CAPABILITIES are not supported. In that
> > > > - * case it's valid to proceed as if all vmm_caps are disabled (zero).
> > > > - */
> > > > - if (hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > - HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > - 0, &mshv_root.vmm_caps,
> > > > - sizeof(mshv_root.vmm_caps)))
> > > > - dev_warn(dev, "Unable to get VMM capabilities\n");
> > > > + int ret;
> > > > +
> > > > + ret = hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > + HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > + 0, &mshv_root.vmm_caps,
> > > > + sizeof(mshv_root.vmm_caps));
> > > > + if (ret) {
> > > > + dev_err(dev, "Failed to get VMM capabilities: %d\n", ret);
> > > > + return ret;
> > > > + }
> > >
> > > This is a functional change that isn't mentioned in the commit message.
> > > Why is it now appropriate to fail instead of treating the VMM capabilities
> > > as all disabled? Presumably there are older versions of the hypervisor that
> > > don't support the requirements described in the original comment, but
> > > perhaps they are no longer relevant?
> > >
> >
> > To fail is now the only option for the L1VH partition. It must discover
> > the scheduler type. Without this information, the partition cannot
> > operate. The core scheduler logic will not work with an integrated
> > scheduler, and vice versa.
> >
> > And yes, older hypervisor versions do not support L1VH.
>
> That makes sense. Your change in v2 of the patch handles this
> nicely. For the non-L1VH case, the v2 behavior is the same as before in
> that the init path won't error out on older hypervisors that don't
> support the requirements described in the original comment. That's
> the case I am concerned about.
>
Yes. Thank you for the review and feedback!
Stanislav
> Michael
Powered by blists - more mailing lists