linux-kernel - Re: [PATCH 2/2] mshv: Add support for integrated scheduler

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aXzS8VPmfInQVort@skinsburskii.localdomain>
Date: Fri, 30 Jan 2026 07:49:05 -0800
From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
To: Michael Kelley <mhklinux@...look.com>
Cc: "kys@...rosoft.com" <kys@...rosoft.com>,
	"haiyangz@...rosoft.com" <haiyangz@...rosoft.com>,
	"wei.liu@...nel.org" <wei.liu@...nel.org>,
	"decui@...rosoft.com" <decui@...rosoft.com>,
	"longli@...rosoft.com" <longli@...rosoft.com>,
	"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 2/2] mshv: Add support for integrated scheduler

On Fri, Jan 30, 2026 at 01:24:34AM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Thursday, January 29, 2026 11:10 AM
> > 
> > On Thu, Jan 29, 2026 at 05:47:02PM +0000, Michael Kelley wrote:
> > > From: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com> Sent: Wednesday, January 21, 2026 2:36 PM
> > 
> > <snip>
> > 
> > > >  static int __init mshv_root_partition_init(struct device *dev)
> > > >  {
> > > >  	int err;
> > > >
> > > > -	err = root_scheduler_init(dev);
> > > > -	if (err)
> > > > -		return err;
> > > > -
> > > >  	err = register_reboot_notifier(&mshv_reboot_nb);
> > > >  	if (err)
> > > > -		goto root_sched_deinit;
> > > > +		return err;
> > > >
> > > >  	return 0;
> > >
> > > This code is now:
> > >
> > > 	if (err)
> > > 		return err;
> > > 	return 0;
> > >
> > > which can be simplified to just:
> > >
> > > 	return err;
> > >
> > > Or drop the local variable 'err' and simplify the entire function to:
> > >
> > > 	return register_reboot_notifier(&mshv_reboot_nb);
> > >
> > > There's a tangential question here: Why is this reboot notifier
> > > needed in the first place? All it does is remove the cpuhp state
> > > that allocates/frees the per-cpu root_scheduler_input and
> > > root_scheduler_output pages. Removing the state will free
> > > the pages, but if Linux is rebooting, why bother?
> > >
> > 
> > This was originally done to support kexec.
> > Here is the original commit message:
> > 
> >     mshv: perform synic cleanup during kexec
> > 
> >     Register a reboot notifier that performs synic cleanup when a kexec
> >     is in progress.
> > 
> >     One notable issue this commit fixes is one where after a kexec, virtio
> >     devices are not functional. Linux root partition receives MMIO doorbell
> >     events in the ring buffer in the SIRB synic page. The hypervisor maintains
> >     a head pointer where it writes new events into the ring buffer. The root
> >     partition maintains a tail pointer to read events from the buffer.
> > 
> >     Upon kexec reboot, all root data structures are re-initialized and thus the
> >     tail pointer gets reset to zero. The hypervisor on the other hand still
> >     retains the pre-kexec head pointer which could be non-zero. This means that
> >     when the hypervisor writes new events to the ring buffer, the root
> >     partition looks at the wrong place and doesn't find any events. So, future
> >     doorbell events never get delivered. As a result, virtqueue kicks never get
> >     delivered to the host.
> > 
> >     When the SIRB page is disabled the hypervisor resets the head pointer.
> 
> FWIW, I don't see that commit message anywhere in a public source code
> tree. The calls to register/unregister_reboot_notifier() were in the original
> introduction of mshv_root_main.c in upstream commit 621191d709b14.
> Evidently the code described by that commit message was not submitted
> upstream. And of course, the kexec() topic is now being revisited ....
> 
> So to clarify: Do you expect that in the future the reboot notifier will be
> used for something that really is required for resetting hypervisor state
> in the case of a kexec reboot?
> 

Yes, for now it's the best we have.
This code can be dropped later if we get a better way to handle kexec.

> > 
> > > > -root_sched_deinit:
> > > > -	root_scheduler_deinit();
> > > > -	return err;
> > > >  }
> > > >
> > > > -static void mshv_init_vmm_caps(struct device *dev)
> > > > +static int mshv_init_vmm_caps(struct device *dev)
> > > >  {
> > > > -	/*
> > > > -	 * This can only fail here if HVCALL_GET_PARTITION_PROPERTY_EX or
> > > > -	 * HV_PARTITION_PROPERTY_VMM_CAPABILITIES are not supported. In that
> > > > -	 * case it's valid to proceed as if all vmm_caps are disabled (zero).
> > > > -	 */
> > > > -	if (hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > -					      HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > -					      0, &mshv_root.vmm_caps,
> > > > -					      sizeof(mshv_root.vmm_caps)))
> > > > -		dev_warn(dev, "Unable to get VMM capabilities\n");
> > > > +	int ret;
> > > > +
> > > > +	ret = hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > +					 	HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > +						0, &mshv_root.vmm_caps,
> > > > +						sizeof(mshv_root.vmm_caps));
> > > > +	if (ret) {
> > > > +		dev_err(dev, "Failed to get VMM capabilities: %d\n", ret);
> > > > +		return ret;
> > > > +	}
> > >
> > > This is a functional change that isn't mentioned in the commit message.
> > > Why is it now appropriate to fail instead of treating the VMM capabilities
> > > as all disabled? Presumably there are older versions of the hypervisor that
> > > don't support the requirements described in the original comment, but
> > > perhaps they are no longer relevant?
> > >
> > 
> > To fail is now the only option for the L1VH partition. It must discover
> > the scheduler type. Without this information, the partition cannot
> > operate. The core scheduler logic will not work with an integrated
> > scheduler, and vice versa.
> > 
> > And yes, older hypervisor versions do not support L1VH.
> 
> That makes sense. Your change in v2 of the patch handles this
> nicely. For the non-L1VH case, the v2 behavior is the same as before in
> that the init path won't error out on older hypervisors that don't
> support the requirements described in the original comment. That's
> the case I am concerned about.
> 

Yes. Thank you for the review and feedback!

Stanislav
> Michael