[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
<BN7PR02MB41485DB7E9B53B4CEDA596EAD4F5A@BN7PR02MB4148.namprd02.prod.outlook.com>
Date: Mon, 20 Oct 2025 17:30:44 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Praveen Paladugu <prapal@...ux.microsoft.com>
CC: "kys@...rosoft.com" <kys@...rosoft.com>, "haiyangz@...rosoft.com"
<haiyangz@...rosoft.com>, "wei.liu@...nel.org" <wei.liu@...nel.org>,
"decui@...rosoft.com" <decui@...rosoft.com>, "tglx@...utronix.de"
<tglx@...utronix.de>, "mingo@...hat.com" <mingo@...hat.com>,
"linux-hyperv@...r.kernel.org" <linux-hyperv@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "bp@...en8.de"
<bp@...en8.de>, "dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>,
"x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"arnd@...db.de" <arnd@...db.de>, "anbelski@...ux.microsoft.com"
<anbelski@...ux.microsoft.com>, "easwar.hariharan@...ux.microsoft.com"
<easwar.hariharan@...ux.microsoft.com>, "nunodasneves@...ux.microsoft.com"
<nunodasneves@...ux.microsoft.com>, "skinsburskii@...ux.microsoft.com"
<skinsburskii@...ux.microsoft.com>
Subject: RE: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition
with MSHV
From: Praveen Paladugu <prapal@...ux.microsoft.com> Sent: Monday, October 20, 2025 9:00 AM
>
> On Thu, Oct 16, 2025 at 07:29:06PM +0000, Michael Kelley wrote:
> > From: Praveen K Paladugu <prapal@...ux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
> > >
[snip]
> > > +static int hv_call_enter_sleep_state(u32 sleep_state)
> > > +{
> > > + u64 status;
> > > + int ret;
> > > + unsigned long flags;
> > > + struct hv_input_enter_sleep_state *in;
> > > +
> > > + ret = hv_initialize_sleep_states();
> > > + if (ret)
> > > + return ret;
> > > +
> > > + local_irq_save(flags);
> > > + in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > + in->sleep_state = sleep_state;
> > > +
> > > + status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);
> >
> > If this hypercall succeeds, does the root partition (which is the caller) go
> > to sleep in S5, such that the hypercall never returns? If that's not the case,
> > what is the behavior of this hypercall?
> >
> This hypercall returns to the kernel when the CPU wakes up the next
> time.
I must be missing something about the big picture, because "returns to
the kernel when the CPU wakes up" doesn't fit my mental model of what's
going on. I thought this function would be called, and the hypercall made,
when Linux in the root partition is shutting down. So if a CPU makes this
hypercall and goes to sleep, what wakes it up? And when it wakes up, is it
still running the same Linux instance that was shutting down, or has it
rebooted into new Linux instance? In the latter case, returning from
the hypercall doesn't make sense.
Can you explain further how this all works?
Michael
>
> > > + local_irq_restore(flags);
> > > +
> > > + if (!hv_result_success(status)) {
> > > + hv_status_err(status, "\n");
> > > + return hv_result_to_errno(status);
> > > + }
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int hv_reboot_notifier_handler(struct notifier_block *this,
> > > + unsigned long code, void *another)
> > > +{
> > > + int ret = 0;
> > > +
> > > + if (code == SYS_HALT || code == SYS_POWER_OFF)
> > > + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> >
> > If hv_call_enter_sleep_state() never returns, here's an issue. There may be
> > multiple entries on the reboot notifier chain. For example,
> > mshv_root_partition_init() puts an entry on the reboot notifier chain. At
> > reboot time, the entries are executed in some order, with the expectation
> > that all entries will be executed prior to the reboot actually happening. But
> > if this hypercall never returns, some entries may never be executed.
> >
> > Notifier chains support a notion of priority to control the order in
> > which they are executed, but that priority isn't set in hv_reboot_notifier
> > below, or in mshv_reboot_nb. And most other reboot notifiers throughout
> > Linux appear to not set it. So the ordering is unspecified, and having
> > this notifier never return may be problematic.
> >
> Thanks for the detailed explanation Michael!
>
> As I mentioned above, this hypercall returns to the kernel, so the rest
> of the entries in the notifier chain should continue to execute.
>
> > > +
> > > + return ret ? NOTIFY_DONE : NOTIFY_OK;
> > > +}
> > > +
> > > +static struct notifier_block hv_reboot_notifier = {
> > > + .notifier_call = hv_reboot_notifier_handler,
> > > +};
> > > +
> > > +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> > > +{
> > > + int ret = 0;
> > > +
> > > + if (sleep_state == ACPI_STATE_S5)
> > > + ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> > > +
> > > + return ret == 0 ? 1 : -1;
> > > +}
> > > +
> > > +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> > > +{
> > > + return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> > > +}
> >
> > Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
> > So it seems like acpi_os_set_prepare_extended_sleep() could just use
> > hv_acpi_sleep_handler() directly.
> >
> Upon further investigation, I discovered that extended sleep is only
> supported on platforms with ACPI_REDUCED_HARDWARE.
>
> As these patches are targetted at X86, above does not really apply. I
> will drop this handler in next version.
>
> > > +
> > > +int hv_sleep_notifiers_register(void)
> > > +{
> > > + int ret;
> > > +
> > > + acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> > > + acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);
> >
> > I'm not clear on why these handlers are set. If the hv_reboot_notifier is
> > called, are these ACPI handlers ever called? Or are these to catch any cases
> > where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
> > not understanding something .... :-)
> >
>
> I am trying to trace these calls. I will keep you posted with my
> findings.
>
> > > +
> > > + ret = register_reboot_notifier(&hv_reboot_notifier);
> > > + if (ret)
> > > + pr_err("%s: cannot register reboot notifier %d\n",
> > > + __func__, ret);
> > > +
> > > + return ret;
> > > +}
> > > +#endif
> >
> > I'm wondering if all this code belongs in hv_common.c, since it is only needed
> > for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
> > be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
> > is set.
> >
>
> This sounds reasonable. I will discuss this internally and get back you.
>
> > Michael
> >
> > > --
> > > 2.51.0
> > >
Powered by blists - more mailing lists