lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <SN6PR02MB4157FBBE5B77C65B024D3589D4E9A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Thu, 16 Oct 2025 19:29:06 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Praveen K Paladugu <prapal@...ux.microsoft.com>, "kys@...rosoft.com"
	<kys@...rosoft.com>, "haiyangz@...rosoft.com" <haiyangz@...rosoft.com>,
	"wei.liu@...nel.org" <wei.liu@...nel.org>, "decui@...rosoft.com"
	<decui@...rosoft.com>, "tglx@...utronix.de" <tglx@...utronix.de>,
	"mingo@...hat.com" <mingo@...hat.com>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "bp@...en8.de" <bp@...en8.de>,
	"dave.hansen@...ux.intel.com" <dave.hansen@...ux.intel.com>, "x86@...nel.org"
	<x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>, "arnd@...db.de"
	<arnd@...db.de>
CC: "anbelski@...ux.microsoft.com" <anbelski@...ux.microsoft.com>,
	"easwar.hariharan@...ux.microsoft.com"
	<easwar.hariharan@...ux.microsoft.com>, "nunodasneves@...ux.microsoft.com"
	<nunodasneves@...ux.microsoft.com>, "skinsburskii@...ux.microsoft.com"
	<skinsburskii@...ux.microsoft.com>
Subject: RE: [PATCH v2 2/2] hyperv: Enable clean shutdown for root partition
 with MSHV

From: Praveen K Paladugu <prapal@...ux.microsoft.com> Sent: Tuesday, October 14, 2025 9:41 AM
> 
> When a shutdown is initiated in the root partition without configuring
> sleep states, the call to `hv_call_enter_sleep_state` fails. In such cases
> the root falls back to using legacy ACPI mechanisms to poweroff. This call
> is intercepted by MSHV and will result in a Machine Check Exception (MCE).
> 
> Root panics with a trace similar to:
> 
> [   81.306348] reboot: Power down
> [   81.314709] mce: [Hardware Error]: CPU 0: Machine Check Exception: 4 Bank 0: b2000000c0060001
> [   81.314711] mce: [Hardware Error]: TSC 3b8cb60a66 PPIN 11d98332458e4ea9
> [   81.314713] mce: [Hardware Error]: PROCESSOR 0:606a6 TIME 1759339405 SOCKET 0 APIC 0 microcode ffffffff
> [   81.314715] mce: [Hardware Error]: Run the above through 'mcelog --ascii'
> [   81.314716] mce: [Hardware Error]: Machine check: Processor context corrupt
> [   81.314717] Kernel panic - not syncing: Fatal machine check
> 
> To prevent this, properly configure sleep states within MSHV, allowing
> the root partition to shut down cleanly without triggering a panic.
> 
> Signed-off-by: Praveen K Paladugu <prapal@...ux.microsoft.com>
> Co-developed-by: Anatol Belski <anbelski@...ux.microsoft.com>
> Signed-off-by: Anatol Belski <anbelski@...ux.microsoft.com>
> ---
>  arch/x86/hyperv/hv_init.c       |   7 ++
>  arch/x86/include/asm/mshyperv.h |   1 +
>  drivers/hv/hv_common.c          | 119 ++++++++++++++++++++++++++++++++
>  3 files changed, 127 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index afdbda2dd7b7..57bd96671ead 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -510,6 +510,13 @@ void __init hyperv_init(void)
>  		memunmap(src);
> 
>  		hv_remap_tsc_clocksource();
> +		/*
> +		 * The notifier registration might fail at various hops.
> +		 * Corresponding error messages will land in dmesg. There is
> +		 * otherwise nothing that can be specifically done to handle
> +		 * failures here.
> +		 */
> +		(void)hv_sleep_notifiers_register();
>  	} else {
>  		hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
>  		wrmsrq(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index abc4659f5809..fb8d691193df 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -236,6 +236,7 @@ int hyperv_fill_flush_guest_mapping_list(
>  void hv_apic_init(void);
>  void __init hv_init_spinlocks(void);
>  bool hv_vcpu_is_preempted(int vcpu);
> +int hv_sleep_notifiers_register(void);
>  #else
>  static inline void hv_apic_init(void) {}
>  #endif
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index e109a620c83f..cfba9ded7bcb 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -837,3 +837,122 @@ const char *hv_result_to_string(u64 status)
>  	return "Unknown";
>  }
>  EXPORT_SYMBOL_GPL(hv_result_to_string);
> +
> +#if IS_ENABLED(CONFIG_ACPI)
> +/*
> + * Corresponding sleep states have to be initialized in order for a subsequent
> + * HVCALL_ENTER_SLEEP_STATE call to succeed. Currently only S5 state as per
> + * ACPI 6.4 chapter 7.4.2 is relevant, while S1, S2 and S3 can be supported.
> + *
> + * ACPI should be initialized and should support S5 sleep state when this method
> + * is called, so that it can extract correct PM values and pass them to hv.
> + */
> +static int hv_initialize_sleep_states(void)
> +{
> +	u64 status;
> +	unsigned long flags;
> +	struct hv_input_set_system_property *in;
> +	acpi_status acpi_status;
> +	u8 sleep_type_a, sleep_type_b;
> +
> +	if (!acpi_sleep_state_supported(ACPI_STATE_S5)) {
> +		pr_err("%s: S5 sleep state not supported.\n", __func__);
> +		return -ENODEV;
> +	}
> +
> +	acpi_status = acpi_get_sleep_type_data(ACPI_STATE_S5,
> +						&sleep_type_a, &sleep_type_b);
> +	if (ACPI_FAILURE(acpi_status))
> +		return -ENODEV;
> +
> +	local_irq_save(flags);
> +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +	memset(in, 0, sizeof(*in));
> +
> +	in->property_id = HV_SYSTEM_PROPERTY_SLEEP_STATE;
> +	in->set_sleep_state_info.sleep_state = HV_SLEEP_STATE_S5;
> +	in->set_sleep_state_info.pm1a_slp_typ = sleep_type_a;
> +	in->set_sleep_state_info.pm1b_slp_typ = sleep_type_b;
> +
> +	status = hv_do_hypercall(HVCALL_SET_SYSTEM_PROPERTY, in, NULL);
> +	local_irq_restore(flags);
> +
> +	if (!hv_result_success(status)) {
> +		hv_status_err(status, "\n");
> +		return hv_result_to_errno(status);
> +	}
> +
> +	return 0;
> +}
> +
> +static int hv_call_enter_sleep_state(u32 sleep_state)
> +{
> +	u64 status;
> +	int ret;
> +	unsigned long flags;
> +	struct hv_input_enter_sleep_state *in;
> +
> +	ret = hv_initialize_sleep_states();
> +	if (ret)
> +		return ret;
> +
> +	local_irq_save(flags);
> +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +	in->sleep_state = sleep_state;
> +
> +	status = hv_do_hypercall(HVCALL_ENTER_SLEEP_STATE, in, NULL);

If this hypercall succeeds, does the root partition (which is the caller) go
to sleep in S5, such that the hypercall never returns? If that's not the case,
what is the behavior of this hypercall?

> +	local_irq_restore(flags);
> +
> +	if (!hv_result_success(status)) {
> +		hv_status_err(status, "\n");
> +		return hv_result_to_errno(status);
> +	}
> +
> +	return 0;
> +}
> +
> +static int hv_reboot_notifier_handler(struct notifier_block *this,
> +				      unsigned long code, void *another)
> +{
> +	int ret = 0;
> +
> +	if (code == SYS_HALT || code == SYS_POWER_OFF)
> +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);

If hv_call_enter_sleep_state() never returns, here's an issue. There may be
multiple entries on the reboot notifier chain. For example,
mshv_root_partition_init() puts an entry on the reboot notifier chain. At
reboot time, the entries are executed in some order, with the expectation
that all entries will be executed prior to the reboot actually happening. But
if this hypercall never returns, some entries may never be executed.

Notifier chains support a notion of priority to control the order in
which they are executed, but that priority isn't set in hv_reboot_notifier
below, or in mshv_reboot_nb. And most other reboot notifiers throughout
Linux appear to not set it. So the ordering is unspecified, and having
this notifier never return may be problematic.

> +
> +	return ret ? NOTIFY_DONE : NOTIFY_OK;
> +}
> +
> +static struct notifier_block hv_reboot_notifier = {
> +	.notifier_call  = hv_reboot_notifier_handler,
> +};
> +
> +static int hv_acpi_sleep_handler(u8 sleep_state, u32 pm1a_cnt, u32 pm1b_cnt)
> +{
> +	int ret = 0;
> +
> +	if (sleep_state == ACPI_STATE_S5)
> +		ret = hv_call_enter_sleep_state(HV_SLEEP_STATE_S5);
> +
> +	return ret == 0 ? 1 : -1;
> +}
> +
> +static int hv_acpi_extended_sleep_handler(u8 sleep_state, u32 val_a, u32 val_b)
> +{
> +	return hv_acpi_sleep_handler(sleep_state, val_a, val_b);
> +}

Is this function needed? The function signature is identical to hv_acpi_sleep_handler().
So it seems like acpi_os_set_prepare_extended_sleep() could just use
hv_acpi_sleep_handler() directly.

> +
> +int hv_sleep_notifiers_register(void)
> +{
> +	int ret;
> +
> +	acpi_os_set_prepare_sleep(&hv_acpi_sleep_handler);
> +	acpi_os_set_prepare_extended_sleep(&hv_acpi_extended_sleep_handler);

I'm not clear on why these handlers are set. If the hv_reboot_notifier is
called, are these ACPI handlers ever called? Or are these to catch any cases
where the hv_reboot_notifier is somehow bypassed? Or maybe I'm just
not understanding something .... :-)

> +
> +	ret = register_reboot_notifier(&hv_reboot_notifier);
> +	if (ret)
> +		pr_err("%s: cannot register reboot notifier %d\n",
> +			__func__, ret);
> +
> +	return ret;
> +}
> +#endif

I'm wondering if all this code belongs in hv_common.c, since it is only needed
for Linux in the root partition. Couldn't it go in mshv_common.c? It would still
be built-in code (i.e., not in a loadable module), but only if CONFIG_MSHV_ROOT
is set.

Michael

> --
> 2.51.0
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ