Message-ID:
 <SN6PR02MB4157EDC69791EF24D5DA8661D491A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Wed, 28 Jan 2026 15:53:04 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Mukesh R <mrathor@...ux.microsoft.com>, Stanislav Kinsburskii
	<skinsburskii@...ux.microsoft.com>
CC: "kys@...rosoft.com" <kys@...rosoft.com>, "haiyangz@...rosoft.com"
	<haiyangz@...rosoft.com>, "wei.liu@...nel.org" <wei.liu@...nel.org>,
	"decui@...rosoft.com" <decui@...rosoft.com>, "longli@...rosoft.com"
	<longli@...rosoft.com>, "linux-hyperv@...r.kernel.org"
	<linux-hyperv@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC

From: Mukesh R <mrathor@...ux.microsoft.com> Sent: Tuesday, January 27, 2026 11:56 AM
> To: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
> Cc: kys@...rosoft.com; haiyangz@...rosoft.com; wei.liu@...nel.org;
> decui@...rosoft.com; longli@...rosoft.com; linux-hyperv@...r.kernel.org; linux-
> kernel@...r.kernel.org
> Subject: Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
> 
> On 1/27/26 09:47, Stanislav Kinsburskii wrote:
> > On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
> >> On 1/26/26 16:21, Stanislav Kinsburskii wrote:
> >>> On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
> >>>> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
> >>>>> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
> >>>>>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
> >>>>>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
> >>>>>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> >>>>>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
> >>>>>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
> >>>>>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
> >>>>>>>>> loaded via KEXEC, leading to potential system crashes when the new kernel
> >>>>>>>>> accesses hypervisor-deposited pages.
> >>>>>>>>>
> >>>>>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> >>>>>>>>> management is implemented.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
> >>>>>>>>> ---
> >>>>>>>>>       drivers/hv/Kconfig |    1 +
> >>>>>>>>>       1 file changed, 1 insertion(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> >>>>>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
> >>>>>>>>> --- a/drivers/hv/Kconfig
> >>>>>>>>> +++ b/drivers/hv/Kconfig
> >>>>>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
> >>>>>>>>>       	# e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> >>>>>>>>>       	# no particular order, making it impossible to reassemble larger pages
> >>>>>>>>>       	depends on PAGE_SIZE_4KB
> >>>>>>>>> +	depends on !KEXEC
> >>>>>>>>>       	select EVENTFD
> >>>>>>>>>       	select VIRT_XFER_TO_GUEST_WORK
> >>>>>>>>>       	select HMM_MIRROR
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Will this affect CRASH kexec? I see a few CONFIG_CRASH_DUMP references in kexec.c
> >>>>>>>> implying that crash dump might be involved. Or did you test kdump
> >>>>>>>> and it was fine?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
> >>>>>>> will be affected as well.
> >>>>>>
> >>>>>> So I'm not sure I understand the reason for this patch. We can just block
> >>>>>> kexec if there are any VMs running, right? Doing this would mean any
> >>>>>> further development would be without a very important and major feature,
> >>>>>> right?
> >>>>>
> >>>>> This is an option. But until it's implemented and merged, a user of the mshv
> >>>>> driver gets into a situation where kexec is broken in a non-obvious way.
> >>>>> The system may crash at any time after kexec, depending on whether the
> >>>>> new kernel touches the pages deposited to hypervisor or not. This is a
> >>>>> bad user experience.
> >>>>
> >>>> I understand that. But with this we cannot collect a core and debug any
> >>>> crashes. I was thinking there would be a quick way to prohibit kexec
> >>>> for update via notifier or some other quick hack. Did you already
> >>>> explore that and didn't find anything, hence this?
> >>>>
> >>>
> >>> This quick hack you mention isn't quick in the upstream kernel as there
> >>> is no hook to interrupt kexec process except the live update one.
> >>
> >> That's the one we want to interrupt and block, right? crash kexec
> >> is ok and should be allowed. We can document we don't support kexec
> >> for update for now.
> >>
> >>> I sent an RFC for that one, but given today's conversation details it
> >>> won't be accepted as is.
> >>
> >> Are you talking about this?
> >>
> >>          "mshv: Add kexec safety for deposited pages"
> >>
> >
> > Yes.
> >
> >>> Making mshv mutually exclusive with kexec is the only viable option for
> >>> now given time constraints.
> >>> It is intended to be replaced with proper page lifecycle management in
> >>> the future.
> >>
> >> Yeah, that could take a long time and imo we cannot just disable KEXEC
> >> completely. What we want is to just block kexec for updates from some
> >> mshv file for now; we can print during boot that kexec for updates is
> >> not supported on mshv. Hope that makes sense.
> >>
> >
> > The trade-off here is between disabling kexec support and having the
> > kernel crash after kexec in a non-obvious way. This affects both regular
> > kexec and crash kexec.
> 
> crash kexec on baremetal is not affected, hence disabling that
> doesn't make sense, as we then can't debug crashes on bm.
> 
> Let me think and explore a bit, and if I come up with something, I'll
> send a patch here. If nothing, then we can do this as last resort.
> 
> Thanks,
> -Mukesh

Maybe you've already looked at this, but there's a sysctl parameter
kernel.kexec_load_limit_reboot that prevents loading a kexec
kernel for reboot if the value is zero. Separately, there is
kernel.kexec_load_limit_panic that controls whether a kexec
kernel can be loaded for kdump purposes.

kernel.kexec_load_limit_reboot defaults to -1, which allows a kexec
kernel for reboot to be loaded an unlimited number of times. But the
value can be set to zero with this kernel boot line parameter:

sysctl.kernel.kexec_load_limit_reboot=0
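
To make the reboot/panic split concrete, here is a minimal userspace model of the two counters, assuming the semantics documented for these sysctls in the kernel's admin guide (-1 means unlimited, N means N loads remain, 0 blocks further loads). The names load_limit, load_permitted and IMAGE_* are hypothetical and are not the kernel's own; the capability check and locking the real kernel performs are left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the kexec load-limit counters
 * (not kernel code). -1 means unlimited, 0 means no further loads,
 * N > 0 means N loads left. The kernel's capability check and
 * mutex are deliberately omitted.
 */
struct load_limit {
	int remaining;
};

enum image_type { IMAGE_REBOOT, IMAGE_PANIC };

/* Both counters default to unlimited, matching the -1 default. */
static struct load_limit limit_reboot = { -1 };
static struct load_limit limit_panic = { -1 };

/* Returns true if a load of this type is allowed, consuming one unit. */
static bool load_permitted(enum image_type type)
{
	struct load_limit *limit =
		(type == IMAGE_PANIC) ? &limit_panic : &limit_reboot;

	assert(limit->remaining >= -1);

	if (limit->remaining == 0)
		return false;            /* set to zero or used up */
	if (limit->remaining != -1)
		limit->remaining--;      /* count down a finite limit */
	return true;
}
```

Setting only limit_reboot.remaining to 0 in this model mirrors the boot parameter above: reboot loads are refused while panic (kdump) loads still go through, because each image type has its own counter.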

Alternatively, the mshv driver initialization could add code along
the lines of process_sysctl_arg() to open
/proc/sys/kernel/kexec_load_limit_reboot and write a value of zero.
Then there's no dependency on setting the kernel boot line.
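
The core of the driver-side alternative is just opening the sysctl file and writing a zero. The sketch below does that from userspace with open(2) and write(2), purely as an illustration: code inside the mshv driver would have to use in-kernel file helpers, as process_sysctl_arg() does, and that in-kernel code is not reproduced here. The helper name disable_kexec_reboot_loads() and the KEXEC_LIMIT_REBOOT_PATH macro are hypothetical; the path parameter exists only so the helper can be exercised against an ordinary file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Default sysctl location; another path can be passed for testing. */
#define KEXEC_LIMIT_REBOOT_PATH "/proc/sys/kernel/kexec_load_limit_reboot"

/*
 * Hypothetical helper (userspace illustration, not the mshv driver):
 * write "0" to the given sysctl file so that no further kexec kernel
 * can be loaded for reboot. Returns 0 on success or -errno on failure.
 */
static int disable_kexec_reboot_loads(const char *path)
{
	static const char zero[] = "0\n";
	ssize_t n;
	int fd;

	assert(path != NULL);

	fd = open(path, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return -errno;

	n = write(fd, zero, strlen(zero));
	if (n < 0) {
		int err = errno;   /* save errno before close() can change it */
		close(fd);
		return -err;
	}
	if (close(fd) < 0)
		return -errno;

	/* A short write means the value was not fully accepted. */
	return (size_t)n == strlen(zero) ? 0 : -EIO;
}
```

Called as disable_kexec_reboot_loads(KEXEC_LIMIT_REBOOT_PATH), a return of 0 means the zero was written; a negative errno (for example -EACCES when not running as root) means it was not.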

The downside to either method is that after Linux in the root partition
is up-and-running, it is possible to change the sysctl to a non-zero value,
and then load a kexec kernel for reboot. So this approach isn't absolute
protection against doing a kexec for reboot. But it makes it harder, and 
until there's a mechanism to reclaim the deposited pages, it might be
a viable compromise to allow kdump to still be used.

Just a thought ....

Michael

> 
> 
> > It's a pity we can't apply a quick hack to disable only regular kexec.
> > However, since crash kexec would hit the same issues, until we have a
> > proper state transition for deposited pages, the best workaround for now
> > is to reset the hypervisor state on every kexec, which needs design,
> > work, and testing.
> >
> > Disabling kexec is the only consistent way to handle this in the
> > upstream kernel at the moment.
> >
> > Thanks, Stanislav
