Message-ID: <aX0Vbfocwa4WgXUw@anirudh-surface.localdomain>
Date: Fri, 30 Jan 2026 20:32:45 +0000
From: Anirudh Rayabharam <anirudh@...rudhrb.com>
To: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
Cc: kys@...rosoft.com, haiyangz@...rosoft.com, wei.liu@...nel.org,
decui@...rosoft.com, longli@...rosoft.com,
linux-hyperv@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
On Fri, Jan 30, 2026 at 10:46:45AM -0800, Stanislav Kinsburskii wrote:
> On Fri, Jan 30, 2026 at 05:11:12PM +0000, Anirudh Rayabharam wrote:
> > On Wed, Jan 28, 2026 at 03:11:14PM -0800, Stanislav Kinsburskii wrote:
> > > On Wed, Jan 28, 2026 at 04:16:31PM +0000, Anirudh Rayabharam wrote:
> > > > On Mon, Jan 26, 2026 at 12:46:44PM -0800, Stanislav Kinsburskii wrote:
> > > > > On Tue, Jan 27, 2026 at 12:19:24AM +0530, Anirudh Rayabharam wrote:
> > > > > > On Fri, Jan 23, 2026 at 10:20:53PM +0000, Stanislav Kinsburskii wrote:
> > > > > > > The MSHV driver deposits kernel-allocated pages to the hypervisor during
> > > > > > > runtime and never withdraws them. This creates a fundamental incompatibility
> > > > > > > with KEXEC, as these deposited pages remain unavailable to the new kernel
> > > > > > > > loaded via KEXEC, leading to potential system crashes when the kernel
> > > > > > > > accesses hypervisor-deposited pages.
> > > > > > >
> > > > > > > Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> > > > > > > management is implemented.
> > > > > >
> > > > > > Someone might want to stop all guest VMs and do a kexec. Which is valid
> > > > > > and would work without any issue for L1VH.
> > > > > >
> > > > >
> > > > > > No, it won't work and hypervisor-deposited pages won't be withdrawn.
> > > >
> > > > All pages that were deposited in the context of a guest partition (i.e.
> > > > with the guest partition ID), would be withdrawn when you kill the VMs,
> > > > right? What other deposited pages would be left?
> > > >
> > >
> > > The driver deposits two types of pages: one for the guests (withdrawn
> > > upon guest shutdown) and the other - for the host itself (never
> > > withdrawn).
> > > See hv_call_create_partition, for example: it deposits pages for the
> > > host partition.
> >
> > Hmm.. I see. Is it not possible to reclaim this memory in module_exit?
> > Also, can't we forcefully kill all running partitions in module_exit and
> > then reclaim memory? Would this help with kernel consistency
> > irrespective of userspace behavior?
> >
>
> It would, but this is sloppy and cannot be a long-term solution.
>
> It is also not reliable. We have no hook to prevent kexec. So if we fail
> to kill the guest or reclaim the memory for any reason, the new kernel
> may still crash.

Actually, guests won't be running by the time we reach our module_exit
function during a kexec. Userspace processes would've been killed by
then.

Also, why is this sloppy? Isn't this what module_exit should be
doing anyway? If someone unloads our module, we should try to
clean everything up (including killing guests) and reclaim memory.

In any case, we can BUG() out if we fail to reclaim the memory. That would
stop the kexec.

This is a better solution: instead of disabling KEXEC outright, our
driver makes the best possible effort to make kexec work.
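
As a rough illustration of the exit path described above, here is a small
userspace model, not driver code. Every name in it (model_pool,
model_deposit, model_withdraw_all, model_exit) is hypothetical and does not
exist in the mshv driver, and abort() merely stands in for BUG(). It only
shows the accounting: track what was deposited, try to withdraw all of it
on exit, and refuse to continue if anything is left.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Userspace model only: none of these names exist in the mshv driver. */
#define MODEL_MAX_PAGES 8

struct model_pool {
	bool deposited[MODEL_MAX_PAGES]; /* true while the "hypervisor" owns the page */
	bool withdraw_fails;             /* knob to simulate a failing withdraw */
};

/*
 * Hand one page to the hypervisor.
 * Returns 0 on success, -1 if the index is out of range or the page
 * is already deposited.
 */
static int model_deposit(struct model_pool *p, size_t pfn)
{
	if (pfn >= MODEL_MAX_PAGES || p->deposited[pfn])
		return -1;
	p->deposited[pfn] = true;
	return 0;
}

/* Try to take every deposited page back; returns how many are still deposited. */
static size_t model_withdraw_all(struct model_pool *p)
{
	size_t left = 0;

	for (size_t i = 0; i < MODEL_MAX_PAGES; i++) {
		if (!p->deposited[i])
			continue;
		if (p->withdraw_fails)
			left++;
		else
			p->deposited[i] = false;
	}
	return left;
}

/* Stand-in for the module exit path: reclaim everything or refuse to go on. */
static void model_exit(struct model_pool *p)
{
	size_t left = model_withdraw_all(p);

	if (left) {
		fprintf(stderr, "%zu page(s) still deposited, aborting\n", left);
		abort(); /* plays the role of BUG() in this model */
	}
}
```

In this model, calling model_exit() on a pool whose withdraw fails aborts
the process, which is the analogue of not proceeding into the new kernel
with pages still owned by the hypervisor.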
>
> There are two long-term solutions:
> 1. Add a way to prevent kexec when there is shared state between the hypervisor and the kernel.

I honestly think we should focus efforts on making kexec work rather
than finding ways to prevent it.

Thanks,
Anirudh

> 2. Hand the shared kernel state over to the new kernel.
>
> I sent a series for the first one. The second one is not ready yet.
> Anything else is neither robust nor reliable, so I don’t think it makes
> sense to pursue it.
>
> Thanks,
> Stanislav
>
>
> > Thanks,
> > Anirudh.
> >
> > >
> > > Thanks,
> > > Stanislav
> > >
> > > > Thanks,
> > > > Anirudh.
> > > >
> > > > > > Also, kernel consistency must not depend on user space behavior.
> > > > >
> > > > > > Also, I don't think it is reasonable at all that someone needs to
> > > > > > disable basic kernel functionality such as kexec in order to use our
> > > > > > driver.
> > > > > >
> > > > >
> > > > > It's a temporary measure until proper page lifecycle management is
> > > > > supported in the driver.
> > > > > > Mutual exclusion of the driver and kexec is a given and thus should be
> > > > > > explicitly stated in the Kconfig.
> > > > >
> > > > > Thanks,
> > > > > Stanislav
> > > > >
> > > > > > Thanks,
> > > > > > Anirudh.
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Stanislav Kinsburskii <skinsburskii@...ux.microsoft.com>
> > > > > > > ---
> > > > > > > drivers/hv/Kconfig | 1 +
> > > > > > > 1 file changed, 1 insertion(+)
> > > > > > >
> > > > > > > diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> > > > > > > index 7937ac0cbd0f..cfd4501db0fa 100644
> > > > > > > --- a/drivers/hv/Kconfig
> > > > > > > +++ b/drivers/hv/Kconfig
> > > > > > > @@ -74,6 +74,7 @@ config MSHV_ROOT
> > > > > > > # e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> > > > > > > # no particular order, making it impossible to reassemble larger pages
> > > > > > > depends on PAGE_SIZE_4KB
> > > > > > > + depends on !KEXEC
> > > > > > > select EVENTFD
> > > > > > > select VIRT_XFER_TO_GUEST_WORK
> > > > > > > select HMM_MIRROR
> > > > > > >
> > > > > > >