Message-ID:
<SN6PR02MB4157D37EF081A7CC2E70425BD438A@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Wed, 27 Aug 2025 16:01:55 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Vitaly Kuznetsov <vkuznets@...hat.com>, "linux-hyperv@...r.kernel.org"
<linux-hyperv@...r.kernel.org>
CC: "K. Y. Srinivasan" <kys@...rosoft.com>, Haiyang Zhang
<haiyangz@...rosoft.com>, Wei Liu <wei.liu@...nel.org>, Dexuan Cui
<decui@...rosoft.com>, "x86@...nel.org" <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Nuno Das Neves
<nunodasneves@...ux.microsoft.com>, Tianyu Lan <tiala@...rosoft.com>, Li Tian
<litian@...hat.com>, Philipp Rudo <prudo@...hat.com>
Subject: RE: [PATCH v3] x86/hyperv: Fix kdump on Azure CVMs
From: Vitaly Kuznetsov <vkuznets@...hat.com> Sent: Wednesday, August 27, 2025 5:51 AM
>
> Michael Kelley <mhklinux@...look.com> writes:
>
> > From: Vitaly Kuznetsov <vkuznets@...hat.com> Sent: Thursday, August 21, 2025 8:17 AM
> >>
[snip]
> >
> > I seem to recall that some separate work has been done
> > to support kexec/kdump for the more generic SEV-SNP and
> > TDX cases where there's no paravisor mediating. I haven't
> > gone looking for that code to see when it runs.
> > hv_ivm_clear_host_access() is needed to update the
> > paravisor records about the page state, but if other code
> > has already updated the hypervisor/processor state, that
> > might be problematic. If this runs first, then the more
> > generic code will presumably find nothing to do, which
> > should be OK.
> >
> > I'll try to go look further at this situation, unless you already
> > have. If necessary, this function could be gated to run
> > only when a paravisor is present.
>
> Yes, there are SEV-SNP and TDX specific
> snp_kexec_finish()/tdx_kexec_finish() which do memory unsharing but I
> convinced myself that these are not called on Azure CVM which uses
> paravisor. In particular, for SEV-SNP 'sme_me_mask == 0' in
> sme_early_init().
Thanks for the pointer to snp_kexec_finish() and tdx_kexec_finish().
Yes, I agree they won't get called in the Hyper-V paravisor case.
>
> I have to admit I've never seen an Azure/Hyper-V CVM without a
> paravisor, but I agree it may make sense to hide all this tracking and
> cleanup logic under 'if (hyperv_paravisor_present)' (or even 'if
> (hv_is_isolation_supported() && hyperv_paravisor_present)').
I think this patch should be plugged into the .enc_kexec_begin and
.enc_kexec_finish mechanism. Using .enc_kexec_begin similarly to
snp/tdx_kexec_begin() handles synchronization with in-progress
private<->shared conversions in the kexec() case, though not in the
panic/kdump case. That machinery is there, and the Hyper-V paravisor
case can use it.
hv_ivm_clear_host_access() should be the .enc_kexec_finish function,
and it will get invoked at the right time without having to be
explicitly called in hyperv_cleanup().
Hyper-V *does* support running SNP and TDX in what Microsoft
internally calls the "fully enlightened" case, where the guest OS does
everything necessary to operate in SNP or TDX mode, without
depending on a paravisor. In such a case, sme_me_mask will be
non-zero in SNP mode, for example. But I'm unsure if Microsoft has kept
the Linux guest support on Hyper-V up-to-date enough for this to actually
work in practice. I thought at one point there was an internal use case
for SNP without a paravisor, but it's been nearly two years now
since I retired, and so I don't really know anymore. For running on
Hyper-V in TDX mode without a paravisor, I know there's one Linux
patch missing that is needed by the netvsc driver. That patch would
provide the TDX support for set_memory_encrypted/decrypted() to work
on a vmalloc area where the underlying physical memory is not
contiguous. SNP has the support, but it's a known gap for TDX.
If you wire up the .enc_kexec_begin and .enc_kexec_finish functions
in hv_vtom_init() along with the other x86_platform.guest.enc_*
functions, then you don't need to test for a paravisor being present
because hv_vtom_init() is called only when a paravisor is present.
The non-paravisor cases on Hyper-V will fall back to the existing
snp/tdx_kexec_begin/finish() functions, and everything *should*
just work. If it doesn't just work, that's a different problem for
the Microsoft folks to look at if they care.
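Roughly, the wiring could look like the sketch below. This is only an
illustration of the idea, not tested code: hv_ivm_clear_host_access()
is from your patch (assuming its signature is adjusted to the void
(*enc_kexec_finish)(void) hook type), and hv_vtom_kexec_begin() is a
placeholder name I'm inventing for a begin-side helper that would
mirror snp_kexec_begin()/tdx_kexec_begin().

```c
/* In arch/x86/hyperv/ivm.c -- sketch only, names partly hypothetical */

static void hv_vtom_kexec_begin(void)
{
	/*
	 * Stop new private<->shared conversions, mirroring what
	 * snp_kexec_begin()/tdx_kexec_begin() do for the
	 * non-paravisor cases.
	 */
}

void __init hv_vtom_init(void)
{
	/* ... existing setup of the other enc_* callbacks ... */
	x86_platform.guest.enc_kexec_begin  = hv_vtom_kexec_begin;
	x86_platform.guest.enc_kexec_finish = hv_ivm_clear_host_access;
}
```

Since hv_vtom_init() runs only when a paravisor is present, no explicit
hyperv_paravisor_present check is needed around these assignments.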
I will ping my contacts on the Microsoft side to see if the "no
paravisor" case is still of interest, and if so, whether someone
can test it. My only test environment is as a normal Azure user,
so like you, I don't have a way to do such a test.
>
> >
> >> +
> >> + raw_spin_lock_irqsave(&hv_list_enc_lock, flags);
> >
> > Since this function is now called after other CPUs have
> > been stopped, the spin lock is no longer necessary, unless
> > you were counting on it to provide the interrupt disable
> > needed for accessing the per-cpu hypercall argument page.
> > But even then, I'd suggest just doing the interrupt disable
> > instead of the spin lock so there's no chance of the
> > panic or kexec path getting hung waiting on the spin lock.
>
> Makes sense, will do.
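For the archives, the shape I have in mind is simply (sketch, using the
hv_list_enc walk from this patch as the body):

```c
/*
 * Other CPUs are already stopped by the time this runs, so the
 * raw spin lock buys no protection and could hang the panic/kexec
 * path if the panicking CPU held it. Interrupts must still be off
 * while the per-cpu hypercall argument page is in use.
 */
unsigned long flags;

local_irq_save(flags);
/* ... walk hv_list_enc and issue the hypercalls ... */
local_irq_restore(flags);
```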
>
> >
> > There's also a potentially rare problem if other CPUs are
> > stopped while hv_list_enc_add() or hv_list_enc_remove()
> > is being executed. The list might be inconsistent, or not
> > fully reflect what the paravisor and hypervisor think about
> > the private/shared state of the page. But I don't think there's
> > anything we can do about that. Again, I'd suggest a code
> > comment acknowledging this case, and that there's nothing
> > that can be done.
>
> True, will add a comment. Just like with a lot of other corner cases in
> panic, it's hard to guarantee correctness in ALL cases as the system can
> be in any state (e.g. if the panic is caused by memory corruption -- who
> knows what's corrupted?). I'm hoping that with the newly added logic
> we're covering the most common kdump case and it'll 'generally work' on
> Azure CVMs.
As noted above, in a non-panic kexec(), the synchronization with
in-progress private<->shared conversions can be solved.
Michael