Message-ID: <590aa43842aa1e6667d6a564cfdcb86a2ca02160.camel@infradead.org>
Date: Wed, 23 Jul 2025 11:42:26 +0200
From: David Woodhouse <dwmw2@...radead.org>
To: Oliver Upton <oliver.upton@...ux.dev>
Cc: Marc Zyngier <maz@...nel.org>, Joey Gouly <joey.gouly@....com>, Suzuki K
 Poulose <suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>,
 Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 Paolo Bonzini <pbonzini@...hat.com>, Sebastian Ott <sebott@...hat.com>,
 Andre Przywara <andre.przywara@....com>, Thorsten Blum
 <thorsten.blum@...ux.dev>, Shameer Kolothum
 <shameerali.kolothum.thodi@...wei.com>,
 linux-arm-kernel@...ts.infradead.org,  kvmarm@...ts.linux.dev,
 linux-kernel@...r.kernel.org, kvm@...r.kernel.org,  "Saidi, Ali"
 <alisaidi@...zon.com>
Subject: Re: [RFC PATCH 2/2] KVM: arm64: vgic-its: Unmap all vPEs on shutdown

On Tue, 2025-07-22 at 15:46 -0700, Oliver Upton wrote:
> On Mon, Jun 23, 2025 at 02:27:14PM +0100, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@...zon.co.uk>
> > 
> > We observed systems going dark on kexec, due to corruption of the new
> > kernel's text (and sometimes the initrd). This was eventually determined
> > to be caused by the vLPI pending tables used by the GIC in the previous
> > kernel, which were not being quiesced properly.
> > 
> > Signed-off-by: David Woodhouse <dwmw@...zon.co.uk>
> > ---
> >  arch/arm64/kvm/arm.c          |  5 +++++
> >  arch/arm64/kvm/vgic/vgic-v3.c | 14 ++++++++++++++
> >  include/kvm/arm_vgic.h        |  2 ++
> >  3 files changed, 21 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > index 38a91bb5d4c7..2b76f506bc2d 100644
> > --- a/arch/arm64/kvm/arm.c
> > +++ b/arch/arm64/kvm/arm.c
> > @@ -2164,6 +2164,11 @@ void kvm_arch_disable_virtualization_cpu(void)
> >  		cpu_hyp_uninit(NULL);
> >  }
> >  
> > +void kvm_arch_shutdown(void)
> > +{
> > +	kvm_vgic_v3_shutdown();
> > +}
> > +
> >  #ifdef CONFIG_CPU_PM
> >  static int hyp_init_cpu_pm_notifier(struct notifier_block *self,
> >  				    unsigned long cmd,
> > diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
> > index b9ad7c42c5b0..6591e8d84855 100644
> > --- a/arch/arm64/kvm/vgic/vgic-v3.c
> > +++ b/arch/arm64/kvm/vgic/vgic-v3.c
> > @@ -382,6 +382,20 @@ static void map_all_vpes(struct kvm *kvm)
> >  						dist->its_vm.vpes[i]->irq));
> >  }
> >  
> > +void kvm_vgic_v3_shutdown(void)
> > +{
> > +	struct kvm *kvm;
> > +
> > +	if (!kvm_vgic_global_state.has_gicv4_1)
> > +		return;
> > +
> > +	mutex_lock(&kvm_lock);
> > +	list_for_each_entry(kvm, &vm_list, vm_list) {
> > +		unmap_all_vpes(kvm);
> > +	}
> > +	mutex_unlock(&kvm_lock);
> > +}
> > +
> 
> This presumes the vCPUs have already been quiesced which I'm guessing
> is the case for you.

Yeah. With KHO we aspire to be able to do a kexec with some pCPUs
actually still *running* guest vCPUs instead of pointlessly taking them
offline just for *one* pCPU to do the kexec work. In that case all these
tables will need to be in memory which persists across the kexec, so we
won't need to quiesce anything. But those fantasies are a way off for
now...

> The vPEs need to be made nonresident from the
> redistributors prior to unmapping from the ITS to avoid consuming
> unknown vPE state (IHI0069H.b 8.6.2).

Right, I think that's what's being done in the second patch I sent,
saying, "FWIW this is a previous hack we attempted which *didn't* work".
To be clear, we do still *have* that hack, in addition to the explicit
unmap_all_vpes() call.

I would love a definitive answer about what the hypervisor is
*expected* to do here. It's very suboptimal that the GIC doesn't
actually stop accessing memory when it is quiesced, and that the GIC
doesn't live behind an IOMMU which would at least allow stray DMA to be
prevented.

> So we'd probably need to deschedule the vPE in
> kvm_arch_disable_virtualization_cpu() along with some awareness of
> 'kvm_rebooting'.

Yeah, I also pondered doing it *all* from there, but it looked like it
would have required some kind of counting to work out when the *last*
CPU was taken down as there's only a per-CPU arch hook. So I didn't
bother with that for the early RFC.
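
To illustrate the kind of counting meant above, here is a minimal
userspace sketch, not kernel code. The names virt_enabled_cpus,
cpu_virt_enable(), cpu_virt_disable() and global_shutdown() are all
hypothetical; global_shutdown() merely stands in for a global quiesce
step such as unmapping all vPEs, and the 'rebooting' flag stands in for
awareness of kvm_rebooting. Whether such a step could safely run from
the real per-CPU hook's context is a separate question that this sketch
does not address.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical: number of CPUs that still have virtualization enabled. */
static atomic_int virt_enabled_cpus;

/* Hypothetical stand-in for the global quiesce step; only counts calls. */
static int global_shutdown_calls;

static void global_shutdown(void)
{
	global_shutdown_calls++;
}

/* Would be called from the per-CPU enable hook. */
void cpu_virt_enable(void)
{
	atomic_fetch_add(&virt_enabled_cpus, 1);
}

/*
 * Would be called from the per-CPU disable hook.
 * Returns true if this CPU was the last one and the global step ran.
 */
bool cpu_virt_disable(bool rebooting)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	int prev = atomic_fetch_sub(&virt_enabled_cpus, 1);

	assert(prev > 0);

	/* Exactly one caller observes prev == 1: the last CPU going down. */
	if (prev == 1 && rebooting) {
		global_shutdown();
		return true;
	}
	return false;
}
```

Because the decrement and the read of the previous value are a single
atomic operation, only one CPU can see itself as the last, even if
several are taken down concurrently.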

Note that this issue with the GIC's scattershot DMA doesn't only affect
KVM hosts and the vLPI pending tables. We *also* have similar issues on
the guest side with hibernate. The boot kernel sends a MAPD command to
set up an ITT, then transfers control back to the resumed kernel which
had previously set up that ITT at a *different* address, and nobody
ever tells the (v)GIC. Which means that if the host subsequently
serializes that guest for LU/LM, it corrupts memory that the running
kernel didn't expect it to touch. I guess this would happen for hibernate on
real hardware too? And maybe even kexec but that one just hasn't bitten
us yet?


