lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aGVN8_hmeSKdHGfG@linux.dev>
Date: Wed, 2 Jul 2025 08:19:15 -0700
From: Oliver Upton <oliver.upton@...ux.dev>
To: Steve Sistare <steven.sistare@...cle.com>
Cc: kvmarm@...ts.linux.dev, linux-kernel@...r.kernel.org,
	Marc Zyngier <maz@...nel.org>, Joey Gouly <joey.gouly@....com>,
	Suzuki K Poulose <suzuki.poulose@....com>,
	Zenghui Yu <yuzenghui@...wei.com>
Subject: Re: [PATCH] KVM: arm64: preserve pending during kvm_irqfd_deassign

On Wed, Jul 02, 2025 at 07:41:37AM -0700, Steve Sistare wrote:
> When kvm_irqfd_deassign ... -> kvm_vgic_v4_unset_forwarding is called,
> if an interrupt is pending in irq->pending_latch, then transfer it to
> the producer's eventfd.  This way, if the KVM instance is subsequently
> destroyed, the interrupt is preserved in producer state.  If the irqfd
> is re-created in a new KVM instance, kvm_irqfd_assign finds the producer,
> polls the eventfd, finds the interrupt, and injects it into KVM.
> 
> QEMU live update does that: it passes the VFIO device descriptors to the
> new process, but destroys and recreates the KVM instance, without
> quiescing VFIO interrupts.

This *does not work*. Updates to the ITS mapping are non-atomic and a
poorly timed MSI will get dropped on the floor. Or generate an SError if
your system implementer has a sense of humor.

KVM already provides the SAVE_PENDING_TABLES ioctl for saving the
pending state of LPIs to memory which also retrieves the pending state
of vLPIs from hardware. The expectation for this ioctl is that userspace
has already quiesced the interrupt generator.

If userspace can't honor that I don't see a reason for KVM to go out of
the way to forward the pending state, especially considering the fact
that the architecture doesn't support this behavior.

A spurious interrupt doesn't seem that bad here.

> -int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
> +int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq, bool *pending)
>  {
>  	struct vgic_irq *irq;
>  	unsigned long flags;
>  	int ret = 0;
> +	bool direct_msi = vgic_supports_direct_msis(kvm);
>  
> -	if (!vgic_supports_direct_msis(kvm))
> +	if (!pending && !direct_msi)
>  		return 0;

You've broken the early return in case hardware doesn't support GICv4...

>  	irq = __vgic_host_irq_get_vlpi(kvm, host_irq);
> @@ -542,7 +543,13 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
>  
>  	raw_spin_lock_irqsave(&irq->irq_lock, flags);
>  	WARN_ON(irq->hw && irq->host_irq != host_irq);
> -	if (irq->hw) {
> +
> +	if (pending) {
> +		*pending = irq->pending_latch;
> +		irq->pending_latch = false;
> +	}
> +

So this "works" for software-injected LPIs (notice that this function is
for handling *vLPIs*) as KVM's pending state is always the source of
truth. Is that why you're allowing GICv3 to get here now?

This (importantly) doesn't work for hardware vLPIs, as the pending state
is maintained in the vLPI pending table for the vPE.

Overall, I'm not convinced KVM needs to do anything here. We have state
save/restore mechanisms readily available, if userspace wants to go
off-label then it's up to userspace to figure that out.

Thanks,
Oliver

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ