linux-kernel - Re: [PATCH] KVM: arm64: preserve pending during kvm_irqfd

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aGVN8_hmeSKdHGfG@linux.dev>
Date: Wed, 2 Jul 2025 08:19:15 -0700
From: Oliver Upton <oliver.upton@...ux.dev>
To: Steve Sistare <steven.sistare@...cle.com>
Cc: kvmarm@...ts.linux.dev, linux-kernel@...r.kernel.org,
	Marc Zyngier <maz@...nel.org>, Joey Gouly <joey.gouly@....com>,
	Suzuki K Poulose <suzuki.poulose@....com>,
	Zenghui Yu <yuzenghui@...wei.com>
Subject: Re: [PATCH] KVM: arm64: preserve pending during kvm_irqfd_deassign

On Wed, Jul 02, 2025 at 07:41:37AM -0700, Steve Sistare wrote:
> When kvm_irqfd_deassign ... -> kvm_vgic_v4_unset_forwarding is called,
> if an interrupt is pending in irq->pending_latch, then transfer it to
> the producer's eventfd.  This way, if the KVM instance is subsequently
> destroyed, the interrupt is preserved in producer state.  If the irqfd
> is re-created in a new KVM instance, kvm_irqfd_assign finds the producer,
> polls the eventfd, finds the interrupt, and injects it into KVM.
> 
> QEMU live update does that: it passes the VFIO device descriptors to the
> new process, but destroys and recreates the KVM instance, without
> quiescing VFIO interrupts.

This *does not work*. Updates to the ITS mapping are non-atomic and a
poorly timed MSI will get dropped on the floor. Or generate an SError if
your system implementer has a sense of humor.

KVM already provides the SAVE_PENDING_TABLES ioctl for saving the
pending state of LPIs to memory which also retrieves the pending state
of vLPIs from hardware. The expectation for this ioctl is that userspace
has already quiesced the interrupt generator.

If userspace can't honor that I don't see a reason for KVM to go out of
the way to forward the pending state, especially considering the fact
that the architecture doesn't support this behavior.

A spurious interrupt doesn't seem that bad here.

> -int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
> +int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq, bool *pending)
>  {
>  	struct vgic_irq *irq;
>  	unsigned long flags;
>  	int ret = 0;
> +	bool direct_msi = vgic_supports_direct_msis(kvm);
>  
> -	if (!vgic_supports_direct_msis(kvm))
> +	if (!pending && !direct_msi)
>  		return 0;

You've broken the early return in case hardware doesn't support GICv4...

>  	irq = __vgic_host_irq_get_vlpi(kvm, host_irq);
> @@ -542,7 +543,13 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int host_irq)
>  
>  	raw_spin_lock_irqsave(&irq->irq_lock, flags);
>  	WARN_ON(irq->hw && irq->host_irq != host_irq);
> -	if (irq->hw) {
> +
> +	if (pending) {
> +		*pending = irq->pending_latch;
> +		irq->pending_latch = false;
> +	}
> +

So this "works" for software-injected LPIs (notice that this function is
for handling *vLPIs*) as KVM's pending state is always the source of
truth. Is that why you're allowing GICv3 to get here now?

This (importantly) doesn't work for hardware vLPIs, as the pending state
is maintained in the vLPI pending table for the vPE.

Overall, I'm not convinced KVM needs to do anything here. We have state
save/restore mechanisms readily available, if userspace wants to go
off-label then it's up to userspace to figure that out.

Thanks,
Oliver