Message-ID: <dcc29d43904f4d26fea25dbdf8a86a2bae1087a9.camel@infradead.org>
Date: Wed, 10 Sep 2025 10:39:03 +0100
From: David Woodhouse <dwmw2@...radead.org>
To: Vitaly Kuznetsov <vkuznets@...hat.com>, Khushit Shah
	 <khushit.shah@...anix.com>
Cc: "seanjc@...gle.com" <seanjc@...gle.com>, "pbonzini@...hat.com"
	 <pbonzini@...hat.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>, 
 "linux-kernel@...r.kernel.org"
	 <linux-kernel@...r.kernel.org>, Shaju Abraham <shaju.abraham@...anix.com>
Subject: Re: [BUG] [KVM/VMX] Level triggered interrupts mishandled on
 Windows w/ nested virt(Credential Guard) when using split irqchip

On Wed, 2025-09-10 at 10:34 +0200, Vitaly Kuznetsov wrote:
> Khushit Shah <khushit.shah@...anix.com> writes:
> 
> > > On 8 Sep 2025, at 5:12 PM, Vitaly Kuznetsov <vkuznets@...hat.com> wrote:
> > > 
> 
> ...
> 
> > > Also, I've just recalled I fixed (well, 'workarounded') an issue
> > > similar
> > > to yours a while ago in QEMU:
> > > 
> > > commit 958a01dab8e02fc49f4fd619fad8c82a1108afdb
> > > Author: Vitaly Kuznetsov <vkuznets@...hat.com>
> > > Date:   Tue Apr 2 10:02:15 2019 +0200
> > > 
> > >    ioapic: allow buggy guests mishandling level-triggered
> > > interrupts to make progress
> > > 
> > > maybe something has changed and it doesn't work anymore?
> > 
> > This is really interesting, we are facing a very similar issue, but
> > the interrupt storm only occurs when using split-irqchip. 
> > Using kernel-irqchip, we do not even see consecutive level
> > triggered interrupts of the same vector. From the logs it is 
> > clear that somehow with kernel-irqchip, L1 passes the interrupt to
> > L2 to service, but with split-irqchip, L1 EOI’s without 
> > servicing the interrupt. As it is working properly on
> > kernel-irqchip, we can’t really point to it as a Hyper-V issue. AFAIK,
> > kernel-irqchip setting should be transparent to the guest, can you
> > think of anything that can change this?
> 
> The problem I've fixed back then was also only visible with split
> irqchip. The reason was:
> 
> """
> in-kernel IOAPIC implementation has commit 184564efae4d ("kvm:
> ioapic: conditionally delay
> irq delivery duringeoi broadcast")
> """
> 
> so even though the guest cannot really distinguish between in-kernel
> and
> split irqchips, the small differences in implementation can make a
> big
> difference in the observed behavior. In case we re-assert improperly
> handled level-triggered interrupt too fast, the guest is not able to
> make much progress but if we let it execute for even the tiniest
> fraction of time, then the forward progress happens. 
> 
> I don't exactly know what happens in this particular case but I'd
> suggest you try to artificially delay re-asserting level triggered
> interrupts and see what happens.
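
The forward-progress argument quoted above can be pictured with a toy
model. This is only a sketch under stated assumptions, not how KVM or
QEMU account for time: the function name, the abstract "time units" and
the `work_needed` threshold are all hypothetical. It assumes the guest
EOIs without servicing the interrupt, and only gets to run for `delay`
units before the level-triggered interrupt is re-asserted.

```python
def interrupts_until_progress(delay, work_needed=10, max_interrupts=1000):
    """Toy model (hypothetical names): count deliveries until the guest
    has accumulated enough run time to reach the code that quiesces the
    interrupt source. Returns None if that never happens within
    max_interrupts deliveries, i.e. an interrupt storm."""
    progress = 0
    for delivered in range(1, max_interrupts + 1):
        # The guest EOIs without servicing, then runs for `delay` units
        # before the still-asserted level interrupt is delivered again.
        progress += delay
        if progress >= work_needed:
            return delivered
    return None


if __name__ == "__main__":
    for delay in (0, 1, 5):
        print(delay, interrupts_until_progress(delay))
```

With `delay=0` the modelled guest never gets anywhere, while any
non-zero delay eventually lets it through, which is the shape of the
behaviour described for the in-kernel I/O APIC's delayed delivery.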

We know that QEMU reasserts INTx interrupts too soon anyway.

The in-kernel irqchip will trigger the VFIO resamplefd when the
interrupt is EOI'd in the I/O APIC, as $DEITY intended.

QEMU, on the other hand, will unmap the device BARs when the interrupt
happens and intercept subsequent access, triggering the VFIO resamplefd
as soon as the next access happens — even before it's EOI'd.
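
To make the two resample policies concrete, here is a minimal toy model.
It is a sketch only: `ToyIntx`, `HANDLER` and `run` are hypothetical
names, not QEMU, KVM or VFIO APIs, and it assumes a guest handler that
first reads a status register (which does not clear the cause), then
writes an ack (which does), then EOIs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIntx:
    """Toy level-triggered INTx line (illustrative, not a real API)."""
    asserting: bool = True   # device still wants service
    masked: bool = False     # host masked the line after delivery
    log: list = field(default_factory=list)

    def deliver(self):
        # Delivering the interrupt masks the line until it is resampled.
        self.masked = True
        self.log.append("deliver")

    def resample(self, trigger):
        # Stand-in for the resamplefd firing: unmask, and re-deliver if
        # the device is still asserting the line.
        if not self.masked:
            return
        self.masked = False
        self.log.append("resample@" + trigger)
        if self.asserting:
            self.deliver()


# Assumed guest handler: (step name, is a BAR access, clears the cause)
HANDLER = [
    ("read_status", True, False),
    ("write_ack", True, True),
    ("eoi", False, False),
]


def run(policy):
    """policy is 'on_eoi' (resample at I/O APIC EOI) or 'on_access'
    (resample on the next BAR access after delivery)."""
    line = ToyIntx()
    line.deliver()
    for name, is_bar_access, clears_cause in HANDLER:
        if clears_cause:
            line.asserting = False
        if policy == "on_access" and is_bar_access:
            line.resample(name)
        elif policy == "on_eoi" and name == "eoi":
            line.resample(name)
    return line.log


if __name__ == "__main__":
    print(run("on_eoi"))
    print(run("on_access"))
```

In this model the EOI-driven policy delivers the interrupt once, while
the access-driven policy delivers it a second time before the guest has
EOI'd the first one.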

Could that be making a difference here?

I guess, in theory, "too soon" probably shouldn't matter if it's all
handled correctly elsewhere — it should get masked again in the
hardware and the pending status tracked correctly until it's
redelivered to the guest(s). But it's probably worth testing, given
that's one of the big behavioural differences between kernel and
userspace I/O APIC?

It's somewhat non-trivial to fix it 'properly' across all of QEMU's
interrupt controllers and IRQ abstractions, but hacking something up
which does the right thing just for this x86 platform and I/O APIC and
avoids the current MMIO-unmapping abomination might be worth a test?
