Message-ID: <e35732dfe5531e4a933cbca37f0d0b7bbaedf515.camel@infradead.org>
Date: Tue, 12 Aug 2025 14:54:53 +0200
From: David Woodhouse <dwmw2@...radead.org>
To: hugo lee <cs.hugolee@...il.com>
Cc: Sean Christopherson <seanjc@...gle.com>, pbonzini@...hat.com, 
 tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
 dave.hansen@...ux.intel.com,  hpa@...or.com, x86@...nel.org,
 kvm@...r.kernel.org, linux-kernel@...r.kernel.org,  Yuguo Li
 <hugoolli@...cent.com>
Subject: Re: [PATCH] KVM: x86: Synchronize APIC State with QEMU when
 irqchip=split

On Tue, 2025-08-12 at 19:50 +0800, hugo lee wrote:
> On Tue, Aug 12, David Woodhouse <dwmw2@...radead.org> wrote:
> > 
> > On Tue, 2025-08-12 at 18:08 +0800, hugo lee wrote:
> > > 
> > > On some legacy bios images using guests, they may disable PIT
> > > after booting.
> > 
> > Do you mean they may *not* disable the PIT after booting? Linux had
> > that problem for a long time, until I fixed it with
> > https://git.kernel.org/torvalds/c/70e6b7d9ae3
> > 
> 
> True, they disabled LINT0 and left PIT unaware.
> 
> > > When irqchip=split is on, qemu will keep kicking the guest and try to
> > > get the Big QEMU Lock.
> > 
> > If it's the PIT, surely QEMU will keep stealing time pointlessly unless
> > we actually disable the PIT itself? Not just the IRQ delivery? Or do
> > you use this to realise that the IRQ output from the PIT isn't going
> > anywhere and thus disable the event in QEMU completely?
> > 
> 
> I'm using this to disable the PIT event in QEMU.
> 
> I'm aiming to solve the desynchronization caused by
> irqchip=split, so the VM will behave more like the
> physical one.

I suspect I'm going to hate your QEMU patch when I see it.

KVM has a callback when the IRQ is acked, which it uses to retrigger
the next interrupt in reinject mode.

Even in !reinject mode, the kvm_pit_ack_irq() callback could just as
easily be used to allow the hrtimer to stop completely until the
interrupt gets acked. Which I understand is basically what you want to
do in QEMU?
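
To make the idea concrete, here is a minimal userspace sketch of a periodic timer that stops itself when the previous interrupt is still un-acked and restarts from the ack callback. It is only a model: `struct pit_model`, `pit_model_tick()` and `pit_model_ack()` are hypothetical names invented for illustration, not KVM or QEMU identifiers, and the real hrtimer and ack-notifier plumbing is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a periodic interrupt source (not real KVM/QEMU code). */
struct pit_model {
	bool armed;                  /* is the periodic timer currently running? */
	bool irq_pending;            /* raised, but not yet acked by the guest */
	unsigned long ticks_fired;   /* expirations that actually cost host time */
	unsigned long irqs_injected; /* interrupts delivered towards the guest */
};

void pit_model_init(struct pit_model *p)
{
	assert(p != NULL);
	p->armed = true;
	p->irq_pending = false;
	p->ticks_fired = 0;
	p->irqs_injected = 0;
}

/*
 * Timer expiry handler. Returns true if the timer should be re-armed
 * for another period, false if it has been stopped.
 */
bool pit_model_tick(struct pit_model *p)
{
	assert(p != NULL);
	if (!p->armed)
		return false;   /* stopped timers never expire */

	p->ticks_fired++;

	if (p->irq_pending) {
		/* The guest never acked the last interrupt: stop ticking. */
		p->armed = false;
		return false;
	}

	p->irq_pending = true;
	p->irqs_injected++;
	return true;
}

/* Ack callback: the guest serviced the interrupt, so resume ticking. */
void pit_model_ack(struct pit_model *p)
{
	assert(p != NULL);
	p->irq_pending = false;
	p->armed = true;
}
```

In this model a guest that ignores the interrupt costs at most one extra expiry before the timer goes quiet, and nothing depends on how the interrupt is routed (LINT0, PIC or otherwise); only the ack matters.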

There shouldn't be any reason to special-case it on the LINT0 setup; if
the interrupt just remains pending in the PIC and is never serviced,
that should *also* mean we stop wasting steal time on it, right?

So ideally, QEMU would have the same infrastructure to 'resample' an
IRQ when it gets acked. And then it would know when the guest is
ignoring the PIT and it needn't bother to generate any more interrupts.

Except QEMU's interrupt controllers don't yet support that. So for VFIO
INTx interrupts, for example, QEMU unmaps the MMIO BARs of the device
while an interrupt is outstanding, then sends an event to the kernel's
resample irqfd when the guest touches a register therein!

I'd love to see you fix this in QEMU by hooking up that 'resample'
signal when the interrupt is acked in the interrupt controller, and
then wouldn't the kernel side of this and the special case for LINT0 be
unneeded?

