Message-ID: <86a51kwbvi.wl-maz@kernel.org>
Date: Tue, 21 Oct 2025 15:46:57 +0100
From: Marc Zyngier <maz@...nel.org>
To: Kunkun Jiang <jiangkunkun@...wei.com>
Cc: Oliver Upton <oliver.upton@...ux.dev>,
Joey Gouly <joey.gouly@....com>,
Suzuki K Poulose <suzuki.poulose@....com>,
Zenghui Yu <yuzenghui@...wei.com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
"moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)"
<linux-arm-kernel@...ts.infradead.org>,
"open list:KERNEL VIRTUAL MACHINE FOR\
ARM64 (KVM/arm64)" <kvmarm@...ts.linux.dev>,
open list
<linux-kernel@...r.kernel.org>,
"wanghaibin.wang@...wei.com"
<wanghaibin.wang@...wei.com>
Subject: Re: [Question] Received vtimer interrupt but ISTATUS is 0

On Tue, 21 Oct 2025 14:38:26 +0100,
Kunkun Jiang <jiangkunkun@...wei.com> wrote:
>
> Hi Marc,
>
> On 2025/10/15 0:32, Marc Zyngier wrote:
> > On Tue, 14 Oct 2025 15:45:37 +0100,
> > Kunkun Jiang <jiangkunkun@...wei.com> wrote:
> >>
> >> Hi all,
> >>
> >> I'm having a very strange problem that can be simplified to a vtimer
> >> interrupt being received while ISTATUS is 0. Why does this happen?
> >> According to my analysis, it may be that the timer condition is met
> >> and the interrupt is generated. Then some action (cancelling the
> >> timer?) is done in the VM, ISTATUS becomes 0, and the hardware needs
> >> to clear the interrupt. But the clear command is propagated too
> >> slowly, and the OS has already read ICC_IAR_EL1. So the hypervisor
> >> executes kvm_arch_timer_handler but ISTATUS is 0.
> >
> > If what you describe is accurate, and that the HW takes so long to
> > retire the timer interrupt that we cannot trust having taken an
> > interrupt, how long until we can trust that what we have is actually
> > correct?
> >
> > Given that it takes a full exit from the guest before we can handle
> > the interrupt, I am rather puzzled that you observe this sort of bad
> > behaviour on modern HW. You either have an insanely fast CPU with a
> > very slow GIC, or a very bizarre machine (a bit like a ThunderX -- not
> > a compliment).
> I added dump_stack in the exception branch, and the following is the
> stack when the problem occurred.
> > [ 2669.521569] Call trace:
> > [ 2669.521577] dump_backtrace+0x0/0x220
> > [ 2669.521579] show_stack+0x20/0x2c
> > [ 2669.521583] dump_stack+0xf0/0x138
> > [ 2669.521588] kvm_arch_timer_handler+0x138/0x194
> > [ 2669.521592] handle_percpu_devid_irq+0x90/0x1f4
> > [ 2669.521598] __handle_domain_irq+0x84/0xfc
> > [ 2669.521600] gic_handle_irq+0xfc/0x320
> > [ 2669.521601] el1_irq+0xb8/0x140
> > [ 2669.521604] kvm_arch_vcpu_ioctl_run+0x258/0x6fc
> > [ 2669.521607] kvm_vcpu_ioctl+0x334/0xa94
> > [ 2669.521612] __arm64_sys_ioctl+0xb0/0xf4
> > [ 2669.521614] el0_svc_common.constprop.0+0x7c/0x1bc
> > [ 2669.521616] do_el0_svc+0x2c/0xa4
> > [ 2669.521619] el0_svc+0x20/0x30
> > [ 2669.521620] el0_sync_handler+0xb0/0xb4
> > [ 2669.521621] el0_sync+0x160/0x180
> By analyzing this stack, it should indeed take a full exit from the
> guest. Do you think this is a hardware issue?
Of course this is a HW issue. Your GIC is slow to retire a pending
interrupt; you pay the consequences.
> >
> > How does it work when context-switching from a vcpu that has a pending
> > timer interrupt to one that doesn't? Do you also see spurious
> > interrupts?
> I added a log under the 'if (!vcpu)' branch and tested it, but
> execution never reached that branch. In addition, I have pinned each
> vcpu to a core, and only one vcpu runs on each core.
Well, that's hardly testing the conditions I have outlined.
> >
> >> The code flow is as follows:
> >> kvm_arch_timer_handler
> >> ->if (kvm_timer_should_fire)
> >> ->the value of SYS_CNTV_CTL is 0b001 (ISTATUS=0, IMASK=0, ENABLE=1)
> >> ->return IRQ_HANDLED
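
For reference, with the vtimer loaded on the CPU, kvm_timer_should_fire()
essentially boils down to a check on that CNTV_CTL value. A simplified
sketch, not the exact upstream code (the bit names are the real ones from
include/clocksource/arm_arch_timer.h, the surrounding plumbing is
omitted):

#include <clocksource/arm_arch_timer.h>

/*
 * Simplified sketch of the check done on a loaded vtimer. With
 * CNTV_CTL == 0b001 (ENABLE=1, IMASK=0, ISTATUS=0) this returns
 * false, so nothing gets injected into the guest.
 */
static bool loaded_vtimer_should_fire(u32 cnt_ctl)
{
	return  (cnt_ctl & ARCH_TIMER_CTRL_ENABLE) &&
		(cnt_ctl & ARCH_TIMER_CTRL_IT_STAT) &&
	       !(cnt_ctl & ARCH_TIMER_CTRL_IT_MASK);
}
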
> >>
> >> Because ISTATUS is 0, kvm_timer_update_irq will not be called to
> >> inject this interrupt into the VM. Since EOImode is 1 and the vtimer
> >> interrupt has the IRQD_FORWARDED_TO_VCPU flag, the hypervisor will
> >> not write ICC_DIR_EL1 to deactivate the interrupt. The interrupt
> >> remains in the active state, blocking subsequent interrupts from
> >> being processed. Fortunately, kvm_timer_vcpu_load will determine
> >> again whether an interrupt needs to be injected into the VM. But the
> >> delay will definitely increase.
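
For completeness: the vcpu_load path you mention resyncs both the
emulated line level and the physical active state, so the stale active
state does get cleaned up on the next load. Roughly (a simplified
sketch from memory of the kvm_timer_vcpu_load_gic() path, not the
exact upstream code; the function name and the precise field accesses
here are approximations):

/*
 * Approximation of what the GIC-aware vcpu_load path does for a timer
 * context: re-evaluate whether the timer should fire, then make the
 * physical active state follow the virtual state.
 */
static void timer_vcpu_load_resync(struct arch_timer_context *ctx)
{
	struct kvm_vcpu *vcpu = ctx->vcpu;
	bool phys_active = false;

	/* Recompute the timer output before restoring the HW state */
	kvm_timer_update_irq(vcpu, kvm_timer_should_fire(ctx), ctx);

	/* Mirror the virtual active state onto the physical interrupt */
	if (irqchip_in_kernel(vcpu->kvm))
		phys_active = kvm_vgic_map_is_active(vcpu, timer_irq(ctx));
	phys_active |= ctx->irq.level;

	set_timer_irq_phys_active(ctx, phys_active);
}
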
> >
> > Right, so you are at most a context switch away from your next
> > interrupt, just like in the !vcpu case. While not ideal, that's not
> > fatal.
> >
> >>
> >> What I want to discuss is the solution to this problem. My solution is
> >> to add a deactivation action:
> >> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> >> index dbd74e4885e2..46baba531d51 100644
> >> --- a/arch/arm64/kvm/arch_timer.c
> >> +++ b/arch/arm64/kvm/arch_timer.c
> >> @@ -228,8 +228,13 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> >>  	else
> >>  		ctx = map.direct_ptimer;
> >>  
> >> -	if (kvm_timer_should_fire(ctx))
> >> +	if (kvm_timer_should_fire(ctx)) {
> >>  		kvm_timer_update_irq(vcpu, true, ctx);
> >> +	} else {
> >> +		struct vgic_irq *irq;
> >> +		irq = vgic_get_vcpu_irq(vcpu, timer_irq(timer_ctx));
> >> +		gic_write_dir(irq->hwintid);
> >> +	}
> >>  
> >>  	if (userspace_irqchip(vcpu->kvm) &&
> >>  	    !static_branch_unlikely(&has_gic_active_state))
> >>
> >> If you have any new ideas or other solutions to this problem, please
> >> let me know.
> >
> > That's not right.
> >
> > For a start, this is GICv3 specific, and will break on everything
> > else. Also, why the round-trip via the vgic_irq when you already have
> > the interrupt number that has fired *as a parameter*?
> >
> > Finally, this breaks with NV, as you could have switched between EL1
> > and EL2 timers, and since you cannot trust you are in the correct
> > interrupt context (interrupt firing out of context), you can't trust
> > irq->hwintid either, as the mappings will have changed.
> >
> > Something like the patchlet below should do the trick, but I'm
> > definitely not happy about this sort of sorry hack.
> >
> > M.
> >
> > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> > index dbd74e4885e24..3db7c6bdffbc0 100644
> > --- a/arch/arm64/kvm/arch_timer.c
> > +++ b/arch/arm64/kvm/arch_timer.c
> > @@ -206,6 +206,13 @@ static void soft_timer_cancel(struct hrtimer *hrt)
> >  	hrtimer_cancel(hrt);
> >  }
> >  
> > +static void set_timer_irq_phys_active(struct arch_timer_context *ctx, bool active)
> > +{
> > +	int r;
> > +	r = irq_set_irqchip_state(ctx->host_timer_irq, IRQCHIP_STATE_ACTIVE, active);
> > +	WARN_ON(r);
> > +}
> > +
> >  static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> >  {
> >  	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> > @@ -230,6 +237,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> >  	if (kvm_timer_should_fire(ctx))
> >  		kvm_timer_update_irq(vcpu, true, ctx);
> > +	else
> > +		set_timer_irq_phys_active(ctx, false);
> >  
> >  	if (userspace_irqchip(vcpu->kvm) &&
> >  	    !static_branch_unlikely(&has_gic_active_state))
> > @@ -659,13 +668,6 @@ static void timer_restore_state(struct arch_timer_context *ctx)
> >  	local_irq_restore(flags);
> >  }
> >  
> > -static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, bool active)
> > -{
> > -	int r;
> > -	r = irq_set_irqchip_state(ctx->host_timer_irq, IRQCHIP_STATE_ACTIVE, active);
> > -	WARN_ON(r);
> > -}
> > -
> >  static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
> >  {
> >  	struct kvm_vcpu *vcpu = ctx->vcpu;
> >
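FWIW, the reason the patchlet stays GIC-agnostic is that it goes
through irq_set_irqchip_state(), the generic-IRQ way of poking an
interrupt's active state, rather than a direct ICC_DIR_EL1 write. A
minimal, self-contained illustration of the idea (the helper name is
made up for illustration; only the irq_set_irqchip_state() call and
IRQCHIP_STATE_ACTIVE are the real API):

#include <linux/interrupt.h>

/*
 * Drop the stale active state of a forwarded host timer PPI so that a
 * spurious firing (ISTATUS already 0 by the time we look at it) does
 * not leave the line stuck active and delay the next real expiry.
 */
static void drop_stale_active_state(unsigned int host_irq)
{
	WARN_ON(irq_set_irqchip_state(host_irq, IRQCHIP_STATE_ACTIVE, false));
}
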
> After extensive testing, this patch was able to resolve the issue I
> encountered.
> Tested-by: Kunkun Jiang <jiangkunkun@...wei.com>
Just to be clear: a similar discussion took place over 5 years ago on
the same subject[1], and I was pretty clear about the conclusion.
There is no bug here. Only a slow HW implementation that leads to
suboptimal behaviours. There is no state loss, no lack of forward
progress, and the interrupts still get delivered in finite time. As
far as I am concerned, things work OK.
Thanks,
M.
[1] https://lore.kernel.org/r/1595584037-6877-1-git-send-email-zhangshaokun@hisilicon.com
--
Without deviation from the norm, progress is not possible.