Message-ID: <f9a37a7d-2141-ee82-c7d6-23d8de9db2c1@huawei.com>
Date: Tue, 21 Oct 2025 21:38:26 +0800
From: Kunkun Jiang <jiangkunkun@...wei.com>
To: Marc Zyngier <maz@...nel.org>
CC: Oliver Upton <oliver.upton@...ux.dev>, Joey Gouly <joey.gouly@....com>,
Suzuki K Poulose <suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
"moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)"
<linux-arm-kernel@...ts.infradead.org>, "open list:KERNEL VIRTUAL MACHINE FOR
ARM64 (KVM/arm64)" <kvmarm@...ts.linux.dev>, open list
<linux-kernel@...r.kernel.org>, "wanghaibin.wang@...wei.com"
<wanghaibin.wang@...wei.com>
Subject: Re: [Question] Received vtimer interrupt but ISTATUS is 0
Hi Marc,
On 2025/10/15 0:32, Marc Zyngier wrote:
> On Tue, 14 Oct 2025 15:45:37 +0100,
> Kunkun Jiang <jiangkunkun@...wei.com> wrote:
>>
>> Hi all,
>>
>> I'm hitting a very strange problem that boils down to a vtimer
>> interrupt being received while ISTATUS is 0. Why does this happen?
>> My analysis is that the timer condition was met and the interrupt
>> was generated; then some action in the VM (cancelling the timer?)
>> caused ISTATUS to become 0, and the hardware needed to retire the
>> interrupt. But the retirement was propagated too slowly, and the
>> host had already read ICC_IAR_EL1, so the hypervisor executed
>> kvm_arch_timer_handler with ISTATUS already 0.
>
> If what you describe is accurate, and that the HW takes so long to
> retire the timer interrupt that we cannot trust having taken an
> interrupt, how long until we can trust that what we have is actually
> correct?
>
> Given that it takes a full exit from the guest before we can handle
> the interrupt, I am rather puzzled that you observe this sort of bad
> behaviour on modern HW. You either have an insanely fast CPU with a
> very slow GIC, or a very bizarre machine (a bit like a ThunderX -- not
> a compliment).
I added a dump_stack() in the branch where ISTATUS is 0, and the
following is the stack trace from when the problem occurred.
> [ 2669.521569] Call trace:
> [ 2669.521577] dump_backtrace+0x0/0x220
> [ 2669.521579] show_stack+0x20/0x2c
> [ 2669.521583] dump_stack+0xf0/0x138
> [ 2669.521588] kvm_arch_timer_handler+0x138/0x194
> [ 2669.521592] handle_percpu_devid_irq+0x90/0x1f4
> [ 2669.521598] __handle_domain_irq+0x84/0xfc
> [ 2669.521600] gic_handle_irq+0xfc/0x320
> [ 2669.521601] el1_irq+0xb8/0x140
> [ 2669.521604] kvm_arch_vcpu_ioctl_run+0x258/0x6fc
> [ 2669.521607] kvm_vcpu_ioctl+0x334/0xa94
> [ 2669.521612] __arm64_sys_ioctl+0xb0/0xf4
> [ 2669.521614] el0_svc_common.constprop.0+0x7c/0x1bc
> [ 2669.521616] do_el0_svc+0x2c/0xa4
> [ 2669.521619] el0_svc+0x20/0x30
> [ 2669.521620] el0_sync_handler+0xb0/0xb4
> [ 2669.521621] el0_sync+0x160/0x180
Analyzing this stack, it does indeed take a full exit from the guest
before the interrupt is handled. Do you think this is a hardware issue?
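For reference, the instrumentation was along these lines (a sketch of
the handler's else branch; only the dump_stack() is an addition to the
stock kvm_arch_timer_handler()):

	if (kvm_timer_should_fire(ctx)) {
		kvm_timer_update_irq(vcpu, true, ctx);
	} else {
		/* vtimer IRQ taken, but CNTV_CTL.ISTATUS reads 0 */
		dump_stack();
	}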
>
> How does it work when context-switching from a vcpu that has a pending
> timer interrupt to one that doesn't? Do you also see spurious
> interrupts?
I added a log under the 'if (!vcpu)' branch and tested it, but that
branch was never taken. In addition, each vCPU is pinned to its own
core, with only one vCPU running per core.
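To be precise, the check in question is the existing !vcpu early-out in
kvm_arch_timer_handler(); roughly what I instrumented (a sketch, with
the print being my only addition):

	if (!vcpu) {
		/* interrupt fired with no vcpu loaded on this CPU */
		pr_warn_ratelimited("vtimer irq %d with no vcpu loaded\n", irq);
		return IRQ_HANDLED;
	}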
>
>> The code flow is as follows:
>> kvm_arch_timer_handler
>> ->if (kvm_timer_should_fire)
>> ->the value of SYS_CNTV_CTL is 0b001(ISTATUS=0,IMASK=0,ENABLE=1)
>> ->return IRQ_HANDLED
>>
>> Because ISTATUS is 0, kvm_timer_update_irq will not be executed to
>> inject this interrupt into the VM. Since EOImode is 1 and the vtimer
>> interrupt has the IRQD_FORWARDED_TO_VCPU flag, the hypervisor will
>> not write ICC_DIR_EL1 to deactivate the interrupt. The interrupt
>> remains in the active state, blocking subsequent interrupts from
>> being processed. Fortunately, kvm_timer_vcpu_load determines again
>> whether an interrupt needs to be injected into the VM, but the delay
>> will definitely increase.
>
> Right, so you are at most a context switch away from your next
> interrupt, just like in the !vcpu case. While not ideal, that's not
> fatal.
>
>>
>> What I want to discuss is the solution to this problem. My solution is
>> to add a deactivation action:
>> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
>> index dbd74e4885e2..46baba531d51 100644
>> --- a/arch/arm64/kvm/arch_timer.c
>> +++ b/arch/arm64/kvm/arch_timer.c
>> @@ -228,8 +228,13 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>> else
>> ctx = map.direct_ptimer;
>>
>> - if (kvm_timer_should_fire(ctx))
>> + if (kvm_timer_should_fire(ctx)) {
>> kvm_timer_update_irq(vcpu, true, ctx);
>> + } else {
>> + struct vgic_irq *irq;
>> + irq = vgic_get_vcpu_irq(vcpu, timer_irq(ctx));
>> + gic_write_dir(irq->hwintid);
>> + }
>>
>> if (userspace_irqchip(vcpu->kvm) &&
>> !static_branch_unlikely(&has_gic_active_state))
>>
>> If you have any new ideas or other solutions to this problem, please
>> let me know.
>
> That's not right.
>
> For a start, this is GICv3 specific, and will break on everything
> else. Also, why the round-trip via the vgic_irq when you already have
> the interrupt number that has fired *as a parameter*?
>
> Finally, this breaks with NV, as you could have switched between EL1
> and EL2 timers, and since you cannot trust you are in the correct
> interrupt context (interrupt firing out of context), you can't trust
> irq->hwintid either, as the mappings will have changed.
>
> Something like the patchlet below should do the trick, but I'm
> definitely not happy about this sort of sorry hack.
>
> M.
>
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index dbd74e4885e24..3db7c6bdffbc0 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -206,6 +206,13 @@ static void soft_timer_cancel(struct hrtimer *hrt)
> hrtimer_cancel(hrt);
> }
>
> +static void set_timer_irq_phys_active(struct arch_timer_context *ctx, bool active)
> +{
> + int r;
> + r = irq_set_irqchip_state(ctx->host_timer_irq, IRQCHIP_STATE_ACTIVE, active);
> + WARN_ON(r);
> +}
> +
> static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> {
> struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> @@ -230,6 +237,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>
> if (kvm_timer_should_fire(ctx))
> kvm_timer_update_irq(vcpu, true, ctx);
> + else
> + set_timer_irq_phys_active(ctx, false);
>
> if (userspace_irqchip(vcpu->kvm) &&
> !static_branch_unlikely(&has_gic_active_state))
> @@ -659,13 +668,6 @@ static void timer_restore_state(struct arch_timer_context *ctx)
> local_irq_restore(flags);
> }
>
> -static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, bool active)
> -{
> - int r;
> - r = irq_set_irqchip_state(ctx->host_timer_irq, IRQCHIP_STATE_ACTIVE, active);
> - WARN_ON(r);
> -}
> -
> static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
> {
> struct kvm_vcpu *vcpu = ctx->vcpu;
>
After extensive testing, this patch resolved the issue I encountered.
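As an aside, a quick way to confirm the stuck-active state during such
testing is the generic irqchip-state API (irq_get_irqchip_state() from
<linux/interrupt.h>); a sketch, where 'host_vtimer_irq' stands in for
the Linux IRQ number of the host vtimer PPI:

	bool active = false;

	/* returns 0 on success; 'active' then reflects the GIC state */
	if (!irq_get_irqchip_state(host_vtimer_irq, IRQCHIP_STATE_ACTIVE,
				   &active))
		pr_info("vtimer irq %u active=%d\n", host_vtimer_irq, active);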
Tested-by: Kunkun Jiang <jiangkunkun@...wei.com>
Thanks,
Kunkun Jiang