linux-kernel - Re: [Question] Received vtimer interrupt but ISTATUS is 0

Message-ID: <f9a37a7d-2141-ee82-c7d6-23d8de9db2c1@huawei.com>
Date: Tue, 21 Oct 2025 21:38:26 +0800
From: Kunkun Jiang <jiangkunkun@...wei.com>
To: Marc Zyngier <maz@...nel.org>
CC: Oliver Upton <oliver.upton@...ux.dev>, Joey Gouly <joey.gouly@....com>,
	Suzuki K Poulose <suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>,
	Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
	"moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)"
	<linux-arm-kernel@...ts.infradead.org>, "open list:KERNEL VIRTUAL MACHINE FOR
 ARM64 (KVM/arm64)" <kvmarm@...ts.linux.dev>, open list
	<linux-kernel@...r.kernel.org>, "wanghaibin.wang@...wei.com"
	<wanghaibin.wang@...wei.com>
Subject: Re: [Question] Received vtimer interrupt but ISTATUS is 0

Hi Marc,

On 2025/10/15 0:32, Marc Zyngier wrote:
> On Tue, 14 Oct 2025 15:45:37 +0100,
> Kunkun Jiang <jiangkunkun@...wei.com> wrote:
>>
>> Hi all,
>>
>> I'm having a very strange problem that can be simplified to a vtimer
>> interrupt being received while ISTATUS is 0. Why does this happen?
>> According to my analysis, the timer condition may have been met and the
>> interrupt generated. Then some action (cancel timer?) is done in
>> the VM, ISTATUS becomes 0, and the hardware needs to clear the
>> interrupt. But the clear command is sent too slowly, and the OS has
>> already read ICC_IAR_EL1. So the hypervisor executes
>> kvm_arch_timer_handler but ISTATUS is 0.
> 
> If what you describe is accurate, and that the HW takes so long to
> retire the timer interrupt that we cannot trust having taken an
> interrupt, how long until we can trust that what we have is actually
> correct?
> 
> Given that it takes a full exit from the guest before we can handle
> the interrupt, I am rather puzzled that you observe this sort of bad
> behaviour on modern HW. You either have an insanely fast CPU with a
> very slow GIC, or a very bizarre machine (a bit like a ThunderX -- not
> a compliment).
I added dump_stack in the exception branch, and the following is the 
stack when the problem occurred.
> [ 2669.521569] Call trace:
> [ 2669.521577]  dump_backtrace+0x0/0x220
> [ 2669.521579]  show_stack+0x20/0x2c
> [ 2669.521583]  dump_stack+0xf0/0x138
> [ 2669.521588]  kvm_arch_timer_handler+0x138/0x194
> [ 2669.521592]  handle_percpu_devid_irq+0x90/0x1f4
> [ 2669.521598]  __handle_domain_irq+0x84/0xfc
> [ 2669.521600]  gic_handle_irq+0xfc/0x320
> [ 2669.521601]  el1_irq+0xb8/0x140
> [ 2669.521604]  kvm_arch_vcpu_ioctl_run+0x258/0x6fc
> [ 2669.521607]  kvm_vcpu_ioctl+0x334/0xa94
> [ 2669.521612]  __arm64_sys_ioctl+0xb0/0xf4
> [ 2669.521614]  el0_svc_common.constprop.0+0x7c/0x1bc
> [ 2669.521616]  do_el0_svc+0x2c/0xa4
> [ 2669.521619]  el0_svc+0x20/0x30
> [ 2669.521620]  el0_sync_handler+0xb0/0xb4
> [ 2669.521621]  el0_sync+0x160/0x180
Judging by this stack, it does indeed take a full exit from the guest.
Do you think this is a hardware issue?
> 
> How does it work when context-switching from a vcpu that has a pending
> timer interrupt to one that doesn't? Do you also see spurious
> interrupts?
I added a log in the 'if (!vcpu)' branch and tested it, but that branch
was never taken. In addition, I have pinned each vcpu to a core, so
only one vcpu is running on any one core.
> 
>> The code flow is as follows:
>> kvm_arch_timer_handler
>>      ->if (kvm_timer_should_fire)
>>          ->the value of SYS_CNTV_CTL is 0b001(ISTATUS=0,IMASK=0,ENABLE=1)
>>      ->return IRQ_HANDLED
>>
>> Because ISTATUS is 0, kvm_timer_update_irq will not be executed to
>> inject this interrupt into the VM. Since EOImode is 1 and the vtimer
>> interrupt has the IRQD_FORWARDED_TO_VCPU flag, the hypervisor will not
>> write ICC_DIR_EL1 to deactivate the interrupt. This interrupt remains
>> in the active state, blocking subsequent interrupts from being
>> processed. Fortunately, kvm_timer_vcpu_load determines again whether
>> an interrupt needs to be injected into the VM. But the delay will
>> definitely increase.
> 
> Right, so you are at most a context switch away from your next
> interrupt, just like in the !vcpu case. While not ideal, that's not
> fatal.
> 
>>
>> What I want to discuss is the solution to this problem. My solution is
>> to add a deactivation action:
>> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
>> index dbd74e4885e2..46baba531d51 100644
>> --- a/arch/arm64/kvm/arch_timer.c
>> +++ b/arch/arm64/kvm/arch_timer.c
>> @@ -228,8 +228,13 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>>          else
>>                  ctx = map.direct_ptimer;
>>
>> -       if (kvm_timer_should_fire(ctx))
>> +       if (kvm_timer_should_fire(ctx)) {
>>                  kvm_timer_update_irq(vcpu, true, ctx);
>> +       } else {
>> +               struct vgic_irq *irq;
>> +               irq = vgic_get_vcpu_irq(vcpu, timer_irq(timer_ctx));
>> +               gic_write_dir(irq->hwintid);
>> +       }
>>
>>          if (userspace_irqchip(vcpu->kvm) &&
>>              !static_branch_unlikely(&has_gic_active_state))
>>
>> If you have any new ideas or other solutions to this problem, please
>> let me know.
> 
> That's not right.
> 
> For a start, this is GICv3 specific, and will break on everything
> else. Also, why the round-trip via the vgic_irq when you already have
> the interrupt number that has fired *as a parameter*?
> 
> Finally, this breaks with NV, as you could have switched between EL1
> and EL2 timers, and since you cannot trust you are in the correct
> interrupt context (interrupt firing out of context), you can't trust
> irq->hwintid either, as the mappings will have changed.
> 
> Something like the patchlet below should do the trick, but I'm
> definitely not happy about this sort of sorry hack.
> 
> 	M.
> 
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index dbd74e4885e24..3db7c6bdffbc0 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -206,6 +206,13 @@ static void soft_timer_cancel(struct hrtimer *hrt)
>   	hrtimer_cancel(hrt);
>   }
>   
> +static void set_timer_irq_phys_active(struct arch_timer_context *ctx, bool active)
> +{
> +	int r;
> +	r = irq_set_irqchip_state(ctx->host_timer_irq, IRQCHIP_STATE_ACTIVE, active);
> +	WARN_ON(r);
> +}
> +
>   static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>   {
>   	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> @@ -230,6 +237,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
>   
>   	if (kvm_timer_should_fire(ctx))
>   		kvm_timer_update_irq(vcpu, true, ctx);
> +	else
> +		set_timer_irq_phys_active(ctx, false);
>   
>   	if (userspace_irqchip(vcpu->kvm) &&
>   	    !static_branch_unlikely(&has_gic_active_state))
> @@ -659,13 +668,6 @@ static void timer_restore_state(struct arch_timer_context *ctx)
>   	local_irq_restore(flags);
>   }
>   
> -static inline void set_timer_irq_phys_active(struct arch_timer_context *ctx, bool active)
> -{
> -	int r;
> -	r = irq_set_irqchip_state(ctx->host_timer_irq, IRQCHIP_STATE_ACTIVE, active);
> -	WARN_ON(r);
> -}
> -
>   static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
>   {
>   	struct kvm_vcpu *vcpu = ctx->vcpu;
> 
After extensive testing, this patch was able to resolve the issue I 
encountered.
Tested-by: Kunkun Jiang <jiangkunkun@...wei.com>

Thanks,
Kunkun Jiang