[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <bb4984d2-ed80-ba97-0f58-b45db4f0ff32@huawei.com>
Date: Tue, 19 Dec 2023 15:24:55 +0800
From: Kunkun Jiang <jiangkunkun@...wei.com>
To: Marc Zyngier <maz@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>
CC: James Morse <james.morse@....com>, Suzuki K Poulose
<suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>, Catalin Marinas
<catalin.marinas@....com>, Will Deacon <will@...nel.org>, Jean-Philippe
Brucker <jean-philippe@...aro.org>, "moderated list:ARM SMMU DRIVERS"
<linux-arm-kernel@...ts.infradead.org>, <kvmarm@...ts.linux.dev>, open list
<linux-kernel@...r.kernel.org>, "wanghaibin.wang@...wei.com"
<wanghaibin.wang@...wei.com>
Subject: Re: [bug report] GICv4.1: vSGI remains pending across the guest reset
Hi Marc and Oliver,
On 2023/12/17 19:26, Marc Zyngier wrote:
> On Thu, 14 Dec 2023 12:13:57 +0000,
> Kunkun Jiang <jiangkunkun@...wei.com> wrote:
>> Hi list,
>>
>> We have observed on GICv4.1 systems that, after a guest reset, the
>> secondary VCPU would receive an IPI_CPU_STOP accidently and failed to
>> come online eventually.
>>
>> | Guest User-space
> Guest userspace????
Sorry. The formatting is a bit messy.
| Guest User-space
|
| reset (with a vSGI pending in the
| hardware) [0]
|
| disable the distributor (write 0
| into GICD_CTLR) [1]
|
| clear pending status (write 0 into
| GICR_ISPENDR0 for each RD) [2]
|
| disable the distributor (write 0
| into GICD_CTLR) [3]
|
| enable the distributor with ARE,
| Group1 and nASSGIreq [4]
>> |
>> | reset (with a vSGI pending in the
>> | hardware) [0]
>> |
>> | disable the distributor (write 0
>> | into GICD_CTLR) [1]
>> |
>> | clear pending status (write 0 into
>> | GICR_ISPENDR0 for each RD) [2]
> This cannot clear the pending bits. Writing 0 to any of the
> ISPEND/ICPEND registers is effectively a NOP.
>
> If you want to remove the pending SGIs, you need to write a bunch of
> 1s to ICPENDR0.
>
>> |
>> | disable the distributor (write 0
>> | into GICD_CTLR) [3]
>> |
>> | enable the distributor with ARE,
>> | Group1 and nASSGIreq [4]
>>
>> The problem is that even if user-space tries to disable the distributor
> I don't understand what userspace does here. Why is it significant
> that it is an EL0 access? I don't know of any SW doing that, but even
> if it existed, this should make no difference.
>
>> and clear pending bits for all SGIs, we don't actually propogate it into
>> HW (we only record it via vgic_dist->{enabled, nassgireq} and
>> vgic_irq->pending_latch) and the vSGI remains pending across the guest
>> reset.
>>
>> And when we're at [4], all vSGI's vgic_irq->hw are *true* and
>> vgic_v4_enable_vsgis() becomes a nop.. That's not good.
>>
>> The following solution can solve the problem. Not sure if this is a good
>> solution.Sent out early for suggestions or solutions.
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> index 89117ba2528a..3678ab33f5b9 100644
>> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
>> @@ -374,6 +374,10 @@ static int vgic_v3_uaccess_write_pending(struct
>> kvm_vcpu *vcpu,
>> irq->pending_latch = true;
>> vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
>> } else {
>> + if (irq->hw && vgic_irq_is_sgi(irq->intid))
>> + irq_set_irqchip_state(irq->host_irq,
>> + IRQCHIP_STATE_PENDING,
>> + false);
>> irq->pending_latch = false;
>> raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
>> }
>>
> But this has *nothing* to do with the guest. This is the *host*
> userspace performing a write to the redistributor view, which has
> different semantics. Which is why your earlier description made no
> sense to me.
>
> I think the problem is slightly larger than what you describe. A write
> to ISPENDR0 should be propagated to the ITS for any values of the
> latch, just like this happens on enabling HW-backed SGIs.
>
> Can you please give this a go?
I have given it a go. I tested reseting the guest 100 times in succession,
and the problem did not occur again.
Thanks,
Kunkun Jiang
> Thanks,
>
> M.
>
> >From 9932d74392d969057e84bc3c18bd15f1769b6211 Mon Sep 17 00:00:00 2001
> From: Marc Zyngier <maz@...nel.org>
> Date: Sun, 17 Dec 2023 11:15:09 +0000
> Subject: [PATCH] KVM: arm64: vgic-v4: Restore pending state on host userspace
> write
>
> When the VMM writes to ISPENDR0 to set the state pending state of
> an SGI, we fail to convey this to the HW if this SGI is already
> backed by a GICv4.1 vSGI.
>
> This is a bit of a corner case, as this would only occur if the
> vgic state is changed on an already running VM, but this can
> apparently happen across a guest reset driven by the VMM.
>
> Fix this by always writing out the pending_latch value to the
> HW, and reseting it to false.
>
> Reported-by: Kunkun Jiang <jiangkunkun@...wei.com>
> Signed-off-by: Marc Zyngier <maz@...nel.org>
> Link: https://lore.kernel.org/r/7e7f2c0c-448b-10a9-8929-4b8f4f6e2a32@huawei.com
> ---
> arch/arm64/kvm/vgic/vgic-mmio-v3.c | 27 +++++++++++++++++----------
> 1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index a764b0ab8bf9..2533f264b954 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -365,19 +365,26 @@ static int vgic_v3_uaccess_write_pending(struct kvm_vcpu *vcpu,
> struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
>
> raw_spin_lock_irqsave(&irq->irq_lock, flags);
> - if (test_bit(i, &val)) {
> - /*
> - * pending_latch is set irrespective of irq type
> - * (level or edge) to avoid dependency that VM should
> - * restore irq config before pending info.
> - */
> - irq->pending_latch = true;
> - vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> - } else {
> +
> + /*
> + * pending_latch is set irrespective of irq type
> + * (level or edge) to avoid dependency that VM should
> + * restore irq config before pending info.
> + */
> + irq->pending_latch = test_bit(i, &val);
> +
> + if (irq->hw && vgic_irq_is_sgi(irq->intid)) {
> + irq_set_irqchip_state(irq->host_irq,
> + IRQCHIP_STATE_PENDING,
> + irq->pending_latch);
> irq->pending_latch = false;
> - raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> }
>
> + if (irq->pending_latch)
> + vgic_queue_irq_unlock(vcpu->kvm, irq, flags);
> + else
> + raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> +
> vgic_put_irq(vcpu->kvm, irq);
> }
>
Powered by blists - more mailing lists