linux-kernel - Re: [PATCH 1/1] KVM: arm64: Affinity level 3 support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <86frxg3i6u.wl-maz@kernel.org>
Date: Sun, 25 Feb 2024 11:19:05 +0000
From: Marc Zyngier <maz@...nel.org>
To: Wei-Lin Chang <r09922117@...e.ntu.edu.tw>
Cc: oliver.upton@...ux.dev,
	james.morse@....com,
	suzuki.poulose@....com,
	yuzenghui@...wei.com,
	catalin.marinas@....com,
	will@...nel.org,
	linux-arm-kernel@...ts.infradead.org,
	kvmarm@...ts.linux.dev,
	linux-kernel@...r.kernel.org,
	sauravsc@...zon.com, Eric Auger <eric.auger@...hat.com>
Subject: Re: [PATCH 1/1] KVM: arm64: Affinity level 3 support

[+Eric who was looking into something related recently]

Hi Wei-Lin,

Thanks for looking into this.

On Sun, 25 Feb 2024 09:02:37 +0000,
Wei-Lin Chang <r09922117@...e.ntu.edu.tw> wrote:
> 
> Currently, KVM ARM64 avoids using the Aff3 field for VCPUs, which saves
> us from having to check for hardware support in ICH_VTR_EL2.A3V or the

That's not strictly true. We do check for A3V support at restore time.

> guest's execution state. However a VCPU could still have its Aff3 bits
> set to non-zero if the VMM directly changes the VCPU's MPIDR_EL1. This
> causes a mismatch between MPIDR_EL1.Aff3 and GICR_TYPER[63:56] since 0s
> are always returned for the latter, failing the GIC Redistributor
> matching in the VM.
> 
> Let's fix this by only allowing userspace to write into the Aff3 field
> of MPIDR_EL1 if Aff3 is valid.

What does "valid" means here? 0 *is* a valid value. Or do you mean a
non-zero value? Also, this now creates a dependency between GICR_TYPER
and MPIDR_EL1. How is userspace supposed to order those when restoring
a VM?

> Additionally, extend reset_mpidr and
> vgic_mmio_{read,write}_irouter to fully support Aff3. With theses
> changes, GICR_TYPER can then safely return all four affinity levels.
> 
> Suggested-by: Saurav Sachidanand <sauravsc@...zon.com>
> Signed-off-by: Wei-Lin Chang <r09922117@...e.ntu.edu.tw>
> ---
>  arch/arm64/kvm/sys_regs.c          | 24 +++++++++++++++++++++---
>  arch/arm64/kvm/vgic/vgic-debug.c   |  2 +-
>  arch/arm64/kvm/vgic/vgic-mmio-v3.c | 18 +++++++++++-------
>  include/kvm/arm_vgic.h             |  7 ++++++-
>  4 files changed, 39 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 30253bd199..6694ce851a 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -239,6 +239,19 @@ static u8 get_min_cache_line_size(bool icache)
>  	return field + 2;
>  }
>  
> +static int set_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> +		   u64 val)
> +{
> +	bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();

What does this mean for a guest that doesn't have a GICv3?

> +
> +	if (!aff3_valid)
> +		val &= ~((u64)MPIDR_LEVEL_MASK << MPIDR_LEVEL_SHIFT(3));
> +
> +	__vcpu_sys_reg(vcpu, rd->reg) = val;
> +
> +	return 0;
> +}
> +
>  /* Which cache CCSIDR represents depends on CSSELR value. */
>  static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
>  {
> @@ -817,10 +830,12 @@ static u64 reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  static u64 reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  {
>  	u64 mpidr;
> +	bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();

Same thing.

>
>  	/*
> -	 * Map the vcpu_id into the first three affinity level fields of
> -	 * the MPIDR. We limit the number of VCPUs in level 0 due to a
> +	 * Map the vcpu_id into the affinity level fields of the MPIDR. The
> +	 * fourth level is mapped only if we are running a 64 bit guest and
> +	 * A3V is supported. We limit the number of VCPUs in level 0 due to a
>  	 * limitation to 16 CPUs in that level in the ICC_SGIxR registers
>  	 * of the GICv3 to be able to address each CPU directly when
>  	 * sending IPIs.
> @@ -828,6 +843,8 @@ static u64 reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
>  	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
>  	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> +	if (aff3_valid)
> +		mpidr |= (u64)((vcpu->vcpu_id >> 20) & 0xff) << MPIDR_LEVEL_SHIFT(3);

From virt/kcvm/kvm_main.c:

static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
{
	int r;
	struct kvm_vcpu *vcpu;
	struct page *page;

	if (id >= KVM_MAX_VCPU_IDS)
		return -EINVAL;

        [...]
}

So vcpu_id is capped at KVM_MAX_VCPU_IDS, which is 512 on arm64. How
does this ever produce anything other than 0? This is, by the way,
already true for Aff2. Which is why I have always found this change
extremely questionable: why do you need to describe 2^32 CPUs when you
can only create 512?

>  	mpidr |= (1ULL << 31);
>  	vcpu_write_sys_reg(vcpu, mpidr, MPIDR_EL1);
>  
> @@ -2232,7 +2249,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
>  
>  	{ SYS_DESC(SYS_DBGVCR32_EL2), trap_undef, reset_val, DBGVCR32_EL2, 0 },
>  
> -	{ SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1 },
> +	{ SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1,
> +	  .get_user = NULL, .set_user = set_mpidr },
>  
>  	/*
>  	 * ID regs: all ID_SANITISED() entries here must have corresponding
> diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c
> index 85606a531d..726cf1bd7b 100644
> --- a/arch/arm64/kvm/vgic/vgic-debug.c
> +++ b/arch/arm64/kvm/vgic/vgic-debug.c
> @@ -206,7 +206,7 @@ static void print_irq_state(struct seq_file *s, struct vgic_irq *irq,
>  		      "    %2d "
>  		      "%d%d%d%d%d%d%d "
>  		      "%8d "
> -		      "%8x "
> +		      "%8llx "
>  		      " %2x "
>  		      "%3d "
>  		      "     %2d "
> diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> index c15ee1df03..ea0d4ad85a 100644
> --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> @@ -195,13 +195,13 @@ static unsigned long vgic_mmio_read_irouter(struct kvm_vcpu *vcpu,
>  {
>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
>  	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> +	bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();

Hint: if you need to write the same expression more than once, you
probably need a helper for it. Meaning that you will only have to fix
it once.

>  	unsigned long ret = 0;
>  
>  	if (!irq)
>  		return 0;
>  
> -	/* The upper word is RAZ for us. */
> -	if (!(addr & 4))
> +	if (aff3_valid || !(addr & 4))
>  		ret = extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
>
>  	vgic_put_irq(vcpu->kvm, irq);
> @@ -213,11 +213,12 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>  				    unsigned long val)
>  {
>  	int intid = VGIC_ADDR_TO_INTID(addr, 64);
> +	bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();
>  	struct vgic_irq *irq;
>  	unsigned long flags;
>  
> -	/* The upper word is WI for us since we don't implement Aff3. */
> -	if (addr & 4)
> +	/* The upper word is WI if Aff3 is not valid. */
> +	if (!aff3_valid && addr & 4)
>  		return;
>  
>  	irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> @@ -227,8 +228,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
>  
>  	raw_spin_lock_irqsave(&irq->irq_lock, flags);
>  
> -	/* We only care about and preserve Aff0, Aff1 and Aff2. */
> -	irq->mpidr = val & GENMASK(23, 0);
> +	irq->mpidr = val & MPIDR_HWID_BITMASK;
>  	irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
>  
>  	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> @@ -323,7 +323,11 @@ static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
>  	int target_vcpu_id = vcpu->vcpu_id;
>  	u64 value;
>  
> -	value = (u64)(mpidr & GENMASK(23, 0)) << 32;
> +	value = MPIDR_AFFINITY_LEVEL(mpidr, 3) << 56 |
> +		MPIDR_AFFINITY_LEVEL(mpidr, 2) << 48 |
> +		MPIDR_AFFINITY_LEVEL(mpidr, 1) << 40 |
> +		MPIDR_AFFINITY_LEVEL(mpidr, 0) << 32;

Maybe it is time to describe these shifts in an include file, and use
FIELD_PREP() to construct the whole thing. It will be a lot more
readable.

> +
>  	value |= ((target_vcpu_id & 0xffff) << 8);
>  
>  	if (vgic_has_its(vcpu->kvm))
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 8cc38e836f..b464ac1b79 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -143,7 +143,7 @@ struct vgic_irq {
>  	unsigned int host_irq;		/* linux irq corresponding to hwintid */
>  	union {
>  		u8 targets;			/* GICv2 target VCPUs mask */
> -		u32 mpidr;			/* GICv3 target VCPU */
> +		u64 mpidr;			/* GICv3 target VCPU */

Do we really need to grow each interrupt object by 4 bytes, specially
when we at most use 4 bytes? I'd rather we store the affinity in a
compact way and change the way we compare it to the vcpu's MPIDR_EL1.

>  	};
>  	u8 source;			/* GICv2 SGIs only */
>  	u8 active_source;		/* GICv2 SGIs only */
> @@ -413,6 +413,11 @@ static inline int kvm_vgic_get_max_vcpus(void)
>  	return kvm_vgic_global_state.max_gic_vcpus;
>  }
>  
> +static inline bool kvm_vgic_has_a3v(void)
> +{
> +	return kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_A3V_MASK;
> +}
> +

I can see multiple problems with this:

- this is the host state, which shouldn't necessarily represent the
  guest state. It should be possible to restore a VM that have a
  different A3V value and still have the same guarantees.  There is
  however a small nit around ICV_CTLR_EL1.A3V, which would require
  trapping to emulate the A3V bit.

- this assumes GICv3, which is definitely not universal (we support
  GICv2, for which no such restriction actually exists).

Finally, I don't see VM save/restore being addressed here, and I
suspect it hasn't been looked at.

Overall, this patch does too many things, and it should be split in
discrete changes. I also want to see an actual justification for Aff3
support. And if we introduce it, it must be fully virtualised
(independent of the A3V support on the host).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.