Message-Id: <20240227022708.795214-1-r09922117@csie.ntu.edu.tw>
Date: Tue, 27 Feb 2024 10:27:08 +0800
From: Wei-Lin Chang <r09922117@...e.ntu.edu.tw>
To: maz@...nel.org
Cc: oliver.upton@...ux.dev,
james.morse@....com,
suzuki.poulose@....com,
yuzenghui@...wei.com,
catalin.marinas@....com,
will@...nel.org,
linux-arm-kernel@...ts.infradead.org,
kvmarm@...ts.linux.dev,
linux-kernel@...r.kernel.org,
sauravsc@...zon.com,
eric.auger@...hat.com
Subject: Re: [PATCH 1/1] KVM: arm64: Affinity level 3 support

On Sun, 25 Feb 2024 11:19:05 +0000,
Marc Zyngier <maz@...nel.org> wrote:
>
> [+Eric who was looking into something related recently]
>
> Hi Wei-Lin,
>
> Thanks for looking into this.
>
> On Sun, 25 Feb 2024 09:02:37 +0000,
> Wei-Lin Chang <r09922117@...e.ntu.edu.tw> wrote:
> >
> > Currently, KVM ARM64 avoids using the Aff3 field for VCPUs, which saves
> > us from having to check for hardware support in ICH_VTR_EL2.A3V or the
>
> That's not strictly true. We do check for A3V support at restore time.
>
> > guest's execution state. However a VCPU could still have its Aff3 bits
> > set to non-zero if the VMM directly changes the VCPU's MPIDR_EL1. This
> > causes a mismatch between MPIDR_EL1.Aff3 and GICR_TYPER[63:56] since 0s
> > are always returned for the latter, failing the GIC Redistributor
> > matching in the VM.
> >
> > Let's fix this by only allowing userspace to write into the Aff3 field
> > of MPIDR_EL1 if Aff3 is valid.
>
> What does "valid" mean here? 0 *is* a valid value. Or do you mean a
> non-zero value? Also, this now creates a dependency between GICR_TYPER
> and MPIDR_EL1. How is userspace supposed to order those when restoring
> a VM?
>
> > Additionally, extend reset_mpidr and
> > vgic_mmio_{read,write}_irouter to fully support Aff3. With these
> > changes, GICR_TYPER can then safely return all four affinity levels.
> >
> > Suggested-by: Saurav Sachidanand <sauravsc@...zon.com>
> > Signed-off-by: Wei-Lin Chang <r09922117@...e.ntu.edu.tw>
> > ---
> > arch/arm64/kvm/sys_regs.c | 24 +++++++++++++++++++++---
> > arch/arm64/kvm/vgic/vgic-debug.c | 2 +-
> > arch/arm64/kvm/vgic/vgic-mmio-v3.c | 18 +++++++++++-------
> > include/kvm/arm_vgic.h | 7 ++++++-
> > 4 files changed, 39 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 30253bd199..6694ce851a 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -239,6 +239,19 @@ static u8 get_min_cache_line_size(bool icache)
> > return field + 2;
> > }
> >
> > +static int set_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd,
> > + u64 val)
> > +{
> > + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();
>
> What does this mean for a guest that doesn't have a GICv3?
>
> > +
> > + if (!aff3_valid)
> > + val &= ~((u64)MPIDR_LEVEL_MASK << MPIDR_LEVEL_SHIFT(3));
> > +
> > + __vcpu_sys_reg(vcpu, rd->reg) = val;
> > +
> > + return 0;
> > +}
> > +
> > /* Which cache CCSIDR represents depends on CSSELR value. */
> > static u32 get_ccsidr(struct kvm_vcpu *vcpu, u32 csselr)
> > {
> > @@ -817,10 +830,12 @@ static u64 reset_actlr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > static u64 reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > {
> > u64 mpidr;
> > + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();
>
> Same thing.
>
> >
> > /*
> > - * Map the vcpu_id into the first three affinity level fields of
> > - * the MPIDR. We limit the number of VCPUs in level 0 due to a
> > + * Map the vcpu_id into the affinity level fields of the MPIDR. The
> > + * fourth level is mapped only if we are running a 64 bit guest and
> > + * A3V is supported. We limit the number of VCPUs in level 0 due to a
> > * limitation to 16 CPUs in that level in the ICC_SGIxR registers
> > * of the GICv3 to be able to address each CPU directly when
> > * sending IPIs.
> > @@ -828,6 +843,8 @@ static u64 reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> > mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> > mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> > + if (aff3_valid)
> > + mpidr |= (u64)((vcpu->vcpu_id >> 20) & 0xff) << MPIDR_LEVEL_SHIFT(3);
>
> From virt/kvm/kvm_main.c:
>
> static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
> {
> int r;
> struct kvm_vcpu *vcpu;
> struct page *page;
>
> if (id >= KVM_MAX_VCPU_IDS)
> return -EINVAL;
>
> [...]
> }
>
> So vcpu_id is capped at KVM_MAX_VCPU_IDS, which is 512 on arm64. How
> does this ever produce anything other than 0? This is, by the way,
> already true for Aff2. Which is why I have always found this change
> extremely questionable: why do you need to describe 2^32 CPUs when you
> can only create 512?
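>
> Concretely, with the current cap the largest possible id is 511
> (0x1ff), so:
>
> 	(0x1ff >> 12) & 0xff == 0	/* Aff2 */
> 	(0x1ff >> 20) & 0xff == 0	/* Aff3 */
>
> and both of those fields stay zero no matter what.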
>
> > mpidr |= (1ULL << 31);
> > vcpu_write_sys_reg(vcpu, mpidr, MPIDR_EL1);
> >
> > @@ -2232,7 +2249,8 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> >
> > { SYS_DESC(SYS_DBGVCR32_EL2), trap_undef, reset_val, DBGVCR32_EL2, 0 },
> >
> > - { SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1 },
> > + { SYS_DESC(SYS_MPIDR_EL1), NULL, reset_mpidr, MPIDR_EL1,
> > + .get_user = NULL, .set_user = set_mpidr },
> >
> > /*
> > * ID regs: all ID_SANITISED() entries here must have corresponding
> > diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c
> > index 85606a531d..726cf1bd7b 100644
> > --- a/arch/arm64/kvm/vgic/vgic-debug.c
> > +++ b/arch/arm64/kvm/vgic/vgic-debug.c
> > @@ -206,7 +206,7 @@ static void print_irq_state(struct seq_file *s, struct vgic_irq *irq,
> > " %2d "
> > "%d%d%d%d%d%d%d "
> > "%8d "
> > - "%8x "
> > + "%8llx "
> > " %2x "
> > "%3d "
> > " %2d "
> > diff --git a/arch/arm64/kvm/vgic/vgic-mmio-v3.c b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> > index c15ee1df03..ea0d4ad85a 100644
> > --- a/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> > +++ b/arch/arm64/kvm/vgic/vgic-mmio-v3.c
> > @@ -195,13 +195,13 @@ static unsigned long vgic_mmio_read_irouter(struct kvm_vcpu *vcpu,
> > {
> > int intid = VGIC_ADDR_TO_INTID(addr, 64);
> > struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> > + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();
>
> Hint: if you need to write the same expression more than once, you
> probably need a helper for it. Meaning that you will only have to fix
> it once.
>
> > unsigned long ret = 0;
> >
> > if (!irq)
> > return 0;
> >
> > - /* The upper word is RAZ for us. */
> > - if (!(addr & 4))
> > + if (aff3_valid || !(addr & 4))
> > ret = extract_bytes(READ_ONCE(irq->mpidr), addr & 7, len);
> >
> > vgic_put_irq(vcpu->kvm, irq);
> > @@ -213,11 +213,12 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
> > unsigned long val)
> > {
> > int intid = VGIC_ADDR_TO_INTID(addr, 64);
> > + bool aff3_valid = !vcpu_el1_is_32bit(vcpu) && kvm_vgic_has_a3v();
> > struct vgic_irq *irq;
> > unsigned long flags;
> >
> > - /* The upper word is WI for us since we don't implement Aff3. */
> > - if (addr & 4)
> > + /* The upper word is WI if Aff3 is not valid. */
> > + if (!aff3_valid && addr & 4)
> > return;
> >
> > irq = vgic_get_irq(vcpu->kvm, NULL, intid);
> > @@ -227,8 +228,7 @@ static void vgic_mmio_write_irouter(struct kvm_vcpu *vcpu,
> >
> > raw_spin_lock_irqsave(&irq->irq_lock, flags);
> >
> > - /* We only care about and preserve Aff0, Aff1 and Aff2. */
> > - irq->mpidr = val & GENMASK(23, 0);
> > + irq->mpidr = val & MPIDR_HWID_BITMASK;
> > irq->target_vcpu = kvm_mpidr_to_vcpu(vcpu->kvm, irq->mpidr);
> >
> > raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
> > @@ -323,7 +323,11 @@ static unsigned long vgic_mmio_read_v3r_typer(struct kvm_vcpu *vcpu,
> > int target_vcpu_id = vcpu->vcpu_id;
> > u64 value;
> >
> > - value = (u64)(mpidr & GENMASK(23, 0)) << 32;
> > + value = MPIDR_AFFINITY_LEVEL(mpidr, 3) << 56 |
> > + MPIDR_AFFINITY_LEVEL(mpidr, 2) << 48 |
> > + MPIDR_AFFINITY_LEVEL(mpidr, 1) << 40 |
> > + MPIDR_AFFINITY_LEVEL(mpidr, 0) << 32;
>
> Maybe it is time to describe these shifts in an include file, and use
> FIELD_PREP() to construct the whole thing. It will be a lot more
> readable.
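>
> Roughly (untested sketch; the per-level mask names are made up here
> and would want to live next to the existing GICR_TYPER definitions):
>
> 	#define GICR_TYPER_AFF3		GENMASK_ULL(63, 56)
> 	#define GICR_TYPER_AFF2		GENMASK_ULL(55, 48)
> 	#define GICR_TYPER_AFF1		GENMASK_ULL(47, 40)
> 	#define GICR_TYPER_AFF0		GENMASK_ULL(39, 32)
>
> 	value  = FIELD_PREP(GICR_TYPER_AFF3, MPIDR_AFFINITY_LEVEL(mpidr, 3)) |
> 		 FIELD_PREP(GICR_TYPER_AFF2, MPIDR_AFFINITY_LEVEL(mpidr, 2)) |
> 		 FIELD_PREP(GICR_TYPER_AFF1, MPIDR_AFFINITY_LEVEL(mpidr, 1)) |
> 		 FIELD_PREP(GICR_TYPER_AFF0, MPIDR_AFFINITY_LEVEL(mpidr, 0));
>
> which makes the bit layout self-documenting.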
>
> > +
> > value |= ((target_vcpu_id & 0xffff) << 8);
> >
> > if (vgic_has_its(vcpu->kvm))
> > diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> > index 8cc38e836f..b464ac1b79 100644
> > --- a/include/kvm/arm_vgic.h
> > +++ b/include/kvm/arm_vgic.h
> > @@ -143,7 +143,7 @@ struct vgic_irq {
> > unsigned int host_irq; /* linux irq corresponding to hwintid */
> > union {
> > u8 targets; /* GICv2 target VCPUs mask */
> > - u32 mpidr; /* GICv3 target VCPU */
> > + u64 mpidr; /* GICv3 target VCPU */
>
> Do we really need to grow each interrupt object by 4 bytes, especially
> when we use at most 4 bytes? I'd rather we store the affinity in a
> compact way and change the way we compare it to the vcpu's MPIDR_EL1.
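>
> As a sketch of what I mean (helper name made up): the four 8-bit
> affinity fields fit in 32 bits once the gap in the MPIDR layout is
> squeezed out, so the field could stay a u32:
>
> 	/* Pack Aff3..Aff0 of an MPIDR value into 32 bits */
> 	static inline u32 mpidr_to_packed_aff(u64 mpidr)
> 	{
> 		return (MPIDR_AFFINITY_LEVEL(mpidr, 3) << 24) |
> 		       (MPIDR_AFFINITY_LEVEL(mpidr, 2) << 16) |
> 		       (MPIDR_AFFINITY_LEVEL(mpidr, 1) <<  8) |
> 			MPIDR_AFFINITY_LEVEL(mpidr, 0);
> 	}
>
> with the lookup side comparing against the same packed form of the
> vcpu's MPIDR_EL1.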
>
> > };
> > u8 source; /* GICv2 SGIs only */
> > u8 active_source; /* GICv2 SGIs only */
> > @@ -413,6 +413,11 @@ static inline int kvm_vgic_get_max_vcpus(void)
> > return kvm_vgic_global_state.max_gic_vcpus;
> > }
> >
> > +static inline bool kvm_vgic_has_a3v(void)
> > +{
> > + return kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_A3V_MASK;
> > +}
> > +
>
> I can see multiple problems with this:
>
> - this is the host state, which shouldn't necessarily represent the
> guest state. It should be possible to restore a VM that has a
> different A3V value and still have the same guarantees. There is
> however a small nit around ICV_CTLR_EL1.A3V, which would require
> trapping to emulate the A3V bit.
>
> - this assumes GICv3, which is definitely not universal (we support
> GICv2, for which no such restriction actually exists).
>
> Finally, I don't see VM save/restore being addressed here, and I
> suspect it hasn't been looked at.
>
> Overall, this patch does too many things, and it should be split in
> discrete changes. I also want to see an actual justification for Aff3
> support. And if we introduce it, it must be fully virtualised
> (independent of the A3V support on the host).

Hi Marc,

Really appreciate the feedback. I think I understand most of your
comments and agree with them. It's clear I didn't fully understand the
implications of the changes I was making here. Thanks for explaining.

Cheers,
Wei-Lin Chang