lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 21 Jun 2024 12:24:34 +0100
From: Marc Zyngier <maz@...nel.org>
To: Anshuman Khandual <anshuman.khandual@....com>
Cc: linux-arm-kernel@...ts.infradead.org,
	Oliver Upton <oliver.upton@...ux.dev>,
	James Morse <james.morse@....com>,
	Suzuki K Poulose <suzuki.poulose@....com>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will@...nel.org>,
	Mark Brown <broonie@...nel.org>,
	kvmarm@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC 10/10] KVM: arm64: nv: Add new HDFGRTR2_GROUP & HDFGRTR2_GROUP based FGU handling

On Fri, 21 Jun 2024 08:56:15 +0100,
Anshuman Khandual <anshuman.khandual@....com> wrote:
> 
> On 6/20/24 16:36, Marc Zyngier wrote:
> > On Thu, 20 Jun 2024 07:58:07 +0100,
> > Anshuman Khandual <anshuman.khandual@....com> wrote:
> >>
> >> This adds new HDFGRTR2_GROUP and HDFGWTR2_GROUP groups in enum fgt_group_id
> >> for FGU handling managed with HDFGRTR2_EL2 and HDFGWTR2_EL2 registers.
> >>
> >> Cc: Marc Zyngier <maz@...nel.org>
> >> Cc: Oliver Upton <oliver.upton@...ux.dev>
> >> Cc: James Morse <james.morse@....com>
> >> Cc: Suzuki K Poulose <suzuki.poulose@....com>
> >> Cc: linux-arm-kernel@...ts.infradead.org
> >> Cc: kvmarm@...ts.linux.dev
> >> Cc: linux-kernel@...r.kernel.org
> >> Signed-off-by: Anshuman Khandual <anshuman.khandual@....com>
> >> ---
> >>  arch/arm64/include/asm/kvm_arm.h        |  8 +++++
> >>  arch/arm64/include/asm/kvm_host.h       |  2 ++
> >>  arch/arm64/kvm/emulate-nested.c         | 14 ++++++++
> >>  arch/arm64/kvm/hyp/include/hyp/switch.h | 10 ++++++
> >>  arch/arm64/kvm/nested.c                 | 36 ++++++++++++++++++++
> >>  arch/arm64/kvm/sys_regs.c               | 45 +++++++++++++++++++++++++
> >>  6 files changed, 115 insertions(+)
> >>
> >> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> >> index b2adc2c6c82a..b3fb368bcadb 100644
> >> --- a/arch/arm64/include/asm/kvm_arm.h
> >> +++ b/arch/arm64/include/asm/kvm_arm.h
> >> @@ -354,6 +354,14 @@
> >>  #define __HFGRTR_EL2_MASK	GENMASK(49, 0)
> >>  #define __HFGRTR_EL2_nMASK	~(__HFGRTR_EL2_RES0 | __HFGRTR_EL2_MASK)
> >>  
> >> +#define __HDFGRTR2_EL2_RES0	HDFGRTR2_EL2_RES0
> >> +#define __HDFGRTR2_EL2_MASK	(GENMASK(22, 22) | GENMASK(20, 0))
> > 
> > How about bit 23? How comes you are considering all these bits as positive?
> 
> It had 23 earlier looking into DDI0601 2024-03 but then referred into ARM
> ARM DDI 0487K.A which changed it as 22 - which was wrong, will change this
> again if required.

I guess we're past that point now, and we should delete the comment
that limits it to K.a. Anything that is published as part of the XML
drop (DDI 0602) is now fair game.

> 
> > 
> >> +#define __HDFGRTR2_EL2_nMASK	~(__HDFGRTR2_EL2_RES0 | __HDFGRTR2_EL2_MASK)
> > 
> > Note the *nMASK* here. 'n' is for *negative*. Just look at how
> > __HDFGRTR_EL2_MASK and __HDFGRTR_EL2_nMASK are written.
> 
> Still trying to understand what does these three masks represent
> for a given FGT register XXXX
> 
> 	- __XXXX_RES0

RES0 bits.

> 	- __XXXX_MASK

Positive trap bits.

> 	- __XXXX_nMASK

Negative trap bits.

> 
> But from the mentioned example for HDFGRTR_EL2.
> 
> #define __HDFGRTR_EL2_RES0      HDFGRTR_EL2_RES0
> #define __HDFGRTR_EL2_MASK      (BIT(63) | GENMASK(58, 50) | GENMASK(48, 43) | \
>                                  GENMASK(41, 40) | GENMASK(37, 22) | \
>                                  GENMASK(19, 9) | GENMASK(7, 0))
> #define __HDFGRTR_EL2_nMASK     ~(__HDFGRTR_EL2_RES0 | __HDFGRTR_EL2_MASK)
> 
> Looking at HDFGRTR_EL2 definition inside arch/arm64/tools/sysreg
> 
> HDFGRTR_EL2_RES0 = BIT(49) | GENMASK(39, 38) | GENMASK(21, 20) | BIT(8)
> 
> is representing the entire mask in the register which are RES0. But then what
> does __HDFGRTR_EL2_MASK signify ? Positive trap bits mask ?
> 
> The following bits belong in __HDFGRTR_EL2_MASK
> 
> HDFGRTR_EL2.PMBIDR_EL1    (63)
> HDFGRTR_EL2.PMCEIDn_EL0   (58)
> 
> Where as the following bits belong in __HDFGRTR_EL2_nMASK
> 
> HDFGRTR_EL2.nPMSNEVFR_EL1 (61)
> HDFGRTR_EL2.nBRBCTL	  (60) 
> 
> Reworking proposed HDFGRTR2_EL2 and HDFGWTR2_EL2. 
> 
> #define __HDFGRTR2_EL2_RES0     HDFGRTR2_EL2_RES0
> #define __HDFGRTR2_EL2_MASK     0
> #define __HDFGRTR2_EL2_nMASK    ~(__HDFGRTR2_EL2_RES0 | __HDFGRTR2_EL2_MASK)
> 
> #define __HDFGWTR2_EL2_RES0     HDFGWTR2_EL2_RES0
> #define __HDFGWTR2_EL2_MASK     0
> #define __HDFGWTR2_EL2_nMASK    ~(__HDFGWTR2_EL2_RES0 | __HDFGWTR2_EL2_MASK)
> 
> Please note that all trap bits in both these registers are negative ones
> hence __HDFGRTR2_EL2_MASK/__HDFGWTR2_EL2_MASK should be 0.

That's because you're looking at the XML, and not the ARM ARM this was
written against.

> 
> > 
> >> +
> >> +#define __HDFGWTR2_EL2_RES0	HDFGWTR2_EL2_RES0
> >> +#define __HDFGWTR2_EL2_MASK	(GENMASK(22, 19) | GENMASK(16, 7) | GENMASK(5, 0))
> > 
> > Again, how about bit 23? Same remark for the polarity.
> > 
> >> +#define __HDFGWTR2_EL2_nMASK	~(__HDFGWTR2_EL2_RES0 | __HDFGWTR2_EL2_MASK)
> >> +
> >>  /*
> >>   * The HFGWTR bits are a subset of HFGRTR bits. To ensure we don't miss any
> >>   * future additions, define __HFGWTR* macros relative to __HFGRTR* ones.
> >> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> >> index 7b44e96e7270..d6fbd6ebc32d 100644
> >> --- a/arch/arm64/include/asm/kvm_host.h
> >> +++ b/arch/arm64/include/asm/kvm_host.h
> >> @@ -239,6 +239,8 @@ enum fgt_group_id {
> >>  	HDFGWTR_GROUP = HDFGRTR_GROUP,
> >>  	HFGITR_GROUP,
> >>  	HAFGRTR_GROUP,
> >> +	HDFGRTR2_GROUP,
> >> +	HDFGWTR2_GROUP = HDFGRTR2_GROUP,
> >>  
> >>  	/* Must be last */
> >>  	__NR_FGT_GROUP_IDS__
> >> diff --git a/arch/arm64/kvm/emulate-nested.c b/arch/arm64/kvm/emulate-nested.c
> >> index 54090967a335..bc5ea1e60a0a 100644
> >> --- a/arch/arm64/kvm/emulate-nested.c
> >> +++ b/arch/arm64/kvm/emulate-nested.c
> >> @@ -1724,6 +1724,9 @@ static const struct encoding_to_trap_config encoding_to_fgt[] __initconst = {
> >>  	SR_FGT(SYS_AMEVCNTR0_EL0(2),	HAFGRTR, AMEVCNTR02_EL0, 1),
> >>  	SR_FGT(SYS_AMEVCNTR0_EL0(1),	HAFGRTR, AMEVCNTR01_EL0, 1),
> >>  	SR_FGT(SYS_AMEVCNTR0_EL0(0),	HAFGRTR, AMEVCNTR00_EL0, 1),
> >> +
> >> +	/* HDFGRTR2_EL2 */
> >> +	SR_FGT(SYS_MDSELR_EL1,		HDFGRTR2, nMDSELR_EL1, 1),
> > 
> > No. See the 'n' prefix on this bit?
> 
> Right, that should be a 0 instead.
> 
> > 
> > Also, where are all the other bits for the HDFRxTR2 registers?
> 
> This will require a number of new registers being added into tools sysreg
> expanding the series further, but will go ahead add all that is required.

Please do.

> Although I had asked about this in the cover letter.
> 
> - Probably an entry is needed for SYS_MDSELR_EL1 in encoding_to_fgt[] table
>   inside the file arch/arm64/kvm/emulate-nested.c, but while trying to test
>   features for all individual bits in HDFGRTR2_EL2, it seemed a lot of new
>   register definitions from various features need to be added as well, thus
>   expanding the scope further. Should all required new system registers be
>   added for completeness ?

Anything on which you expect KVM to interact with *must* be fully
described.  I don't want to have to second guess the exception routing
tables when we add support for a feature.

In short, this is the end of half baked feature support in KVM. When
you add support for a trap register, it comes with everything it
traps, recursively.

[...]

> >> diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
> >> index bae8536cbf00..ebe4e3972fed 100644
> >> --- a/arch/arm64/kvm/nested.c
> >> +++ b/arch/arm64/kvm/nested.c
> >> @@ -384,6 +384,42 @@ int kvm_init_nv_sysregs(struct kvm *kvm)
> >>  		res0 |= HDFGRTR_EL2_nPMSNEVFR_EL1;
> >>  	set_sysreg_masks(kvm, HDFGRTR_EL2, res0 | HDFGRTR_EL2_RES0, res1);
> >>  
> >> +	/* HDFG[RW]TR2_EL2 */
> >> +	res0 = res1 = 0;
> >> +
> >> +	/* FEAT_TRBE_MPAM is not exposed to the guest */
> >> +	res0 |= HDFGRTR2_EL2_nTRBMPAM_EL1;
> > 
> > No. We are moving away from hard-coded features, so you have to
> > explicitly check it.
> 
> The above code was just temporary for this RFC. But you are right, these
> features need to be checked properly but there is a challenge. As I have
> mentioned in the cover letter

You are wasting everybody's time with these RFCs. Either you *try* do
the right thing from the first post, or you don't bother posting the
patches. What's the point of posting them if you know this isn't
right?

Writing these reviews take time and energy, and there is no shortage
of this I'd rather do instead of having to write these emails. If you
have questions, just ask. You don't need to dump 10 patches on the
list for that.

> 
> - TRBIDR_EL1.MPAM needs to be probed for setting HDFGRTR2_EL2_nTRBMPAM_EL1
>   but kvm_has_feat() does not operate a non-ID register which causes build
>   warnings. The same problem exists for probing PMSIDR_EL1.FDS which is
>   needed for setting HDFGRTR2_EL2_nPMSDSFR_EL1 as well. Currently both the
>   bits mentioned earlier are set, assuming the features are not present in
>   nested virtualization. Do we need some new helpers to probe these non-ID
>   registers as well ?
> 
> How do you suggest we proceed on this - testing features in TRBIDR_EL1 and
> PMSIDR_EL1 ?

Let's look at the spec.

For TRBMPAM_EL1 being implemented (from K.a):

* This register is present only when FEAT_TRBE_MPAM is
  implemented. Otherwise, direct accesses to TRBMPAM_EL1 are
  UNDEFINED.

* If FEAT_TRBE_MPAM is implemented, then FEAT_TRBE_EXT is implemented.

* The following fields identify the presence of FEAT_TRBE_EXT:
  * ID_AA64DFR0_EL1.ExtTrcBuff.

This allows you do shortcut it one level above the particular MPAM
requirement, which is good enough (I don't expect external traces to
be exposed to a VM). Therefore no need to look at TRBIDR_EL1.

For PMSDSFR_EL1 being implemented (from K.a):

* This register is present only when FEAT_SPE_FDS is
  implemented. Otherwise, direct accesses to PMSDSFR_EL1 are
  UNDEFINED.

* If FEAT_SPE_FDS is implemented, then FEAT_SPEv1p4 is implemented.

* The following field identifies the presence of FEAT_SPEv1p4:
  * ID_AA64DFR0_EL1.PMSVer.

So again that's your cut-off point. Until we support this level of SPE
in KVM, we're pretty safe (and I seriously doubt we'll get there in my
lifetime, given the current pace of progress).

If at some point we need to support ID registers that are not in the
feature range, we'll do that (Oliver has some stuff already for
CTR_EL0).  But don't hardcode things.

[...]

> >> +	set_sysreg_masks(kvm, HDFGRTR2_EL2, res0 | HDFGRTR2_EL2_RES0, res1);
> >> +	set_sysreg_masks(kvm, HDFGWTR2_EL2, res0 | HDFGWTR2_EL2_RES0, res1);
> > 
> > How about the HDFGxTR2_EL2_RES1 bits?
> 
> I assume you are suggesting something like this.
> 
> -       set_sysreg_masks(kvm, HDFGRTR2_EL2, res0 | HDFGRTR2_EL2_RES0, res1);
> -       set_sysreg_masks(kvm, HDFGWTR2_EL2, res0 | HDFGWTR2_EL2_RES0, res1);
> +       set_sysreg_masks(kvm, HDFGRTR2_EL2, res0 | HDFGRTR2_EL2_RES0, res1 | HDFGRTR2_EL2_RES1);
> +       set_sysreg_masks(kvm, HDFGWTR2_EL2, res0 | HDFGWTR2_EL2_RES0, res1 | HDFGWTR2_EL2_RES1);
> 
> But guess both HDFGRTR2_EL2_RES1 and HDFGWTR2_EL2_RES1 will be empty (0) as there
> are no res1 bit positions in either of these registers. But will change as above.

I don't care about these values being 0. I want the reassuring feeling
that we're not missing anything, because debugging this is hell if you
have a guest that sets/clears random bits.

[...]

> >> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> >> index f921af014d0c..8029f408855d 100644
> >> --- a/arch/arm64/kvm/sys_regs.c
> >> +++ b/arch/arm64/kvm/sys_regs.c
> >> @@ -4110,6 +4110,51 @@ void kvm_init_sysreg(struct kvm_vcpu *vcpu)
> >>  		kvm->arch.fgu[HAFGRTR_GROUP] |= ~(HAFGRTR_EL2_RES0 |
> >>  						  HAFGRTR_EL2_RES1);
> >>  
> >> +	/* FEAT_TRBE_MPAM is not exposed to the guest */
> >> +	kvm->arch.fgu[HDFGRTR2_GROUP] |= ~(HDFGRTR2_EL2_nTRBMPAM_EL1);
> > 
> > Same thing about dynamic configuration.
> > 
> > But more importantly, you are disabling anything *but* MPAM.  Does it
> > seem right to you?
> 
> Did not get that, should the inverse ~ be dropped here and also for all
> other negative trap bits across the register ?

Look at the way FGU works. A bit set to 1 means that if we have
trapped because of this bit (as per the FGT table), we inject an
UNDEF.

> 
> > 
> >> +
> >> +	/* FEAT_SPE_FDS is not exposed to the guest */
> >> +	kvm->arch.fgu[HDFGRTR2_GROUP] |= ~(HDFGRTR2_EL2_nPMSDSFR_EL1);
> > 
> > Same thing.
> 
> As mentioned earlier there is a challenge in checking for the features
> via non-ID registers i.e TRBIDR_EL1.MPAM and PMSIDR_EL1.FDS.

As I wrote, there is no problem. You always get enough ID-reg
information to do something sensible. And if we need to eventually add
the infrastructure for that, so be it.

	M.

-- 
Without deviation from the norm, progress is not possible.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ