Message-ID: <ZeDPgx1O_AuR2Iz3@google.com>
Date: Thu, 29 Feb 2024 10:40:03 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
Yan Zhao <yan.y.zhao@...el.com>, Isaku Yamahata <isaku.yamahata@...el.com>,
Michael Roth <michael.roth@....com>, Yu Zhang <yu.c.zhang@...ux.intel.com>,
Chao Peng <chao.p.peng@...ux.intel.com>, Fuad Tabba <tabba@...gle.com>,
David Matlack <dmatlack@...gle.com>
Subject: Re: [PATCH 02/16] KVM: x86: Remove separate "bit" defines for page
fault error code masks
On Thu, Feb 29, 2024, Paolo Bonzini wrote:
> On Wed, Feb 28, 2024 at 3:46 AM Sean Christopherson <seanjc@...gle.com> wrote:
> > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > index 60f21bb4c27b..e8b620a85627 100644
> > --- a/arch/x86/kvm/mmu.h
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -213,7 +213,7 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> > */
> > u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
> > bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
> > - int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
> > + int index = (pfec + (not_smap << ilog2(PFERR_RSVD_MASK))) >> 1;
>
> Just use "(pfec + (not_smap ? PFERR_RSVD_MASK : 0)) >> 1".
>
> Likewise below, "pte_access & PT_USER_MASK ? PFERR_RSVD_MASK : 0".
>
> No need to even check what the compiler produces, it will be either
> exactly the same code or a bunch of cmov instructions.
I couldn't resist :-)
The second one generates identical code, but for this one:
        int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
gcc generates almost bizarrely different code in the call from vcpu_mmio_gva_to_gpa().
clang is clever enough to realize "pfec" can only contain USER_MASK and/or WRITE_MASK,
and so does a ton of dead code elimination and other optimizations. But for some
reason, gcc doesn't appear to realize that, and generates a MOVSX when computing
"index", i.e. sign-extends the result of the ADD (at least, I think that's what it's
doing).
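As a standalone reduction for anyone who wants to poke at the codegen themselves
(not KVM code; RSVD_BIT is hardcoded to the architectural PFEC.RSVD position),
the two forms in question are roughly:

        /* Standalone reduction for comparing compiler output; not KVM code. */
        #include <stdbool.h>
        #include <stdint.h>

        #define RSVD_BIT  3
        #define RSVD_MASK (1u << RSVD_BIT)

        /* Existing form: ADD of a shifted bool. */
        int index_shift(uint32_t pfec, bool not_smap)
        {
                return (pfec + (not_smap << RSVD_BIT)) >> 1;
        }

        /* Suggested form: ADD of a ternary-selected mask. */
        int index_ternary(uint32_t pfec, bool not_smap)
        {
                return (pfec + (not_smap ? RSVD_MASK : 0)) >> 1;
        }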
There's no actual bug today, and the vcpu_mmio_gva_to_gpa() path is super safe
since KVM fully controls the error code. But the call from FNAME(walk_addr_generic)
uses a _much_ more dynamic error code.
If an error code with unexpected bits set managed to get into permission_fault(),
I'm pretty sure we'd end up with out-of-bounds accesses. KVM sanity checks that
PK and RSVD aren't set,
        WARN_ON(pfec & (PFERR_PK_MASK | PFERR_RSVD_MASK));
but KVM unnecessarily uses an ADD instead of OR, here
        int index = (pfec + (not_smap << PFERR_RSVD_BIT)) >> 1;
and here
        /* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
        offset = (pfec & ~1) +
                 ((pte_access & PT_USER_MASK) << (PFERR_RSVD_BIT - PT_USER_SHIFT));
i.e. if the WARN fired, KVM would generate completely unexpected values due to
adding two RSVD bit flags.
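To illustrate with the architectural bit values (PRESENT is bit 0, RSVD is bit 3),
if RSVD somehow arrived already set in pfec:

        /* Illustration only, i.e. the "WARN fires" case. */
        unsigned int pfec = 0x1 | 0x8;          /* PRESENT | RSVD */
        int add_index = (pfec + 0x8) >> 1;      /* 0x11 >> 1 = 0x8, the carry lands in the FETCH slot */
        int or_index  = (pfec | 0x8) >> 1;      /* 0x9  >> 1 = 0x4, RSVD simply stays RSVD */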
And if _really_ unexpected flags make their way into permission_fault(), e.g. the
upcoming RMP flag (bit 31) or Intel's SGX flag (bit 15), then the use of index
        fault = (mmu->permissions[index] >> pte_access) & 1;
could generate a read waaay outside of the array.  It can't/shouldn't happen in
practice since KVM shouldn't be trying to emulate RMP violations or faults in SGX
enclaves, but it's unnecessarily dangerous.
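For the record (illustration only, using the bit positions above, and the fact that
mmu->permissions[] is indexed by PFEC bits 4:1, i.e. has 16 entries), the indices
involved would be:

        /* Illustration only, neither case should be reachable in practice. */
        int sgx_index = (1u << 15) >> 1;        /* 0x4000, vs. a 16-entry mmu->permissions[] */
        int rmp_index = (1u << 31) >> 1;        /* 0x40000000 */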
Long story short, I think we should get to the below (I'll post a separate series,
assuming I'm not missing something).
        unsigned long rflags = static_call(kvm_x86_get_rflags)(vcpu);
        unsigned int pfec = access & (PFERR_PRESENT_MASK |
                                      PFERR_WRITE_MASK |
                                      PFERR_USER_MASK |
                                      PFERR_FETCH_MASK);

        /*
         * For explicit supervisor accesses, SMAP is disabled if EFLAGS.AC = 1.
         * For implicit supervisor accesses, SMAP cannot be overridden.
         *
         * SMAP works on supervisor accesses only, and not_smap can
         * be set or not set when user access with neither has any bearing
         * on the result.
         *
         * We put the SMAP checking bit in place of the PFERR_RSVD_MASK bit;
         * this bit will always be zero in pfec, but it will be one in index
         * if SMAP checks are being disabled.
         */
        u64 implicit_access = access & PFERR_IMPLICIT_ACCESS;
        bool not_smap = ((rflags & X86_EFLAGS_AC) | implicit_access) == X86_EFLAGS_AC;
        int index = (pfec | (not_smap ? PFERR_RSVD_MASK : 0)) >> 1;
        u32 errcode = PFERR_PRESENT_MASK;
        bool fault;

        kvm_mmu_refresh_passthrough_bits(vcpu, mmu);

        fault = (mmu->permissions[index] >> pte_access) & 1;

        /*
         * Sanity check that no bits are set in the legacy #PF error code
         * (bits 31:0) other than the supported permission bits (see above).
         */
        WARN_ON_ONCE(pfec != (unsigned int)access);

        if (unlikely(mmu->pkru_mask)) {
                u32 pkru_bits, offset;

                /*
                 * PKRU defines 32 bits, there are 16 domains and 2
                 * attribute bits per domain in pkru.  pte_pkey is the
                 * index of the protection domain, so pte_pkey * 2 is
                 * the index of the first bit for the domain.
                 */
                pkru_bits = (vcpu->arch.pkru >> (pte_pkey * 2)) & 3;

                /* clear present bit, replace PFEC.RSVD with ACC_USER_MASK. */
                offset = (pfec & ~1) | (pte_access & PT_USER_MASK ? PFERR_RSVD_MASK : 0);

                pkru_bits &= mmu->pkru_mask >> offset;
                errcode |= -pkru_bits & PFERR_PK_MASK;
                fault |= (pkru_bits != 0);
        }

        return -(u32)fault & errcode;