linux-kernel - Re: [RFC PATCH 1/2] KVM: MMU: precompute page fault error code

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <56E17F9A.2010308@redhat.com>
Date:	Thu, 10 Mar 2016 15:07:22 +0100
From:	Paolo Bonzini <pbonzini@...hat.com>
To:	Xiao Guangrong <guangrong.xiao@...ux.intel.com>,
	linux-kernel@...r.kernel.org, kvm@...r.kernel.org
Cc:	huaitong.han@...el.com
Subject: Re: [RFC PATCH 1/2] KVM: MMU: precompute page fault error code



On 10/03/2016 15:01, Xiao Guangrong wrote:
> 
> 
> On 03/08/2016 07:45 PM, Paolo Bonzini wrote:
>> For the next patch, we will want to filter PFERR_FETCH_MASK away early,
>> and not pass it to permission_fault if neither NX nor SMEP are enabled.
>> Prepare for the change.
> 
> Why it is needed? It is much easier to drop PFEC.F in
> update_permission_bitmask().

It's just how I structured the patches.  It's unrelated to
update_permission_bimask.

I wanted permission_fault to return a fault code, and while it's easy to
add bits (such as PKERR_PK_MASK) it's harder to remove bits from the
PFEC that permission_fault receives.  So I'm doing it here.

Feel free to instead drop PFEC.F in permission_fault() or even add a new
member to kvm_mmu with the valid bits of a PFEC.  This is just an RFC
for you and Huaitong to adapt in the PK patchset.  Sometimes things are
easier to describe in patches than in English. :)

Paolo

>>
>> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
>> ---
>>   arch/x86/kvm/mmu.c         |  2 +-
>>   arch/x86/kvm/paging_tmpl.h | 26 +++++++++++++++-----------
>>   2 files changed, 16 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 2463de0b935c..e57f7be061e3 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3883,7 +3883,7 @@ static void update_permission_bitmask(struct
>> kvm_vcpu *vcpu,
>>               u = bit & ACC_USER_MASK;
>>
>>               if (!ept) {
>> -                /* Not really needed: !nx will cause pte.nx to fault */
>> +                /* Not really needed: if !nx, ff will always be zero */
>>                   x |= !mmu->nx;
>>                   /* Allow supervisor writes if !cr0.wp */
>>                   w |= !is_write_protection(vcpu) && !uf;
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 6013f3685ef4..285858d3223b 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -272,13 +272,24 @@ static int FNAME(walk_addr_generic)(struct
>> guest_walker *walker,
>>       gpa_t pte_gpa;
>>       int offset;
>>       const int write_fault = access & PFERR_WRITE_MASK;
>> -    const int user_fault  = access & PFERR_USER_MASK;
>> -    const int fetch_fault = access & PFERR_FETCH_MASK;
>> -    u16 errcode = 0;
>> +    u16 errcode;
>>       gpa_t real_gpa;
>>       gfn_t gfn;
>>
>>       trace_kvm_mmu_pagetable_walk(addr, access);
>> +
>> +    /*
>> +     * Do not modify PFERR_FETCH_MASK in access.  It is used later in
>> the call to
>> +     * mmu->translate_gpa and, when nested virtualization is in use,
>> the X or NX
>> +     * bit of nested page tables always applies---even if NX and SMEP
>> are disabled
>> +     * in the guest.
>> +     *
>> +     * TODO: cache the result of the NX and SMEP test in struct kvm_mmu?
>> +     */
>> +    errcode = access;
>> +    if (!(mmu->nx || kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
>> +        errcode &= ~PFERR_FETCH_MASK;
>> +
> 
> PFEC.P might be set since it is copied from @access.
> 
> And the workload is moved from rare path to the common path...
> 
>>   retry_walk:
>>       walker->level = mmu->root_level;
>>       pte           = mmu->get_cr3(vcpu);
>> @@ -389,9 +400,7 @@ retry_walk:
>>
>>       if (unlikely(!accessed_dirty)) {
>>           ret = FNAME(update_accessed_dirty_bits)(vcpu, mmu, walker,
>> write_fault);
>> -        if (unlikely(ret < 0))
>> -            goto error;
>> -        else if (ret)
>> +        if (ret > 0)
>>               goto retry_walk;
> 
> So it returns normally even if update_accessed_dirty_bits() failed.