linux-kernel - Re: [RFC PATCH 4/4] KVM: arm64: Distinguish cases of memcache allocations completely

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <24b3f625-4d9c-cabe-5758-cfa9a1a2ce18@huawei.com>
Date:   Fri, 26 Mar 2021 09:24:58 +0800
From:   "wangyanan (Y)" <wangyanan55@...wei.com>
To:     Alexandru Elisei <alexandru.elisei@....com>,
        Marc Zyngier <maz@...nel.org>, Will Deacon <will@...nel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        James Morse <james.morse@....com>,
        Julien Thierry <julien.thierry.kdev@...il.com>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        Gavin Shan <gshan@...hat.com>,
        Quentin Perret <qperret@...gle.com>,
        <kvmarm@...ts.cs.columbia.edu>,
        <linux-arm-kernel@...ts.infradead.org>, <kvm@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 4/4] KVM: arm64: Distinguish cases of memcache
 allocations completely

Hi Alex,

On 2021/3/26 1:26, Alexandru Elisei wrote:
> Hi Yanan,
>
> On 2/8/21 11:22 AM, Yanan Wang wrote:
>> With a guest translation fault, the memcache pages are not needed if KVM
>> is only about to install a new leaf entry into the existing page table.
>> And with a guest permission fault, the memcache pages are also not needed
>> for a write_fault in dirty-logging time if KVM is only about to update
>> the existing leaf entry instead of collapsing a block entry into a table.
>>
>> By comparing fault_granule and vma_pagesize, cases that require allocations
>> from memcache and cases that don't can be distinguished completely.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@...wei.com>
>> ---
>>   arch/arm64/kvm/mmu.c | 25 ++++++++++++-------------
>>   1 file changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index d151927a7d62..550498a9104e 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -815,19 +815,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	gfn = fault_ipa >> PAGE_SHIFT;
>>   	mmap_read_unlock(current->mm);
>>   
>> -	/*
>> -	 * Permission faults just need to update the existing leaf entry,
>> -	 * and so normally don't require allocations from the memcache. The
>> -	 * only exception to this is when dirty logging is enabled at runtime
>> -	 * and a write fault needs to collapse a block entry into a table.
>> -	 */
>> -	if (fault_status != FSC_PERM || (logging_active && write_fault)) {
>> -		ret = kvm_mmu_topup_memory_cache(memcache,
>> -						 kvm_mmu_cache_min_pages(kvm));
>> -		if (ret)
>> -			return ret;
>> -	}
>> -
>>   	mmu_seq = vcpu->kvm->mmu_notifier_seq;
>>   	/*
>>   	 * Ensure the read of mmu_notifier_seq happens before we call
>> @@ -887,6 +874,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>   	else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
>>   		prot |= KVM_PGTABLE_PROT_X;
>>   
>> +	/*
>> +	 * Allocations from the memcache are required only when granule of the
>> +	 * lookup level where the guest fault happened exceeds vma_pagesize,
>> +	 * which means new page tables will be created in the fault handlers.
>> +	 */
>> +	if (fault_granule > vma_pagesize) {
>> +		ret = kvm_mmu_topup_memory_cache(memcache,
>> +						 kvm_mmu_cache_min_pages(kvm));
>> +		if (ret)
>> +			return ret;
>> +	}
> I distinguish three situations:
>
> 1. fault_granule == vma_pagesize. If the stage 2 fault occurs at the leaf level,
> then it means that all the tables that the translation table walker traversed
> until the leaf are valid. No need to allocate a new page, as stage 2 will only
> change the leaf to point to a valid PA.
>
> 2. fault_granule > vma_pagesize. This means that there's a table missing at some
> point in the table walk, so we're going to need to allocate at least one table to
> hold the leaf entry. We need to topup the memory cache.
>
> 3. fault_granule < vma_pagesize. From our discussion in patch #3, this can happen
> only if the userspace translation tables use a block mapping, dirty page logging
> is enabled, the fault_ipa is mapped as a last level entry, dirty page logging gets
> disabled and then we get a fault. In this case, the PTE table will be coalesced
> into a PMD block mapping, and the PMD table entry that pointed to the PTE table
> will be changed to a block mapping. No table will be allocated.
>
> Looks to me like this patch is valid, but getting it wrong can break a VM and I
> would feel a lot more comfortable if someone who is more familiar with the code
> would have a look.
Thanks for your explanation here. Above is also what I thought about 
this patch.

Thanks,
Yanan
>
> Thanks,
>
> Alex
>
>> +
>>   	/*
>>   	 * Under the premise of getting a FSC_PERM fault, we just need to relax
>>   	 * permissions only if vma_pagesize equals fault_granule. Otherwise,
> .