linux-kernel - Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a block entry

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <67e9e393-1836-eca7-4235-6f4a19fed652@huawei.com>
Date:   Mon, 30 Nov 2020 23:24:19 +0800
From:   "wangyanan (Y)" <wangyanan55@...wei.com>
To:     Will Deacon <will@...nel.org>
CC:     <linux-kernel@...r.kernel.org>,
        <linux-arm-kernel@...ts.infradead.org>,
        Marc Zyngier <maz@...nel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        James Morse <james.morse@....com>,
        Julien Thierry <julien.thierry.kdev@...il.com>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        Gavin Shan <gshan@...hat.com>,
        Quentin Perret <qperret@...gle.com>,
        <wanghaibin.wang@...wei.com>, <yezengruan@...wei.com>,
        <zhukeqian1@...wei.com>, <yuzenghui@...wei.com>,
        <jiangkunkun@...wei.com>, <wangjingyi11@...wei.com>,
        <lushenming@...wei.com>
Subject: Re: [RFC PATCH 2/3] KVM: arm64: Fix handling of merging tables into a
 block entry


On 2020/11/30 21:34, Will Deacon wrote:
> On Mon, Nov 30, 2020 at 08:18:46PM +0800, Yanan Wang wrote:
>> In dirty logging case(logging_active == True), we need to collapse a block
>> entry into a table if necessary. After dirty logging is canceled, when merging
>> tables back into a block entry, we should not only free the non-huge page
>> tables but also unmap the non-huge mapping for the block. Without the unmap,
>> inconsistent TLB entries for the pages in the the block will be created.
>>
>> We could also use unmap_stage2_range API to unmap the non-huge mapping,
>> but this could potentially free the upper level page-table page which
>> will be useful later.
>>
>> Signed-off-by: Yanan Wang <wangyanan55@...wei.com>
>> ---
>>   arch/arm64/kvm/hyp/pgtable.c | 15 +++++++++++++--
>>   1 file changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
>> index 696b6aa83faf..fec8dc9f2baa 100644
>> --- a/arch/arm64/kvm/hyp/pgtable.c
>> +++ b/arch/arm64/kvm/hyp/pgtable.c
>> @@ -500,6 +500,9 @@ static int stage2_map_walk_table_pre(u64 addr, u64 end, u32 level,
>>   	return 0;
>>   }
>>   
>> +static void stage2_flush_dcache(void *addr, u64 size);
>> +static bool stage2_pte_cacheable(kvm_pte_t pte);
>> +
>>   static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>   				struct stage2_map_data *data)
>>   {
>> @@ -507,9 +510,17 @@ static int stage2_map_walk_leaf(u64 addr, u64 end, u32 level, kvm_pte_t *ptep,
>>   	struct page *page = virt_to_page(ptep);
>>   
>>   	if (data->anchor) {
>> -		if (kvm_pte_valid(pte))
>> +		if (kvm_pte_valid(pte)) {
>> +			kvm_set_invalid_pte(ptep);
>> +			kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, data->mmu,
>> +				     addr, level);
>>   			put_page(page);
> This doesn't make sense to me: the page-table pages we're walking when the
> anchor is set are not accessible to the hardware walker because we unhooked
> the entire sub-table in stage2_map_walk_table_pre(), which has the necessary
> TLB invalidation.
>
> Are you seeing a problem in practice here?

Yes, I indeed find a problem in practice.

When the migration was cancelled, a TLB conflic abort  was found in guest.

This problem is fixed before rework of the page table code, you can have 
a look in the following two links:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3c3736cd32bf5197aed1410ae826d2d254a5b277

https://lists.cs.columbia.edu/pipermail/kvmarm/2019-March/035031.html

>> +			if (stage2_pte_cacheable(pte))
>> +				stage2_flush_dcache(kvm_pte_follow(pte),
>> +						    kvm_granule_size(level));
> I don't understand the need for the flush either, as we're just coalescing
> existing entries into a larger block mapping.

In my opinion, after unmapping, it is necessary to ensure the cache 
coherency, because it is unknown whether the future mapping memory 
attribute is changed or not (cacheable -> non_cacheable) theoretically.

> Will
> .