linux-kernel - Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <8977120b-841d-4882-2472-6e403bc9c797@redhat.com>
Date:   Wed, 31 Mar 2021 09:34:44 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Catalin Marinas <catalin.marinas@....com>,
        Steven Price <steven.price@....com>
Cc:     Mark Rutland <mark.rutland@....com>,
        Peter Maydell <peter.maydell@...aro.org>,
        "Dr. David Alan Gilbert" <dgilbert@...hat.com>,
        Andrew Jones <drjones@...hat.com>, Haibo Xu <Haibo.Xu@....com>,
        Suzuki K Poulose <suzuki.poulose@....com>,
        qemu-devel@...gnu.org, Marc Zyngier <maz@...nel.org>,
        Juan Quintela <quintela@...hat.com>,
        Richard Henderson <richard.henderson@...aro.org>,
        linux-kernel@...r.kernel.org, Dave Martin <Dave.Martin@....com>,
        James Morse <james.morse@....com>,
        linux-arm-kernel@...ts.infradead.org,
        Thomas Gleixner <tglx@...utronix.de>,
        Will Deacon <will@...nel.org>, kvmarm@...ts.cs.columbia.edu,
        Julien Thierry <julien.thierry.kdev@...il.com>
Subject: Re: [PATCH v10 2/6] arm64: kvm: Introduce MTE VM feature

On 30.03.21 12:30, Catalin Marinas wrote:
> On Mon, Mar 29, 2021 at 05:06:51PM +0100, Steven Price wrote:
>> On 28/03/2021 13:21, Catalin Marinas wrote:
>>> On Sat, Mar 27, 2021 at 03:23:24PM +0000, Catalin Marinas wrote:
>>>> On Fri, Mar 12, 2021 at 03:18:58PM +0000, Steven Price wrote:
>>>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>>>> index 77cb2d28f2a4..b31b7a821f90 100644
>>>>> --- a/arch/arm64/kvm/mmu.c
>>>>> +++ b/arch/arm64/kvm/mmu.c
>>>>> @@ -879,6 +879,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>>>>>    	if (vma_pagesize == PAGE_SIZE && !force_pte)
>>>>>    		vma_pagesize = transparent_hugepage_adjust(memslot, hva,
>>>>>    							   &pfn, &fault_ipa);
>>>>> +
>>>>> +	if (fault_status != FSC_PERM && kvm_has_mte(kvm) && pfn_valid(pfn)) {
>>>>> +		/*
>>>>> +		 * VM will be able to see the page's tags, so we must ensure
>>>>> +		 * they have been initialised. if PG_mte_tagged is set, tags
>>>>> +		 * have already been initialised.
>>>>> +		 */
>>>>> +		struct page *page = pfn_to_page(pfn);
>>>>> +		unsigned long i, nr_pages = vma_pagesize >> PAGE_SHIFT;
>>>>> +
>>>>> +		for (i = 0; i < nr_pages; i++, page++) {
>>>>> +			if (!test_and_set_bit(PG_mte_tagged, &page->flags))
>>>>> +				mte_clear_page_tags(page_address(page));
>>>>> +		}
>>>>> +	}
>>>>
>>>> This pfn_valid() check may be problematic. Following commit eeb0753ba27b
>>>> ("arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory"), it returns
>>>> true for ZONE_DEVICE memory but such memory is allowed not to support
>>>> MTE.
>>>
>>> Some more thinking, this should be safe as any ZONE_DEVICE would be
>>> mapped as untagged memory in the kernel linear map. It could be slightly
>>> inefficient if it unnecessarily tries to clear tags in ZONE_DEVICE,
>>> untagged memory. Another overhead is pfn_valid() which will likely end
>>> up calling memblock_is_map_memory().
>>>
>>> However, the bigger issue is that Stage 2 cannot disable tagging for
>>> Stage 1 unless the memory is Non-cacheable or Device at S2. Is there a
>>> way to detect what gets mapped in the guest as Normal Cacheable memory
>>> and make sure it's only early memory or hotplug but no ZONE_DEVICE (or
>>> something else like on-chip memory)?  If we can't guarantee that all
>>> Cacheable memory given to a guest supports tags, we should disable the
>>> feature altogether.
>>
>> In stage 2 I believe we only have two types of mapping - 'normal' or
>> DEVICE_nGnRE (see stage2_map_set_prot_attr()). Filtering out the latter is a
>> case of checking the 'device' variable, and makes sense to avoid the
>> overhead you describe.
>>
>> This should also guarantee that all stage-2 cacheable memory supports tags,
>> as kvm_is_device_pfn() is simply !pfn_valid(), and pfn_valid() should only
>> be true for memory that Linux considers "normal".

If you think "normal" == "normal System RAM", that's wrong; see below.

> 
> That's the problem. With Anshuman's commit I mentioned above,
> pfn_valid() returns true for ZONE_DEVICE mappings (e.g. persistent
> memory, not talking about some I/O mapping that requires Device_nGnRE).
> So kvm_is_device_pfn() is false for such memory and it may be mapped as
> Normal but it is not guaranteed to support tagging.

pfn_valid() means "there is a struct page"; if you do pfn_to_page() and 
touch the page, you won't fault. So Anshuman's commit is correct.

pfn_to_online_page() means, "there is a struct page and it's system RAM 
that's in use; the memmap has a sane content"

> 
> For user MTE, we get away with this as the MAP_ANONYMOUS requirement
> would filter it out while arch_add_memory() will ensure it's mapped as
> untagged in the linear map. See another recent fix for hotplugged
> memory: d15dfd31384b ("arm64: mte: Map hotplugged memory as Normal
> Tagged"). We needed to ensure that ZONE_DEVICE doesn't end up as tagged,
> only hoplugged memory. Both handled via arch_add_memory() in the arch
> code with ZONE_DEVICE starting at devm_memremap_pages().
> 
>>>> I now wonder if we can get a MAP_ANONYMOUS mapping of ZONE_DEVICE pfn
>>>> even without virtualisation.
>>>
>>> I haven't checked all the code paths but I don't think we can get a
>>> MAP_ANONYMOUS mapping of ZONE_DEVICE memory as we normally need a file
>>> descriptor.
>>
>> I certainly hope this is the case - it's the weird corner cases of device
>> drivers that worry me. E.g. I know i915 has a "hidden" mmap behind an ioctl
>> (see i915_gem_mmap_ioctl(), although this case is fine - it's MAP_SHARED).
>> Mali's kbase did something similar in the past.
> 
> I think this should be fine since it's not a MAP_ANONYMOUS (we do allow
> MAP_SHARED to be tagged).
> 


-- 
Thanks,

David / dhildenb