Message-ID: <ZPogsU6YihY2+qR6@nvidia.com>
Date: Thu, 7 Sep 2023 16:12:49 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: ankita@...dia.com
Cc: maz@...nel.org, oliver.upton@...ux.dev, catalin.marinas@....com,
will@...nel.org, aniketa@...dia.com, cjia@...dia.com,
kwankhede@...dia.com, targupta@...dia.com, vsethi@...dia.com,
acurrid@...dia.com, apopple@...dia.com, jhubbard@...dia.com,
danw@...dia.com, linux-arm-kernel@...ts.infradead.org,
kvmarm@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1 1/2] KVM: arm64: determine memory type from VMA
On Thu, Sep 07, 2023 at 11:14:58AM -0700, ankita@...dia.com wrote:
> From: Ankit Agrawal <ankita@...dia.com>
>
> Currently KVM determines if a VMA is pointing at IO memory by checking
> pfn_is_map_memory(). However, the MM already gives us a way to tell what
> kind of memory it is by inspecting the VMA.
>
> Replace pfn_is_map_memory() with a check on the VMA pgprot to determine if
> the memory is IO and thus needs stage-2 device mapping.
>
> The VMA's pgprot is tested to determine the memory type with the
> following mapping:
>
> pgprot_noncached MT_DEVICE_nGnRnE device
> pgprot_writecombine MT_NORMAL_NC device
> pgprot_device MT_DEVICE_nGnRE device
> pgprot_tagged MT_NORMAL_TAGGED RAM
>
> This patch solves a problem where it is possible for the kernel to
> have VMAs pointing at cachable memory without causing
> pfn_is_map_memory() to be true, e.g. DAX memremap cases and
> CXL/pre-CXL devices. This memory is now properly marked as cachable
> in KVM.
>
> Unfortunately when FWB is not enabled, the kernel expects to naively do
> cache management by flushing the memory using an address in the
> kernel's map, e.g. via dcache_clean_inval_poc(). This does not work in
> several of the newly allowed cases. When FWB is absent, check that the
> targeted pfn and its mapping KVA are valid before continuing.
Looking at this more closely, it relies on kvm_pte_follow(), which
ultimately calls __va(), and the arm64 version of page_to_virt() is:
#define page_to_virt(x) ({ \
__typeof__(x) __page = x; \
void *__addr = __va(page_to_phys(__page)); \
(void *)__tag_set((const void *)__addr, page_kasan_tag(__page));\
})
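(For completeness, kvm_pte_follow() itself is roughly the following;
simplified from kvm_pgtable.h, where mm_ops->phys_to_virt() for the
host boils down to __va():)

/* follow a stage-2 PTE to the kernel mapping of the page it points at */
static inline kvm_pte_t *kvm_pte_follow(kvm_pte_t pte,
					struct kvm_pgtable_mm_ops *mm_ops)
{
	return mm_ops->phys_to_virt(kvm_pte_to_phys(pte));
}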
So we don't actually have an issue here: anything with a struct page
will be properly handled by KVM.
Thus this can just be:
if (!stage2_has_fwb(pgt) && !pfn_valid(pfn))
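In context that would look something like this (an untested sketch;
the error code and exact placement in user_mem_abort() are just
illustrative, and it assumes stage2_has_fwb() is visible there):

/*
 * Sketch: without FWB the kernel does cache maintenance through the
 * linear map KVA, which only exists for struct page backed memory,
 * so refuse a cachable stage-2 mapping of anything else.
 */
if (!(prot & KVM_PGTABLE_PROT_DEVICE) &&
    !stage2_has_fwb(pgt) && !pfn_valid(pfn))
	return -EINVAL;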
Then the last paragraph of the commit message is:
As cachable vs device memory is now determined by the VMA, adjust
the other checks to match their purpose. In most cases the code needs
to know if there is a struct page, or if the memory is in the kernel
map, and pfn_valid() is an appropriate API for this.
Note when FWB is not enabled, the kernel expects to trivially do
cache management by flushing the memory by linearly converting a
kvm_pte to phys_addr to a KVA, see kvm_flush_dcache_to_poc(). This is
only possible for struct page backed memory. Do not allow non-struct
page memory to be cachable without FWB.
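(For reference, the linear conversion being described is roughly the
following, simplified from the actual helpers:)

/* roughly what the no-FWB flush path boils down to */
void *va = phys_to_virt(kvm_pte_to_phys(pte));	/* PTE -> PA -> linear map KVA */
kvm_flush_dcache_to_poc(va, size);		/* dcache_clean_inval_poc(va, va + size) */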
> @@ -1490,6 +1499,18 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> gfn = fault_ipa >> PAGE_SHIFT;
> mte_allowed = kvm_vma_mte_allowed(vma);
>
> + /*
> + * Figure out the memory type from the user VA mapping properties.
> + * MT_DEVICE_nGnRE, MT_DEVICE_nGnRnE and MT_NORMAL_NC are set via
> + * pgprot_device(), pgprot_noncached() and pgprot_writecombine()
> + * respectively.
> + */
> + if ((mapping_type(vma->vm_page_prot) == MT_DEVICE_nGnRE) ||
> + (mapping_type(vma->vm_page_prot) == MT_DEVICE_nGnRnE) ||
> + (mapping_type(vma->vm_page_prot) == MT_NORMAL_NC))
> + prot |= KVM_PGTABLE_PROT_DEVICE;
> + else if (cpus_have_const_cap(ARM64_HAS_CACHE_DIC))
> + prot |= KVM_PGTABLE_PROT_X;
> +
> /* Don't use the VMA after the unlock -- it may have vanished */
> vma = NULL;
>
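(As an aside for readers: mapping_type() used above presumably just
extracts the AttrIndx field from the pgprot; a sketch of its assumed
shape, the actual helper being in the unquoted part of the patch:)

/* assumed shape: extract the MT_* attribute index from a pgprot */
#define mapping_type(pgprot) \
	FIELD_GET(PTE_ATTRINDX_MASK, pgprot_val(pgprot))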
> @@ -1576,10 +1597,21 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> if (exec_fault)
> prot |= KVM_PGTABLE_PROT_X;
You still didn't remove the kvm_is_device_pfn() check from this code;
I don't think it can really stay and make any sense.
Probably this:
if (exec_fault && (prot & KVM_PGTABLE_PROT_DEVICE))
return -ENOEXEC;
And these two should also use pfn_valid() (precomputed once; see the
sketch below):

if (vma_pagesize == PAGE_SIZE && !(force_pte || !pfn_valid(pfn))) {

if (fault_status != ESR_ELx_FSC_PERM && pfn_valid(pfn) && kvm_has_mte(kvm)) {
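e.g. (sketch only; the variable name is made up):

bool pfn_is_map = pfn_valid(pfn);	/* compute once, reuse below */
...
if (vma_pagesize == PAGE_SIZE && !(force_pte || !pfn_is_map)) {
...
if (fault_status != ESR_ELx_FSC_PERM && pfn_is_map &&
    kvm_has_mte(kvm)) {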
Also check whether kvm_is_device_pfn() can then be removed entirely.

Makes sense?
Jason