Message-ID: <20250704140431.GH1410929@nvidia.com>
Date: Fri, 4 Jul 2025 11:04:31 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: ankita@...dia.com, david@...hat.com
Cc: maz@...nel.org, oliver.upton@...ux.dev, joey.gouly@....com,
suzuki.poulose@....com, yuzenghui@...wei.com,
catalin.marinas@....com, will@...nel.org, ryan.roberts@....com,
shahuang@...hat.com, lpieralisi@...nel.org, ddutile@...hat.com,
seanjc@...gle.com, aniketa@...dia.com, cjia@...dia.com,
kwankhede@...dia.com, kjaju@...dia.com, targupta@...dia.com,
vsethi@...dia.com, acurrid@...dia.com, apopple@...dia.com,
jhubbard@...dia.com, danw@...dia.com, zhiw@...dia.com,
mochs@...dia.com, udhoke@...dia.com, dnigam@...dia.com,
alex.williamson@...hat.com, sebastianene@...gle.com,
coltonlewis@...gle.com, kevin.tian@...el.com, yi.l.liu@...el.com,
ardb@...nel.org, akpm@...ux-foundation.org, gshan@...hat.com,
linux-mm@...ck.org, tabba@...gle.com, qperret@...gle.com,
kvmarm@...ts.linux.dev, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, maobibo@...ngson.cn
Subject: Re: [PATCH v9 5/6] KVM: arm64: Allow cacheable stage 2 mapping using
VMA flags
On Sat, Jun 21, 2025 at 04:21:10AM +0000, ankita@...dia.com wrote:
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1681,18 +1681,53 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> if (is_error_noslot_pfn(pfn))
> return -EFAULT;
>
> + /*
> + * Check if this is non-struct page memory PFN, and cannot support
> + * CMOs. It could potentially be unsafe to access as cachable.
> + */
> if (vm_flags & (VM_PFNMAP | VM_MIXEDMAP) && !pfn_is_map_memory(pfn)) {
> /*
> - * If the page was identified as device early by looking at
> - * the VMA flags, vma_pagesize is already representing the
> - * largest quantity we can map. If instead it was mapped
> - * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
> - * and must not be upgraded.
> - *
> - * In both cases, we don't let transparent_hugepage_adjust()
> - * change things at the last minute.
> + * COW VM_PFNMAP is possible when doing a MAP_PRIVATE
> + * /dev/mem mapping on systems that allow such mapping.
> + * Reject such case.
> */
> - s2_force_noncacheable = true;
> + if (is_cow_mapping(vm_flags))
> + return -EINVAL;
I would still like an explanation of why we need to block this.
COW PFNMAP is like MIXEDMAP: you end up with a VMA containing a
mixture of MMIO and normal pages. Arguably you are supposed to use
vm_normal_page(), not pfn_is_map_memory(), but that seems difficult
for KVM.
Given that we exclude the cacheable case with pfn_is_map_memory(), we
already know this is non-struct page memory, so why do we need to
block the COW?
I think the basic rule we are going for is that, within the VMA, the
non-normal/special PTEs have to follow vma->vm_page_prot while the
normal pages have to be cacheable.
So if we find a normal page (i.e. pfn_is_map_memory()) then we know it
is cacheable and s2_force_noncacheable = false. Otherwise we use
vm_page_prot to decide whether the special PTE is cacheable.
David can you think of any reason to have this is_cow_mapping() test?
> + if (is_vma_cacheable) {
> + /*
> + * Whilst the VMA owner expects cacheable mapping to this
> + * PFN, hardware also has to support the FWB and CACHE DIC
> + * features.
> + *
> + * ARM64 KVM relies on kernel VA mapping to the PFN to
> + * perform cache maintenance as the CMO instructions work on
> + * virtual addresses. VM_PFNMAP region are not necessarily
> + * mapped to a KVA and hence the presence of hardware features
> + * S2FWB and CACHE DIC are mandatory for cache maintenance.
> + *
> + * Check if the hardware supports it before allowing the VMA
> + * owner request for cacheable mapping.
> + */
> + if (!kvm_arch_supports_cacheable_pfnmap())
> + return -EFAULT;
> +
> + /* Cannot degrade cachable to non cachable */
> + if (s2_force_noncacheable)
> + return -EINVAL;
What am I missing? After the whole series is applied, this is the
first reference to s2_force_noncacheable after it is initialized to
false, so this can't happen?
> + } else {
> + /*
> + * If the page was identified as device early by looking at
> + * the VMA flags, vma_pagesize is already representing the
> + * largest quantity we can map. If instead it was mapped
> + * via __kvm_faultin_pfn(), vma_pagesize is set to PAGE_SIZE
> + * and must not be upgraded.
> + *
> + * In both cases, we don't let transparent_hugepage_adjust()
> + * change things at the last minute.
> + */
> + s2_force_noncacheable = true;
> + }
Then this logic that immediately follows:
if (is_vma_cacheable && s2_force_noncacheable)
return -EINVAL;
Doesn't make a lot of sense either: the only place that sets
s2_force_noncacheable = true is the else block of 'if (is_vma_cacheable)',
so this is dead code too.
It seems like this still needs some cleanup to remove these impossible
conditions. The logic otherwise makes sense to me, though.
Jason