Message-ID: <87a6py2ss9.wl-maz@kernel.org>
Date: Fri, 16 Apr 2021 15:44:22 +0100
From: Marc Zyngier <maz@...nel.org>
To: Keqian Zhu <zhukeqian1@...wei.com>
Cc: <linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>, <kvm@...r.kernel.org>,
<kvmarm@...ts.cs.columbia.edu>, <wanghaibin.wang@...wei.com>
Subject: Re: [PATCH v4 2/2] kvm/arm64: Try stage2 block mapping for host device MMIO
On Thu, 15 Apr 2021 15:08:09 +0100,
Keqian Zhu <zhukeqian1@...wei.com> wrote:
>
> Hi Marc,
>
> On 2021/4/15 22:03, Keqian Zhu wrote:
> > The MMIO region of a device may be huge (GB level), so try to use
> > block mapping in stage2 to speed up both map and unmap.
> >
> > Compared to normal memory mapping, we should consider two more
> > points when trying block mapping for an MMIO region:
> >
> > 1. For normal memory mapping, the PA (host physical address) and
> > HVA have the same alignment within PUD_SIZE or PMD_SIZE when we use
> > the HVA to request a hugepage, so we don't need to consider PA
> > alignment when verifying block mapping. But for device memory
> > mapping, the PA and HVA may have different alignment.
> >
> > 2. For normal memory mapping, we are sure the hugepage size properly
> > fits into the vma, so we don't check whether the mapping size exceeds
> > the boundary of the vma. But for device memory mapping, we should pay
> > attention to this.
> >
> > This adds get_vma_page_shift() to get the page shift for both normal
> > memory and device MMIO regions, and checks these two points when
> > selecting the block mapping size for an MMIO region.
> >
> > Signed-off-by: Keqian Zhu <zhukeqian1@...wei.com>
> > ---
> > arch/arm64/kvm/mmu.c | 61 ++++++++++++++++++++++++++++++++++++--------
> > 1 file changed, 51 insertions(+), 10 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index c59af5ca01b0..5a1cc7751e6d 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -738,6 +738,35 @@ transparent_hugepage_adjust(struct kvm_memory_slot *memslot,
> >  	return PAGE_SIZE;
> >  }
> >
> > +static int get_vma_page_shift(struct vm_area_struct *vma, unsigned long hva)
> > +{
> > +	unsigned long pa;
> > +
> > +	if (is_vm_hugetlb_page(vma) && !(vma->vm_flags & VM_PFNMAP))
> > +		return huge_page_shift(hstate_vma(vma));
> > +
> > +	if (!(vma->vm_flags & VM_PFNMAP))
> > +		return PAGE_SHIFT;
> > +
> > +	VM_BUG_ON(is_vm_hugetlb_page(vma));
> > +
> > +	pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);
> > +
> > +#ifndef __PAGETABLE_PMD_FOLDED
> > +	if ((hva & (PUD_SIZE - 1)) == (pa & (PUD_SIZE - 1)) &&
> > +	    ALIGN_DOWN(hva, PUD_SIZE) >= vma->vm_start &&
> > +	    ALIGN(hva, PUD_SIZE) <= vma->vm_end)
> > +		return PUD_SHIFT;
> > +#endif
> > +
> > +	if ((hva & (PMD_SIZE - 1)) == (pa & (PMD_SIZE - 1)) &&
> > +	    ALIGN_DOWN(hva, PMD_SIZE) >= vma->vm_start &&
> > +	    ALIGN(hva, PMD_SIZE) <= vma->vm_end)
> > +		return PMD_SHIFT;
> > +
> > +	return PAGE_SHIFT;
> > +}
> > +
> >  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  			  struct kvm_memory_slot *memslot, unsigned long hva,
> >  			  unsigned long fault_status)
> > @@ -769,7 +798,10 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		return -EFAULT;
> >  	}
> >
> > -	/* Let's check if we will get back a huge page backed by hugetlbfs */
> > +	/*
> > +	 * Let's check if we will get back a huge page backed by hugetlbfs, or
> > +	 * get block mapping for device MMIO region.
> > +	 */
> >  	mmap_read_lock(current->mm);
> >  	vma = find_vma_intersection(current->mm, hva, hva + 1);
> >  	if (unlikely(!vma)) {
> > @@ -778,15 +810,15 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  		return -EFAULT;
> >  	}
> >
> > -	if (is_vm_hugetlb_page(vma))
> > -		vma_shift = huge_page_shift(hstate_vma(vma));
> > -	else
> > -		vma_shift = PAGE_SHIFT;
> > -
> > -	if (logging_active ||
> > -	    (vma->vm_flags & VM_PFNMAP)) {
> > +	/*
> > +	 * logging_active is guaranteed to never be true for VM_PFNMAP
> > +	 * memslots.
> > +	 */
> > +	if (logging_active) {
> >  		force_pte = true;
> >  		vma_shift = PAGE_SHIFT;
> > +	} else {
> > +		vma_shift = get_vma_page_shift(vma, hva);
> >  	}
> I used an if/else structure in v4; please check that. Thanks very much!
That's fine. However, it is getting a bit late for 5.13, and we don't
have much time to let it simmer in -next. I'll probably wait until
after the merge window to pick it up.
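
For anyone following along, the two extra checks the quoted patch
applies to VM_PFNMAP regions (matching HVA/PA offsets within the block,
and the block staying inside the VMA) can be tried out in userspace.
What follows is only a minimal sketch under stated assumptions, not the
kernel code: it assumes a 4K granule (2M PMD blocks, 1G PUD blocks), and
struct toy_vma, block_fits() and toy_pfnmap_page_shift() are
hypothetical names that model just the three vm_area_struct fields the
patch reads. The conditions themselves are copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4K granule, so PMD blocks are 2M and PUD blocks are 1G. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PUD_SHIFT  30
#define PMD_SIZE   (UINT64_C(1) << PMD_SHIFT)
#define PUD_SIZE   (UINT64_C(1) << PUD_SHIFT)

/* Userspace equivalents of the kernel helpers (power-of-two sizes only). */
#define ALIGN_DOWN(x, a) ((x) & ~((a) - 1))
#define ALIGN(x, a)      (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical stand-in for the vm_area_struct fields used by the patch. */
struct toy_vma {
	uint64_t vm_start;	/* first HVA covered by the mapping */
	uint64_t vm_end;	/* first HVA past the mapping */
	uint64_t vm_pgoff;	/* for PFNMAP: base PFN of the device region */
};

/* Same three conditions as the patch, parameterised on the block size. */
static int block_fits(const struct toy_vma *vma, uint64_t hva,
		      uint64_t pa, uint64_t size)
{
	return (hva & (size - 1)) == (pa & (size - 1)) &&
	       ALIGN_DOWN(hva, size) >= vma->vm_start &&
	       ALIGN(hva, size) <= vma->vm_end;
}

/* Models only the VM_PFNMAP branch of get_vma_page_shift(). */
static int toy_pfnmap_page_shift(const struct toy_vma *vma, uint64_t hva)
{
	uint64_t pa;

	assert(hva >= vma->vm_start && hva < vma->vm_end);

	pa = (vma->vm_pgoff << PAGE_SHIFT) + (hva - vma->vm_start);

	if (block_fits(vma, hva, pa, PUD_SIZE))
		return PUD_SHIFT;
	if (block_fits(vma, hva, pa, PMD_SIZE))
		return PMD_SHIFT;
	return PAGE_SHIFT;
}
```

With a 1G VMA at HVA 0x40000000 whose PA base is also 1G-aligned, a
fault at 0x40001000 selects PUD_SHIFT. Moving the PA base up by 2M drops
the result to PMD_SHIFT, and moving it up by only 4K, or shrinking the
VMA to 1M, falls back to PAGE_SHIFT.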
Thanks,
M.
--
Without deviation from the norm, progress is not possible.