Message-ID: <20190913120526.363e7592@x1.home>
Date: Fri, 13 Sep 2019 12:05:26 -0600
From: Alex Williamson <alex.williamson@...hat.com>
To: Ben Luo <luoben@...ux.alibaba.com>
Cc: cohuck@...hat.com, linux-kernel@...r.kernel.org,
Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [PATCH v2] vfio/type1: avoid redundant PageReserved checking
On Mon, 2 Sep 2019 15:32:42 +0800
Ben Luo <luoben@...ux.alibaba.com> wrote:
> On 2019/8/30 at 1:06 AM, Alex Williamson wrote:
> > On Fri, 30 Aug 2019 00:58:22 +0800
> > Ben Luo <luoben@...ux.alibaba.com> wrote:
> >
> >> On 2019/8/28 at 11:55 PM, Alex Williamson wrote:
> >>> On Wed, 28 Aug 2019 12:28:04 +0800
> >>> Ben Luo <luoben@...ux.alibaba.com> wrote:
> >>>
> >>>> Currently, if the page is not the tail of a compound page, it will
> >>>> be checked twice for the same thing.
> >>>>
> >>>> Signed-off-by: Ben Luo <luoben@...ux.alibaba.com>
> >>>> ---
> >>>> drivers/vfio/vfio_iommu_type1.c | 3 +--
> >>>> 1 file changed, 1 insertion(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >>>> index 054391f..d0f7346 100644
> >>>> --- a/drivers/vfio/vfio_iommu_type1.c
> >>>> +++ b/drivers/vfio/vfio_iommu_type1.c
> >>>> @@ -291,11 +291,10 @@ static int vfio_lock_acct(struct vfio_dma *dma, long npage, bool async)
> >>>>  static bool is_invalid_reserved_pfn(unsigned long pfn)
> >>>>  {
> >>>>          if (pfn_valid(pfn)) {
> >>>> -                bool reserved;
> >>>>                  struct page *tail = pfn_to_page(pfn);
> >>>>                  struct page *head = compound_head(tail);
> >>>> -                reserved = !!(PageReserved(head));
> >>>>                  if (head != tail) {
> >>>> +                        bool reserved = PageReserved(head);
> >>>>                          /*
> >>>>                           * "head" is not a dangling pointer
> >>>>                           * (compound_head takes care of that)
> >>> Thinking more about this, the code here was originally just a copy of
> >>> kvm_is_mmio_pfn() which was simplified in v3.12 with the commit below.
> >>> Should we instead do the same thing here? Thanks,
> >>>
> >>> Alex
> >> OK, and kvm_is_mmio_pfn() has also been updated since then; I will
> >> take a look at that and compose a new patch.
> > I'm not sure if the further updates are quite as relevant for vfio, but
> > appreciate your review of them. Thanks,
> >
> > Alex
>
> After studying the related code, my personal understanding is as follows:
>
> kvm_is_mmio_pfn() is used to find out whether a memory range is
> MMIO-mapped, so that the proper MTRR type can be set in the spte.
>
> is_invalid_reserved_pfn() is used in two scenarios (sketched below):
> 1. to tell whether a page should be counted against the user's mlock
> limits; as the function's name implies, all 'invalid' PFNs that are
> not backed by a struct page, as well as reserved pages (including the
> zero page and those from NVDIMM DAX), should be excluded.
> 2. to check whether we have got a valid and pinned pfn for a vma with
> the VM_PFNMAP flag.
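[A simplified sketch of those two call sites in vfio_iommu_type1.c may
make this concrete. It is illustrative only, with error handling and
surrounding code elided; variable names such as rsvd, lock_acct and
pfn_base only loosely follow the upstream file.]

        /* 1. accounting: only pfns that are *not* invalid/reserved are
         *    charged against the user's mlock limit */
        rsvd = is_invalid_reserved_pfn(*pfn_base);
        if (!rsvd && !vfio_find_vpfn(dma, iova))
                lock_acct++;

        /* 2. VM_PFNMAP vma: accept the pfn computed from the vma only
         *    if it is an invalid/reserved pfn, i.e. not ordinary
         *    page-backed RAM */
        if (ret == -EFAULT && vma && (vma->vm_flags & VM_PFNMAP)) {
                *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
                if (is_invalid_reserved_pfn(*pfn))
                        ret = 0;        /* treat the PFNMAP pfn as pinned */
        }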
>
> So, for the zero page and 'RAM'-backed PFNs without a 'struct page',
> kvm_is_mmio_pfn() should return false because they are not MMIO and
> are cacheable, but is_invalid_reserved_pfn() should return true since
> they are truly reserved or invalid and should not be counted against
> the user's mlock limits.
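[To make the contrast concrete, take the zero page, a reserved but
cacheable RAM page: the two predicates diverge on it. The calls below
are purely illustrative and would not appear together like this in
real code, since each function is local to its own subsystem.]

        unsigned long pfn = page_to_pfn(ZERO_PAGE(0));

        /* not MMIO: KVM may treat it as cacheable RAM */
        bool mmio = kvm_is_mmio_pfn(pfn);               /* false */
        /* reserved: VFIO must not charge it to the mlock limit */
        bool rsvd = is_invalid_reserved_pfn(pfn);       /* true  */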
>
> For fsdax pages, the current get_user_pages() returns -EOPNOTSUPP, and
> VFIO passes this error code back to the user, so fsdax does not seem
> to be supported for now and there is no chance to call into
> is_invalid_reserved_pfn(). If fsdax is to be supported, not only does
> this function need to be updated, vaddr_get_pfn() also needs some
> changes.
>
> Now, with the assumption that PFNs of compound pages whose head has
> the reserved bit set will never be passed to is_invalid_reserved_pfn(),
> we can simplify this function to:
>
> static bool is_invalid_reserved_pfn(unsigned long pfn)
> {
>         if (pfn_valid(pfn))
>                 return PageReserved(pfn_to_page(pfn));
>
>         return true;
> }
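[For comparison, if tail pages could in fact reach this path, a minimal
variant that still honours the head page's reserved bit would look like
the sketch below. This is only an illustration; the PATCH v2 hunk above
additionally re-checks PageTail under a memory barrier to guard against
a concurrent split.]

static bool is_invalid_reserved_pfn(unsigned long pfn)
{
        if (pfn_valid(pfn))
                /* tail pages never carry PG_reserved themselves,
                 * so look at the compound head instead */
                return PageReserved(compound_head(pfn_to_page(pfn)));

        return true;
}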
>
> But I am not sure whether the assumption above holds; if not, we still
> need to check the reserved bit of the head for a tail page, as this
> PATCH v2 does.
I believe what you've described is correct. Andrea, have we missed
anything? Thanks,
Alex
> >
> >>> commit 11feeb498086a3a5907b8148bdf1786a9b18fc55
> >>> Author: Andrea Arcangeli <aarcange@...hat.com>
> >>> Date: Thu Jul 25 03:04:38 2013 +0200
> >>>
> >>> kvm: optimize away THP checks in kvm_is_mmio_pfn()
> >>>
> >>> The checks on PG_reserved in the page structure on head and tail pages
> >>> aren't necessary because split_huge_page wouldn't transfer the
> >>> PG_reserved bit from head to tail anyway.
> >>>
> >>> This was a forward-thinking check done in the case PageReserved was
> >>> set by a driver-owned page mapped in userland with something like
> >>> remap_pfn_range in a VM_PFNMAP region, but using hugepmds (not
> >>> possible right now). It was meant to be very safe, but it's overkill
> >>> as it's unlikely split_huge_page could ever run without the driver
> >>> noticing and tearing down the hugepage itself.
> >>>
> >>> And if a driver in the future really wants to map a reserved
> >>> hugepage in userland using a huge pmd, it should simply take care of
> >>> marking all subpages reserved too to keep KVM safe. This of course
> >>> would require such a hypothetical driver to tear down the huge pmd
> >>> itself and split the hugepage itself, instead of relying on
> >>> split_huge_page, but that sounds very reasonable, especially
> >>> considering split_huge_page wouldn't currently transfer the reserved
> >>> bit anyway.
> >>>
> >>> Signed-off-by: Andrea Arcangeli <aarcange@...hat.com>
> >>> Signed-off-by: Gleb Natapov <gleb@...hat.com>
> >>>
> >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> >>> index d2836788561e..0fc25aed79a8 100644
> >>> --- a/virt/kvm/kvm_main.c
> >>> +++ b/virt/kvm/kvm_main.c
> >>> @@ -102,28 +102,8 @@ static bool largepages_enabled = true;
> >>>
> >>>  bool kvm_is_mmio_pfn(pfn_t pfn)
> >>>  {
> >>> -        if (pfn_valid(pfn)) {
> >>> -                int reserved;
> >>> -                struct page *tail = pfn_to_page(pfn);
> >>> -                struct page *head = compound_trans_head(tail);
> >>> -                reserved = PageReserved(head);
> >>> -                if (head != tail) {
> >>> -                        /*
> >>> -                         * "head" is not a dangling pointer
> >>> -                         * (compound_trans_head takes care of that)
> >>> -                         * but the hugepage may have been splitted
> >>> -                         * from under us (and we may not hold a
> >>> -                         * reference count on the head page so it can
> >>> -                         * be reused before we run PageReferenced), so
> >>> -                         * we've to check PageTail before returning
> >>> -                         * what we just read.
> >>> -                         */
> >>> -                        smp_rmb();
> >>> -                        if (PageTail(tail))
> >>> -                                return reserved;
> >>> -                }
> >>> -                return PageReserved(tail);
> >>> -        }
> >>> +        if (pfn_valid(pfn))
> >>> +                return PageReserved(pfn_to_page(pfn));
> >>>
> >>>          return true;
> >>>  }