Message-ID: <Yw0/w0u+4qBHyy5u@monkey>
Date:   Mon, 29 Aug 2022 15:37:55 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Miaohe Lin <linmiaohe@...wei.com>
Cc:     Muchun Song <songmuchun@...edance.com>,
        David Hildenbrand <david@...hat.com>,
        Michal Hocko <mhocko@...e.com>, Peter Xu <peterx@...hat.com>,
        Naoya Horiguchi <naoya.horiguchi@...ux.dev>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Prakash Sangappa <prakash.sangappa@...cle.com>,
        James Houghton <jthoughton@...gle.com>,
        Mina Almasry <almasrymina@...gle.com>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Ray Fucillo <Ray.Fucillo@...ersystems.com>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 7/8] hugetlb: create hugetlb_unmap_file_folio to unmap
 single file folio

On 08/29/22 10:44, Miaohe Lin wrote:
> On 2022/8/25 1:57, Mike Kravetz wrote:
> > Create the new routine hugetlb_unmap_file_folio that will unmap a single
> > file folio.  This is refactored code from hugetlb_vmdelete_list.  It is
> > modified to do locking within the routine itself and check whether the
> > page is mapped within a specific vma before unmapping.
> > 
> > This refactoring will be put to use and expanded upon in a subsequent
> > patch adding vma specific locking.
> > 
> > Signed-off-by: Mike Kravetz <mike.kravetz@...cle.com>
> > ---
> >  fs/hugetlbfs/inode.c | 123 +++++++++++++++++++++++++++++++++----------
> >  1 file changed, 94 insertions(+), 29 deletions(-)
> > 
> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index e83fd31671b3..b93d131b0cb5 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -371,6 +371,94 @@ static void hugetlb_delete_from_page_cache(struct page *page)
> >  	delete_from_page_cache(page);
> >  }
> >  
> > +/*
> > + * Called with i_mmap_rwsem held for inode based vma maps.  This makes
> > + * sure vma (and vm_mm) will not go away.  We also hold the hugetlb fault
> > + * mutex for the page in the mapping.  So, we can not race with page being
> > + * faulted into the vma.
> > + */
> > +static bool hugetlb_vma_maps_page(struct vm_area_struct *vma,
> > +				unsigned long addr, struct page *page)
> > +{
> > +	pte_t *ptep, pte;
> > +
> > +	ptep = huge_pte_offset(vma->vm_mm, addr,
> > +			huge_page_size(hstate_vma(vma)));
> > +
> > +	if (!ptep)
> > +		return false;
> > +
> > +	pte = huge_ptep_get(ptep);
> > +	if (huge_pte_none(pte) || !pte_present(pte))
> > +		return false;
> > +
> > +	if (pte_page(pte) == page)
> > +		return true;
> 
> I wonder whether the pte entry could change after we check it, since huge_pte_lock is not held here.
> But I think holding i_mmap_rwsem in write mode should give us that guarantee, e.g. a migration
> entry is changed back to a huge pte entry while holding i_mmap_rwsem in read mode.
> Or am I missing something?

Let me think about this.  I do not think it is possible, but you ask good
questions.

Do note that this is the same locking sequence used at the beginning of the
page fault code where the decision to call hugetlb_no_page() is made.

> 
> > +
> > +	return false;
> > +}
> > +
> > +/*
> > + * Can vma_offset_start/vma_offset_end overflow on 32-bit arches?
> > + * No, because the interval tree returns us only those vmas
> > + * which overlap the truncated area starting at pgoff,
> > + * and no vma on a 32-bit arch can span beyond the 4GB.
> > + */
> > +static unsigned long vma_offset_start(struct vm_area_struct *vma, pgoff_t start)
> > +{
> > +	if (vma->vm_pgoff < start)
> > +		return (start - vma->vm_pgoff) << PAGE_SHIFT;
> > +	else
> > +		return 0;
> > +}
> > +
> > +static unsigned long vma_offset_end(struct vm_area_struct *vma, pgoff_t end)
> > +{
> > +	unsigned long t_end;
> > +
> > +	if (!end)
> > +		return vma->vm_end;
> > +
> > +	t_end = ((end - vma->vm_pgoff) << PAGE_SHIFT) + vma->vm_start;
> > +	if (t_end > vma->vm_end)
> > +		t_end = vma->vm_end;
> > +	return t_end;
> > +}
> > +
> > +/*
> > + * Called with hugetlb fault mutex held.  Therefore, no more mappings to
> > + * this folio can be created while executing the routine.
> > + */
> > +static void hugetlb_unmap_file_folio(struct hstate *h,
> > +					struct address_space *mapping,
> > +					struct folio *folio, pgoff_t index)
> > +{
> > +	struct rb_root_cached *root = &mapping->i_mmap;
> > +	struct page *page = &folio->page;
> > +	struct vm_area_struct *vma;
> > +	unsigned long v_start;
> > +	unsigned long v_end;
> > +	pgoff_t start, end;
> > +
> > +	start = index * pages_per_huge_page(h);
> > +	end = ((index + 1) * pages_per_huge_page(h));
> 
> It seems the outer parentheses are unneeded?

Correct.  Thanks.
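
For readers following the arithmetic in the quoted hunk, here is a minimal
userspace sketch of the index-to-range computation and the two clamping
helpers. It is not part of the patch. The struct vma_model type, the
folio_pgoff_range() helper, and the PAGE_SHIFT value of 12 are assumptions
made for illustration; only the bodies of vma_offset_start() and
vma_offset_end() follow the quoted code.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

/* Hypothetical stand-in for the three vm_area_struct fields used below. */
struct vma_model {
	unsigned long vm_start;	/* first mapped address */
	unsigned long vm_end;	/* one past the last mapped address */
	unsigned long vm_pgoff;	/* file offset of vm_start, in base pages */
};

/*
 * Hypothetical helper: convert a huge page index into a [start, end)
 * range in base pages, as done at the top of hugetlb_unmap_file_folio.
 * No outer parentheses are needed around the end expression.
 */
static void folio_pgoff_range(unsigned long index,
			      unsigned long pages_per_huge_page,
			      unsigned long *start, unsigned long *end)
{
	assert(start && end);
	*start = index * pages_per_huge_page;
	*end = (index + 1) * pages_per_huge_page;
}

/* Byte offset from vm_start at which the range begins (0 if before the vma). */
static unsigned long vma_offset_start(const struct vma_model *vma,
				      unsigned long start)
{
	if (vma->vm_pgoff < start)
		return (start - vma->vm_pgoff) << PAGE_SHIFT;
	return 0;
}

/* Absolute end address of the range, clamped to vm_end; end == 0 means "to vm_end". */
static unsigned long vma_offset_end(const struct vma_model *vma,
				    unsigned long end)
{
	unsigned long t_end;

	if (!end)
		return vma->vm_end;

	t_end = ((end - vma->vm_pgoff) << PAGE_SHIFT) + vma->vm_start;
	if (t_end > vma->vm_end)
		t_end = vma->vm_end;
	return t_end;
}
```

Note the asymmetry: vma_offset_start() yields an offset relative to
vm_start, while vma_offset_end() yields an absolute address, so a caller
would add vm_start to the former before comparing the two.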
-- 
Mike Kravetz

> 
> Reviewed-by: Miaohe Lin <linmiaohe@...wei.com>
> 
> Thanks,
> Miaohe Lin
