lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YuGLT+RlCynFM9Z4@monkey>
Date:   Wed, 27 Jul 2022 12:00:31 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Miaohe Lin <linmiaohe@...wei.com>
Cc:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Muchun Song <songmuchun@...edance.com>,
        Michal Hocko <mhocko@...e.com>, Peter Xu <peterx@...hat.com>,
        Naoya Horiguchi <naoya.horiguchi@...ux.dev>,
        David Hildenbrand <david@...hat.com>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Prakash Sangappa <prakash.sangappa@...cle.com>,
        James Houghton <jthoughton@...gle.com>,
        Mina Almasry <almasrymina@...gle.com>,
        Pasha Tatashin <pasha.tatashin@...een.com>,
        Axel Rasmussen <axelrasmussen@...gle.com>,
        Ray Fucillo <Ray.Fucillo@...ersystems.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC PATCH v4 4/8] hugetlbfs: catch and handle truncate racing
 with page faults

On 07/27/22 17:20, Miaohe Lin wrote:
> On 2022/7/7 4:23, Mike Kravetz wrote:
> > Most hugetlb fault handling code checks for faults beyond i_size.
> > While there are early checks in the code paths, the most difficult
> > to handle are those discovered after taking the page table lock.
> > At this point, we have possibly allocated a page and consumed
> > associated reservations and possibly added the page to the page cache.
> > 
> > When discovering a fault beyond i_size, be sure to:
> > - Remove the page from page cache, else it will sit there until the
> >   file is removed.
> > - Do not restore any reservation for the page consumed.  Otherwise
> >   there will be an outstanding reservation for an offset beyond the
> >   end of file.
> > 
> > The 'truncation' code in remove_inode_hugepages must deal with fault
> > code potentially removing a page/folio from the cache after the page was
> > returned by filemap_get_folios and before locking the page.  This can be
> > discovered by a change in folio_mapping() after taking folio lock.  In
> > addition, this code must deal with fault code potentially consuming
> > and returning reservations.  To synchronize this, remove_inode_hugepages
> > will now take the fault mutex for ALL indices in the hole or truncated
> > range.  In this way, it KNOWS fault code has finished with the page/index
> > OR fault code will see the updated file size.
> > 
> > Signed-off-by: Mike Kravetz <mike.kravetz@...cle.com>
> > ---
> 
> <snip>
> 
> > @@ -5606,8 +5610,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> >  
> >  	ptl = huge_pte_lock(h, mm, ptep);
> >  	size = i_size_read(mapping->host) >> huge_page_shift(h);
> > -	if (idx >= size)
> > +	if (idx >= size) {
> > +		beyond_i_size = true;
> 
> Thanks for your patch. There is one question:
> 
> Since races between hugetlb pagefault and truncate is guarded by hugetlb_fault_mutex,
> do we really need to check it again after taking the page table lock?
> 

Well, the fault mutex can only guard a single hugetlb page.  The fault mutex
is actually an array/table of mutexes hashed by mapping address and file index.
So, during truncation we take take the mutex for each page as they are
unmapped and removed.  So, the fault mutex only synchronizes operations
on one specific page.  The idea with this patch is to coordinate the fault
code and truncate code when operating on the same page.

In addition, changing the file size happens early in the truncate process
before taking any locks/mutexes.
-- 
Mike Kravetz

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ