linux-kernel - Re: [bug/regression] libhugetlbfs testsuite failures and OOMs eventually kill my system


Message-ID: <d90321d5-f81a-2752-fea2-fae11556bdba@oracle.com>
Date:   Mon, 17 Oct 2016 18:18:50 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>,
        Jan Stancek <jstancek@...hat.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Cc:     hillf.zj@...baba-inc.com, dave.hansen@...ux.intel.com,
        kirill.shutemov@...ux.intel.com, mhocko@...e.cz,
        n-horiguchi@...jp.nec.com, iamjoonsoo.kim@....com
Subject: Re: [bug/regression] libhugetlbfs testsuite failures and OOMs
 eventually kill my system

On 10/17/2016 03:53 PM, Mike Kravetz wrote:
> On 10/16/2016 10:04 PM, Aneesh Kumar K.V wrote:
>>
>> looking at that commit, I am not sure region_chg output indicates a hole
>> was punched. i.e., w.r.t. a private mapping, when we mmap, we don't do a
>> region_chg (hugetlb_reserve_pages()). So with a fault later, when we
>> call vma_needs_reservation, we will find region_chg returning >= 0, right?
>>
> 
> Let me try to explain.
> 
> When a private mapping is created, hugetlb_reserve_pages is called to
> reserve huge pages for the mapping.  A reserve map is created and installed
> in the VMA (vm_private_data).  No reservation entries are actually created
> for the mapping.  But, hugetlb_acct_memory() is called to reserve
> pages for the mapping in the global pool.  This will adjust (increment)
> the global reserved huge page counter (resv_huge_pages).
> 
> As pages within the private mapping are faulted in, alloc_huge_page() is
> called to allocate the pages.  Within alloc_huge_page, vma_needs_reservation
> is called to determine if there is a reservation for this allocation.
> If there is a reservation, the global count is adjusted (decremented).
> In any case where a page is returned to the caller, vma_commit_reservation
> is called and an entry for the page is created in the reserve map (VMA
> vm_private_data) of the mapping.
> 
> Once a page is instantiated within the private mapping, an entry exists
> in the reserve map and the reserve count has been adjusted to indicate
> that the reserve has been consumed.  Subsequent faults will not instantiate
> a new page unless the original is somehow removed from the mapping.  The
> only way a user can remove a page from the mapping is via a hole punch or
> truncate operation.  Note that hole punch and truncate for huge pages
> apply only to hugetlbfs backed mappings and not anonymous mappings.
> 
> Hole punch and truncate will unmap huge pages from any private
> mapping associated with the same offset in the hugetlbfs file.  However,
> they will not remove entries from the VMA vm_private_data reserve maps.
> Nor will they adjust global reserve counts based on private mappings.

Question.  Should hole punch and truncate unmap private mappings?
Commit 67961f9db8c4 is just trying to correctly handle that situation.
If we do not unmap the private pages, then there is no need for this code.

-- 
Mike Kravetz

> 
> Now suppose a subsequent fault happens for a page in the private mapping
> that was removed via hole punch or truncate.  Prior to commit 67961f9db8c4,
> vma_needs_reservation ALWAYS returned false to indicate that a reservation
> existed for the page.  So, alloc_huge_page would consume a reserved page.
> The problem is that the reservation was consumed at the time of the first
> fault and no longer exists.  This caused the global reserve count to be
> incorrect.
> 
> Commit 67961f9db8c4 looks at the VMA private reserve map to determine if
> the original reservation was consumed.  If an entry exists in the map, it
> is assumed the reservation was consumed and no longer exists.
>
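
To make the accounting described above concrete, here is a rough user-space
sketch of it.  This is not kernel code: toy_vma, toy_fault, toy_hole_punch,
MAP_PAGES and the check_map flag are all made-up names for illustration, and
the model tracks only the global reserve count and the per-VMA reserve map.
With check_map false, the fault path assumes a reservation always exists (the
behavior described for kernels before commit 67961f9db8c4); with check_map
true, it consults the reserve map first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAP_PAGES 8  /* hypothetical size of the toy private mapping */

/* Toy stand-in for a private mapping; not a kernel structure. */
struct toy_vma {
    bool resv_entry[MAP_PAGES]; /* models entries in the VMA reserve map */
    bool mapped[MAP_PAGES];     /* page currently instantiated */
};

/* mmap of a private mapping: empty reserve map, global reserve bumped. */
static void toy_mmap_private(long *resv_huge_pages, struct toy_vma *vma)
{
    for (size_t i = 0; i < MAP_PAGES; i++) {
        vma->resv_entry[i] = false;
        vma->mapped[i] = false;
    }
    *resv_huge_pages += MAP_PAGES;
}

/* Fault on page idx.  check_map selects whether the reserve map is consulted. */
static void toy_fault(long *resv_huge_pages, struct toy_vma *vma,
                      size_t idx, bool check_map)
{
    if (vma->mapped[idx])
        return; /* page already present, nothing to allocate */

    /* Old behavior: assume a reservation exists.  New: an existing map
     * entry means the reservation was already consumed. */
    bool have_resv = check_map ? !vma->resv_entry[idx] : true;

    if (have_resv)
        (*resv_huge_pages)--;   /* consume one reserved page */
    vma->mapped[idx] = true;
    vma->resv_entry[idx] = true; /* commit: record entry in reserve map */
}

/* Hole punch: unmaps the page, leaves map entry and global count alone. */
static void toy_hole_punch(struct toy_vma *vma, size_t idx)
{
    vma->mapped[idx] = false;
}

/* mmap, fault page 0, punch it, fault it again; return the global count. */
static long toy_resv_after_refault(bool check_map)
{
    long resv_huge_pages = 0;
    struct toy_vma vma;

    toy_mmap_private(&resv_huge_pages, &vma);
    toy_fault(&resv_huge_pages, &vma, 0, check_map);
    toy_hole_punch(&vma, 0);
    toy_fault(&resv_huge_pages, &vma, 0, check_map);
    return resv_huge_pages;
}
```

In this toy model, only one page of the mapping was ever reserved and consumed,
so the count after the refault should be MAP_PAGES - 1.  The check_map == false
path decrements twice and ends up one lower, which is the kind of incorrect
global reserve count described above.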