Message-Id: <1433264353-4050-1-git-send-email-mike.kravetz@oracle.com>
Date:	Tue,  2 Jun 2015 09:59:10 -0700
From:	Mike Kravetz <mike.kravetz@...cle.com>
To:	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc:	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Davidlohr Bueso <dave@...olabs.net>,
	David Rientjes <rientjes@...gle.com>,
	Luiz Capitulino <lcapitulino@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mike Kravetz <mike.kravetz@...cle.com>
Subject: [PATCH v4 0/3] alloc_huge_page/hugetlb_reserve_pages race

v3 of this patch set did not take the hugetlbfs min_size reservation into
account, as noted in the change log.

While working on hugetlbfs fallocate support, I noticed the following
race in the existing code.  It is unlikely that this race is hit very
often today.  However, as more functionality to add and remove pages
from hugetlbfs mappings (such as fallocate) is added, the likelihood of
hitting this race will increase.

alloc_huge_page and hugetlb_reserve_pages use information from the
reserve map to determine whether there are enough available huge pages
to complete the operation, as well as to adjust global reserve and
subpool usage counts.  The order of operations is as follows:
- call region_chg() to determine the expected change based on the reserve map
- determine if enough resources are available for this operation
- adjust global counts based on the expected change
- call region_add() to update the reserve map
The issue is that the reserve map can change between the calls to
region_chg() and region_add().  In that case, the counters that were
adjusted based on the output of region_chg() will not be correct.
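
To make the window concrete, here is a simplified user-space analogue of
that ordering (an illustrative sketch only; the bitmap reserve map, page
count, and thread setup are assumptions, not the mm/hugetlb.c code, which
uses a list of file regions and the kernel's own locks and counters):

/*
 * Simplified user-space analogue of the check-then-update ordering.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define NPAGES	8

static pthread_mutex_t resv_lock = PTHREAD_MUTEX_INITIALIZER;
static bool reserve_map[NPAGES];		/* true = page already reserved */
static _Atomic long global_reserve_count;	/* analogue of global/subpool counts */

/* Analogue of region_chg(): pages in [from, to) not yet in the reserve map. */
static int region_chg(int from, int to)
{
	int i, chg = 0;

	pthread_mutex_lock(&resv_lock);
	for (i = from; i < to; i++)
		if (!reserve_map[i])
			chg++;
	pthread_mutex_unlock(&resv_lock);
	return chg;
}

/* Analogue of region_add(): reserve [from, to), return pages actually added. */
static int region_add(int from, int to)
{
	int i, add = 0;

	pthread_mutex_lock(&resv_lock);
	for (i = from; i < to; i++)
		if (!reserve_map[i]) {
			reserve_map[i] = true;
			add++;
		}
	pthread_mutex_unlock(&resv_lock);
	return add;
}

static void *reserve_range(void *arg)
{
	int *r = arg;
	int chg = region_chg(r[0], r[1]);

	/* "enough resources" check and global count adjustment based on chg */
	global_reserve_count += chg;

	/*
	 * Window: the other thread can update the reserve map here, so the
	 * map no longer matches what region_chg() saw and chg is stale.
	 */
	region_add(r[0], r[1]);		/* return value ignored, as today */
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;
	int r1[2] = { 0, 4 }, r2[2] = { 2, 6 };	/* overlapping ranges */
	int i, reserved = 0;

	pthread_create(&t1, NULL, reserve_range, r1);
	pthread_create(&t2, NULL, reserve_range, r2);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	for (i = 0; i < NPAGES; i++)
		reserved += reserve_map[i];

	/* If both region_chg() calls ran before either region_add(), the
	 * count is 8 while only 6 pages are actually reserved. */
	printf("counted %ld, actually reserved %d\n",
	       (long)global_reserve_count, reserved);
	return 0;
}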

In order to hit this race today, there must be an existing shared hugetlb
mapping created with the MAP_NORESERVE flag.  A page fault that allocates a
huge page via this mapping must occur at the same time another task is
mapping the same region without the MAP_NORESERVE flag.
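
Something along these lines would set up those conditions (a sketch only,
not a reliable reproducer; the hugetlbfs mount point /dev/hugepages, the
file name, and the 2 MiB huge page size are assumptions):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define HPAGE_SIZE	(2UL * 1024 * 1024)	/* assume 2 MiB huge pages */

int main(void)
{
	int fd = open("/dev/hugepages/raceme", O_CREAT | O_RDWR, 0600);

	if (fd < 0) {
		perror("open hugetlbfs file");
		return 1;
	}
	if (ftruncate(fd, HPAGE_SIZE) < 0) {
		perror("ftruncate");
		return 1;
	}

	if (fork() == 0) {
		/* Child: shared mapping *without* MAP_NORESERVE, which
		 * goes through hugetlb_reserve_pages(). */
		char *b = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (b == MAP_FAILED)
			perror("mmap (reserving)");
		_exit(0);
	}

	/* Parent: shared mapping *with* MAP_NORESERVE; the write below
	 * faults in a huge page via alloc_huge_page(). */
	char *a = mmap(NULL, HPAGE_SIZE, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_NORESERVE, fd, 0);
	if (a == MAP_FAILED) {
		perror("mmap (MAP_NORESERVE)");
		return 1;
	}
	a[0] = 1;

	wait(NULL);
	return 0;
}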

The patch set does not prevent the race from happening.  Rather, it adds
simple functionality to detect when the race has occurred.  If a race is
detected, then the incorrect counts are adjusted.
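
In terms of the user-space sketch above, the idea looks roughly like this
(reusing region_chg(), region_add(), and global_reserve_count from that
sketch; the actual patches change alloc_huge_page and
hugetlb_reserve_pages in mm/hugetlb.c):

/*
 * Sketch of the detect-and-adjust idea: region_add() reports how many
 * pages it actually added, and the caller gives back whatever it
 * charged beyond that.
 */
static void *reserve_range_fixed(void *arg)
{
	int *r = arg;
	int chg, add;

	chg = region_chg(r[0], r[1]);

	/* adjust global counts based on the expected change, as before */
	global_reserve_count += chg;

	add = region_add(r[0], r[1]);
	if (add < chg) {
		/*
		 * The reserve map changed between region_chg() and
		 * region_add(): return the excess to the global count.
		 */
		global_reserve_count -= chg - add;
	}
	return NULL;
}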

Review comments pointed out the need for documentation of the existing
region/reserve map routines.  This patch set also adds documentation
in this area.

v4:
  Reserve count adjustments need to take into account hugetlbfs min_size
  reservation pools
v3:
  Created separate patch for new documentation created in v2
  Added VM_BUG_ON() to region add at suggestion of Naoya Horiguchi
  __vma_reservation_common keys off parameter commit for easier reading
v2:
  Added documentation for the region/reserve map routines
  Created common routine for vma_needs_reservation and
    vma_commit_reservation to help prevent them from drifting
    apart in the future.

Mike Kravetz (3):
  mm/hugetlb: document the reserve map/region tracking routines
  mm/hugetlb: compute/return the number of regions added by region_add()
  mm/hugetlb: handle races in alloc_huge_page and hugetlb_reserve_pages

 mm/hugetlb.c | 163 ++++++++++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 133 insertions(+), 30 deletions(-)

-- 
2.1.0
