Date:	Thu, 7 Aug 2008 14:38:24 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Andy Whitcroft <apw@...dowen.org>
Cc:	gerald.schaefer@...ibm.com, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, schwidefsky@...ibm.com,
	heiko.carstens@...ibm.com, mel@....ul.ie, apw@...dowen.org
Subject: Re: [PATCH 1/1] allocate structures for reservation tracking in
 hugetlbfs outside of spinlocks

On Thu,  7 Aug 2008 21:28:23 +0100
Andy Whitcroft <apw@...dowen.org> wrote:

> [Andrew, this fixes a problem in the private reservations stack, shown up
> by some testing done by Gerald on s390 with PREEMPT.  It fixes an attempt
> at allocation while holding locks.  This should be merged up to mainline
> as a bug fix to those patches.]
> 
> In the normal case, hugetlbfs reserves hugepages at map time so that the
> pages exist for future faults.  A struct file_region is used to track
> when reservations have been consumed and where.  These file_regions
> are allocated as necessary with kmalloc() which can sleep with the
> mm->page_table_lock held.  This is wrong and triggers a may-sleep warning
> when PREEMPT is enabled.
> 
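
To spell out the offending pattern schematically (this is the shape of
the bug, not the literal code):

	spin_lock(&mm->page_table_lock);
	...
	/* the file_region bookkeeping bottoms out in kmalloc(GFP_KERNEL),
	 * which may sleep: illegal while a spinlock is held */
	region = kmalloc(sizeof(*region), GFP_KERNEL);
	...
	spin_unlock(&mm->page_table_lock);
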
> Updates to the underlying file_region are done in two phases.  The first
> phase prepares the region for the change, allocating any necessary memory,
> without actually making the change.  The second phase actually commits
> the change.  This patch exploits that split by checking the reservations
> before the page_table_lock is taken, triggering any necessary allocations.
> This may then be safely repeated within the locks without any allocations
> being required.
> 
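
So the shape is, roughly (using the locking from the patch; the phase
names are mine):

	/* phase 1: prepare - may kmalloc() and sleep, no lock held */
	vma_needs_reservation(h, vma, address);

	spin_lock(&mm->page_table_lock);
	/* phase 2: commit - repeat the update, no allocation needed now */
	...
	spin_unlock(&mm->page_table_lock);

i.e. any file_region node needed inside the lock is guaranteed to exist
before the lock is taken.
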
> Credit to Mel Gorman for diagnosing this failure and for initial
> versions of the patch.
> 

After applying the patch:

: int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
: 			unsigned long address, int write_access)
: {
: 	pte_t *ptep;
: 	pte_t entry;
: 	int ret;
: 	struct page *pagecache_page = NULL;
: 	static DEFINE_MUTEX(hugetlb_instantiation_mutex);
: 	struct hstate *h = hstate_vma(vma);
: 
: 	ptep = huge_pte_alloc(mm, address, huge_page_size(h));
: 	if (!ptep)
: 		return VM_FAULT_OOM;
: 
: 	/*
: 	 * Serialize hugepage allocation and instantiation, so that we don't
: 	 * get spurious allocation failures if two CPUs race to instantiate
: 	 * the same page in the page cache.
: 	 */
: 	mutex_lock(&hugetlb_instantiation_mutex);
: 	entry = huge_ptep_get(ptep);
: 	if (huge_pte_none(entry)) {
: 		ret = hugetlb_no_page(mm, vma, address, ptep, write_access);
: 		mutex_unlock(&hugetlb_instantiation_mutex);
: 		return ret;
: 	}
: 
: 	ret = 0;
: 
: 	/*
: 	 * If we are going to COW the mapping later, we examine the pending
: 	 * reservations for this page now. This will ensure that any
: 	 * allocations necessary to record that reservation occur outside the
: 	 * spinlock. For private mappings, we also lookup the pagecache
: 	 * page now as it is used to determine if a reservation has been
: 	 * consumed.
: 	 */
: 	if (write_access && !pte_write(entry)) {
: 		vma_needs_reservation(h, vma, address);
: 
: 		if (!(vma->vm_flags & VM_SHARED))
: 			pagecache_page = hugetlbfs_pagecache_page(h,
: 								vma, address);
: 	}

There's an apparent race window here, where a new page can get
instantiated between the reservation check and taking the
page_table_lock.  But down_read(mmap_sem) plus the
hugetlb_instantiation_mutex prevents that, yes?
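
To spell out what I think closes it (a sketch of the lock nesting as I
read the code, not verified against every path):

	down_read(&mm->mmap_sem);		/* held across the fault */
	mutex_lock(&hugetlb_instantiation_mutex); /* one instantiation at a time */
	entry = huge_ptep_get(ptep);		/* snapshot */
	/* ... reservation prep, pagecache lookup: may sleep ... */
	spin_lock(&mm->page_table_lock);
	if (pte_same(entry, huge_ptep_get(ptep)))
		/* nothing changed the pte; safe to act on the snapshot */
		ret = hugetlb_cow(...);
	spin_unlock(&mm->page_table_lock);
	mutex_unlock(&hugetlb_instantiation_mutex);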


: 	spin_lock(&mm->page_table_lock);
: 	/* Check for a racing update before calling hugetlb_cow */
: 	if (likely(pte_same(entry, huge_ptep_get(ptep))))
: 		if (write_access && !pte_write(entry))
: 			ret = hugetlb_cow(mm, vma, address, ptep, entry,
: 							pagecache_page);
: 	spin_unlock(&mm->page_table_lock);
: 
: 	if (pagecache_page) {
: 		unlock_page(pagecache_page);
: 		put_page(pagecache_page);
: 	}
: 
: 	mutex_unlock(&hugetlb_instantiation_mutex);
: 
: 	return ret;
: }