lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Thu, 7 Aug 2008 14:38:24 -0700 From: Andrew Morton <akpm@...ux-foundation.org> To: Andy Whitcroft <apw@...dowen.org> Cc: gerald.schaefer@...ibm.com, linux-kernel@...r.kernel.org, linux-mm@...ck.org, schwidefsky@...ibm.com, heiko.carstens@...ibm.com, mel@....ul.ie, apw@...dowen.org Subject: Re: [PATCH 1/1] allocate structures for reservation tracking in hugetlbfs outside of spinlocks On Thu, 7 Aug 2008 21:28:23 +0100 Andy Whitcroft <apw@...dowen.org> wrote: > [Andrew, this fixes a problem in the private reservations stack, shown up > by some testing done by Gerald on s390 with PREEMPT. It fixes an attempt > at allocation while holding locks. This should be merged up to mainline > as a bug fix to those patches.] > > In the normal case, hugetlbfs reserves hugepages at map time so that the > pages exist for future faults. A struct file_region is used to track > when reservations have been consumed and where. These file_regions > are allocated as necessary with kmalloc() which can sleep with the > mm->page_table_lock held. This is wrong and triggers may-sleep warning > when PREEMPT is enabled. > > Updates to the underlying file_region are done in two phases. The first > phase prepares the region for the change, allocating any necessary memory, > without actually making the change. The second phase actually commits > the change. This patch makes use of this by checking the reservations > before the page_table_lock is taken; triggering any necessary allocations. > This may then be safely repeated within the locks without any allocations > being required. > > Credit to Mel Gorman for diagnosing this failure and initial versions of > the patch. > After applying the patch: : int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, : unsigned long address, int write_access) : { : pte_t *ptep; : pte_t entry; : int ret; : struct page *pagecache_page = NULL; : static DEFINE_MUTEX(hugetlb_instantiation_mutex); : struct hstate *h = hstate_vma(vma); : : ptep = huge_pte_alloc(mm, address, huge_page_size(h)); : if (!ptep) : return VM_FAULT_OOM; : : /* : * Serialize hugepage allocation and instantiation, so that we don't : * get spurious allocation failures if two CPUs race to instantiate : * the same page in the page cache. : */ : mutex_lock(&hugetlb_instantiation_mutex); : entry = huge_ptep_get(ptep); : if (huge_pte_none(entry)) { : ret = hugetlb_no_page(mm, vma, address, ptep, write_access); : mutex_unlock(&hugetlb_instantiation_mutex); : return ret; : } : : ret = 0; : : /* : * If we are going to COW the mapping later, we examine the pending : * reservations for this page now. This will ensure that any : * allocations necessary to record that reservation occur outside the : * spinlock. For private mappings, we also lookup the pagecache : * page now as it is used to determine if a reservation has been : * consumed. : */ : if (write_access && !pte_write(entry)) { : vma_needs_reservation(h, vma, address); : : if (!(vma->vm_flags & VM_SHARED)) : pagecache_page = hugetlbfs_pagecache_page(h, : vma, address); : } There's a seeming race window here, where a new page can get instantiated. But down-read(mmap_sem) plus hugetlb_instantiation_mutex prevents that, yes? : spin_lock(&mm->page_table_lock); : /* Check for a racing update before calling hugetlb_cow */ : if (likely(pte_same(entry, huge_ptep_get(ptep)))) : if (write_access && !pte_write(entry)) : ret = hugetlb_cow(mm, vma, address, ptep, entry, : pagecache_page); : spin_unlock(&mm->page_table_lock); : : if (pagecache_page) { : unlock_page(pagecache_page); : put_page(pagecache_page); : } : : mutex_unlock(&hugetlb_instantiation_mutex); : : return ret; : } : : -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@...r.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists