Message-Id: <20221118011025.2178986-7-peterx@redhat.com>
Date:   Thu, 17 Nov 2022 20:10:19 -0500
From:   Peter Xu <peterx@...hat.com>
To:     linux-mm@...ck.org, linux-kernel@...r.kernel.org
Cc:     Rik van Riel <riel@...riel.com>,
        Muchun Song <songmuchun@...edance.com>,
        Andrew Morton <akpm@...ux-foundation.org>, peterx@...hat.com,
        James Houghton <jthoughton@...gle.com>,
        Nadav Amit <nadav.amit@...il.com>,
        Andrea Arcangeli <aarcange@...hat.com>,
        David Hildenbrand <david@...hat.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Mike Kravetz <mike.kravetz@...cle.com>
Subject: [PATCH RFC v2 06/12] mm/hugetlb: Protect huge_pmd_share() with walker lock

huge_pmd_share() is normally called with the vma lock already held, which
makes it tempting to think the walker lock isn't needed.  But that's not
true: we only took the vma lock for the "current" vma, not for all the
other vmas whose pgtables we're going to walk.

Taking each vma's lock could deadlock and would be hard to order.  The safe
approach is to take the walker lock, which guarantees the pgtable page
stays alive, and then use get_page_unless_zero() rather than plain
get_page(), to make sure the pgtable page is not being freed in the
meantime.

Signed-off-by: Peter Xu <peterx@...hat.com>
---
 mm/hugetlb.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 61a1fa678172..5ef883184885 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7008,6 +7008,13 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	spinlock_t *ptl;
 
 	i_mmap_lock_read(mapping);
+
+	/*
+	 * NOTE: even if we've got the vma read lock, here we still need to
+	 * take the walker lock, because we're not walking the current vma,
+	 * but some other mm's!
+	 */
+	hugetlb_walker_lock();
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
 			continue;
@@ -7016,12 +7023,15 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (saddr) {
 			spte = huge_pte_offset(svma->vm_mm, saddr,
 					       vma_mmu_pagesize(svma));
-			if (spte) {
-				get_page(virt_to_page(spte));
+			/*
+			 * When page ref==0, it means it's probably being
+			 * freed; continue with the next one.
+			 */
+			if (spte && get_page_unless_zero(virt_to_page(spte)))
 				break;
-			}
 		}
 	}
+	hugetlb_walker_unlock();
 
 	if (!spte)
 		goto out;
-- 
2.37.3
