lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230111133351.807024-1-jannh@google.com>
Date:   Wed, 11 Jan 2023 14:33:51 +0100
From:   Jann Horn <jannh@...gle.com>
To:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org
Cc:     "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Zach O'Keefe <zokeefe@...gle.com>,
        linux-kernel@...r.kernel.org, David Hildenbrand <david@...hat.com>,
        Yang Shi <shy828301@...il.com>
Subject: [PATCH] mm/khugepaged: Fix ->anon_vma race

If an ->anon_vma is attached to the VMA, collapse_and_free_pmd() requires
it to be locked. retract_page_tables() bails out if an ->anon_vma is
attached, but does this check before holding the mmap lock (as the comment
above the check explains).

If we racily merge an existing ->anon_vma (shared with a child process)
from a neighboring VMA, subsequent rmap traversals on pages belonging to
the child will be able to see the page tables that we are concurrently
removing while assuming that nothing else can access them.

Repeat the ->anon_vma check once we hold the mmap lock to ensure that there
really is no concurrent page table access.

Reported-by: Zach O'Keefe <zokeefe@...gle.com>
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Cc: stable@...r.kernel.org
Signed-off-by: Jann Horn <jannh@...gle.com>
---
zokeefe@ pointed out to me that the current code (after my last round of patches)
can hit a lockdep assert by racing, and after staring at it a bit I've
convinced myself that this is a real, preexisting bug.
(I haven't written a reproducer for it though. One way to hit it might be
something along the lines of:

 - set up a process A with a private-file-mapping VMA V1
 - let A fork() to create process B, thereby copying V1 in A to V1' in B
 - let B extend the end of V1'
 - let B put some anon pages into the extended part of V1'
 - let A map a new private-file-mapping VMA V2 directly behind V1, without
   an anon_vma
[race begins here]
  - in A's thread 1: begin retract_page_tables() on V2, run through first
    ->anon_vma check
  - in A's thread 2: run __anon_vma_prepare() on V2 and ensure that it
    merges the anon_vma of V1 (which implies V1 and V2 must be mapping the
    same file at compatible offsets)
  - in B: trigger rmap traversal on anon page in V1'

 mm/khugepaged.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5cb401aa2b9d..0bfed37f3a3b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1644,7 +1644,7 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		 * has higher cost too. It would also probably require locking
 		 * the anon_vma.
 		 */
-		if (vma->anon_vma) {
+		if (READ_ONCE(vma->anon_vma)) {
 			result = SCAN_PAGE_ANON;
 			goto next;
 		}
@@ -1672,6 +1672,18 @@ static int retract_page_tables(struct address_space *mapping, pgoff_t pgoff,
 		result = SCAN_PTE_MAPPED_HUGEPAGE;
 		if ((cc->is_khugepaged || is_target) &&
 		    mmap_write_trylock(mm)) {
+			/*
+			 * Re-check whether we have an ->anon_vma, because
+			 * collapse_and_free_pmd() requires that either no
+			 * ->anon_vma exists or the anon_vma is locked.
+			 * We already checked ->anon_vma above, but that check
+			 * is racy because ->anon_vma can be populated under the
+			 * mmap lock in read mode.
+			 */
+			if (vma->anon_vma) {
+				result = SCAN_PAGE_ANON;
+				goto unlock_next;
+			}
 			/*
 			 * When a vma is registered with uffd-wp, we can't
 			 * recycle the pmd pgtable because there can be pte

base-commit: 7dd4b804e08041ff56c88bdd8da742d14b17ed25
-- 
2.39.0.314.g84b9a713c41-goog

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ