linux-kernel - [PATCH WIP v1 14/20] mm/huge_memory: avoid folio_refcount() < folio_mapcount() in __split_huge_pmd

Message-ID: <20231124132626.235350-15-david@redhat.com>
Date:   Fri, 24 Nov 2023 14:26:19 +0100
From:   David Hildenbrand <david@...hat.com>
To:     linux-kernel@...r.kernel.org
Cc:     linux-mm@...ck.org, David Hildenbrand <david@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Ryan Roberts <ryan.roberts@....com>,
        Matthew Wilcox <willy@...radead.org>,
        Hugh Dickins <hughd@...gle.com>,
        Yin Fengwei <fengwei.yin@...el.com>,
        Yang Shi <shy828301@...il.com>,
        Ying Huang <ying.huang@...el.com>, Zi Yan <ziy@...dia.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
        Waiman Long <longman@...hat.com>,
        "Paul E. McKenney" <paulmck@...nel.org>
Subject: [PATCH WIP v1 14/20] mm/huge_memory: avoid folio_refcount() < folio_mapcount() in __split_huge_pmd_locked()

Currently, there is a short window in which the folio refcount is
smaller than the folio mapcount. Let's make sure we obey the rules of
refcount vs. mapcount: increment the refcount before incrementing the
mapcount, and decrement the refcount after decrementing the mapcount.
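
The ordering rule above can be sketched in plain userspace C. This is
only a toy model under stated assumptions: struct toy_folio and the
toy_*() helpers are hypothetical names, not kernel APIs, and the two
counters stand in for what folio_refcount() and folio_mapcount() report.

```c
#include <stdatomic.h>

/* Hypothetical toy folio; NOT the kernel's struct folio. */
struct toy_folio {
	atomic_int refcount;	/* stands in for folio_refcount() */
	atomic_int mapcount;	/* stands in for folio_mapcount() */
};

/* Adding mappings: take the references first, then account the mappings. */
static void toy_map(struct toy_folio *f, int nr)
{
	atomic_fetch_add(&f->refcount, nr);
	atomic_fetch_add(&f->mapcount, nr);
}

/* Removing mappings: drop the mappings first, then release the references. */
static void toy_unmap(struct toy_folio *f, int nr)
{
	atomic_fetch_sub(&f->mapcount, nr);
	atomic_fetch_sub(&f->refcount, nr);
}

/* The rule both helpers preserve at every step: refcount >= mapcount. */
static int toy_invariant_holds(struct toy_folio *f)
{
	return atomic_load(&f->refcount) >= atomic_load(&f->mapcount);
}
```

With this ordering, an observer that samples both counters between the
two updates of either helper sees the refcount at or above the mapcount,
never below it.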

While this could make code like can_split_folio() fail to detect other
folio references, such code is (currently) racy already. This change
shouldn't be considered a real fix, but rather an improvement/cleanup.

The refcount vs. mapcount changes are now well balanced in the code, at
the cost of one additional refcount change. That shouldn't matter much
here: we're usually touching >= 512 subpage mapcounts, and much more,
after all.
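
The accounting for the !freeze, !pmd_migration path can be walked
through with a small userspace model. It is a sketch under assumptions:
it counts only the references that back mappings (a real folio usually
holds additional references), it starts from one PMD mapping holding one
reference, and TOY_HPAGE_PMD_NR and the toy_*() helpers are hypothetical
names rather than kernel code.

```c
#include <stdbool.h>

#define TOY_HPAGE_PMD_NR 512	/* assumed: 2 MiB THP with 4 KiB base pages */

struct toy_counts {
	long ref;	/* references backing mappings */
	long map;	/* total mappings (PMD + PTE) */
	bool violated;	/* set once ref < map is ever observed */
};

static void toy_check(struct toy_counts *c)
{
	if (c->ref < c->map)
		c->violated = true;
}

static void toy_ref_add(struct toy_counts *c, long nr)
{
	c->ref += nr;
	toy_check(c);
}

static void toy_map_add(struct toy_counts *c, long nr)
{
	c->map += nr;
	toy_check(c);
}

/* Old ordering: reuse the PMD mapping's reference for one of the PTEs. */
static bool toy_split_old(void)
{
	struct toy_counts c = { .ref = 1, .map = 1, .violated = false };

	toy_ref_add(&c, TOY_HPAGE_PMD_NR - 1);	/* folio_ref_add(N - 1) */
	toy_map_add(&c, TOY_HPAGE_PMD_NR);	/* add N PTE mappings */
	toy_map_add(&c, -1);			/* remove the PMD mapping */
	return c.violated;
}

/* New ordering: one extra reference up front, dropped at the very end. */
static bool toy_split_new(void)
{
	struct toy_counts c = { .ref = 1, .map = 1, .violated = false };

	toy_ref_add(&c, TOY_HPAGE_PMD_NR);	/* folio_ref_add(N) */
	toy_map_add(&c, TOY_HPAGE_PMD_NR);	/* add N PTE mappings */
	toy_map_add(&c, -1);			/* remove the PMD mapping */
	toy_ref_add(&c, -1);			/* put_page() */
	return c.violated;
}
```

In this model both orderings end with 512 references and 512 mappings.
Only the old one passes through a state with 512 references and 513
mappings, which is the transient the patch avoids.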

Found while playing with some sanity checks to detect such cases, which
we might add at some later point.

Signed-off-by: David Hildenbrand <david@...hat.com>
---
 mm/huge_memory.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f47971d1afbf..9639b4edc8a5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2230,7 +2230,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 		if (!freeze) {
 			rmap_t rmap_flags = RMAP_NONE;

-			folio_ref_add(folio, HPAGE_PMD_NR - 1);
+			folio_ref_add(folio, HPAGE_PMD_NR);
 			if (anon_exclusive)
 				rmap_flags = RMAP_EXCLUSIVE;
 			folio_add_anon_rmap_range(folio, page, HPAGE_PMD_NR,
@@ -2294,10 +2294,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
 	}
 	pte_unmap(pte - 1);

-	if (!pmd_migration)
+	if (!pmd_migration) {
 		page_remove_rmap(page, vma, true);
-	if (freeze)
 		put_page(page);
+	}

 	smp_wmb(); /* make pte visible before pmd */
 	pmd_populate(mm, pmd, pgtable);
-- 
2.41.0