Message-Id: <20241216165105.56185-11-dev.jain@arm.com>
Date: Mon, 16 Dec 2024 22:21:03 +0530
From: Dev Jain <dev.jain@....com>
To: akpm@...ux-foundation.org,
	david@...hat.com,
	willy@...radead.org,
	kirill.shutemov@...ux.intel.com
Cc: ryan.roberts@....com,
	anshuman.khandual@....com,
	catalin.marinas@....com,
	cl@...two.org,
	vbabka@...e.cz,
	mhocko@...e.com,
	apopple@...dia.com,
	dave.hansen@...ux.intel.com,
	will@...nel.org,
	baohua@...nel.org,
	jack@...e.cz,
	srivatsa@...il.mit.edu,
	haowenchao22@...il.com,
	hughd@...gle.com,
	aneesh.kumar@...nel.org,
	yang@...amperecomputing.com,
	peterx@...hat.com,
	ioworker0@...il.com,
	wangkefeng.wang@...wei.com,
	ziy@...dia.com,
	jglisse@...gle.com,
	surenb@...gle.com,
	vishal.moola@...il.com,
	zokeefe@...gle.com,
	zhengqi.arch@...edance.com,
	jhubbard@...dia.com,
	21cnbao@...il.com,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Dev Jain <dev.jain@....com>
Subject: [RFC PATCH 10/12] khugepaged: Skip PTE range if a larger mTHP is already mapped

While scanning a PTE range, we may find that a folio at least as large as
the target collapse order is already mapped. Going ahead with the collapse
is incorrect, since unmapping some of its pages leads to the entire folio
getting unmapped. Therefore, skip the corresponding range.

Signed-off-by: Dev Jain <dev.jain@....com>
---
If, in the future, we want all the folios in the system to be of a specific
order, we may split these larger folios.
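
For reviewers, the skip decision and the pointer arithmetic used in
hpage_collapse_scan_ptes() can be modelled in user space. This is only a
sketch under stated assumptions, not kernel code: the SKETCH_* constants
(4K pages, PMD order 9) and the three helper names are hypothetical and do
not exist in mm/khugepaged.c. Like the patch, the sketch advances by the
full folio size from the current position, whether or not the scan hit the
folio at its first page.

```c
#include <assert.h>

/* Assumed values for illustration only: 4K base pages, PMD order 9. */
#define SKETCH_PAGE_SIZE	(1UL << 12)
#define SKETCH_HPAGE_PMD_ORDER	9

/*
 * Hypothetical helper mirroring the condition added by the patch:
 * skip when collapsing to an mTHP order and the folio already mapped
 * is of that order or larger.
 */
static int should_skip_folio(int order, int found_order)
{
	return order != SKETCH_HPAGE_PMD_ORDER && found_order >= order;
}

/* Mirrors "_address += (PAGE_SIZE << found_order)". */
static unsigned long skip_address(unsigned long address, int found_order)
{
	assert(found_order >= 0);
	return address + (SKETCH_PAGE_SIZE << found_order);
}

/* Mirrors "_pte += (1UL << found_order)": number of PTEs stepped over. */
static unsigned long skip_ptes(int found_order)
{
	assert(found_order >= 0);
	return 1UL << found_order;
}
```

With these assumptions, a scan at order 4 that meets an order-4 folio at
address 0x200000 steps over 16 PTEs and resumes at 0x210000, whereas a scan
at the PMD order never takes this path.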

 mm/khugepaged.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8040b130e677..47e7c476b893 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -33,6 +33,7 @@ enum scan_result {
 	SCAN_PMD_NULL,
 	SCAN_PMD_NONE,
 	SCAN_PMD_MAPPED,
+	SCAN_PTE_MAPPED,
 	SCAN_EXCEED_NONE_PTE,
 	SCAN_EXCEED_SWAP_PTE,
 	SCAN_EXCEED_SHARED_PTE,
@@ -609,6 +610,11 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		folio = page_folio(page);
 		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
 
+		if (order != HPAGE_PMD_ORDER && folio_order(folio) >= order) {
+			result = SCAN_PTE_MAPPED;
+			goto out;
+		}
+
 		/* See hpage_collapse_scan_ptes(). */
 		if (folio_likely_mapped_shared(folio)) {
 			++shared;
@@ -1369,6 +1375,7 @@ static int hpage_collapse_scan_ptes(struct mm_struct *mm,
 	unsigned long orders;
 	pte_t *pte, *_pte;
 	spinlock_t *ptl;
+	int found_order;
 	pmd_t *pmd;
 	int order;
 
@@ -1467,6 +1474,24 @@ static int hpage_collapse_scan_ptes(struct mm_struct *mm,
 			goto out_unmap;
 		}
 
+		found_order = folio_order(folio);
+
+		/*
+		 * There is no point in scanning this folio. Two options: if
+		 * it was hit somewhere in the middle of the scan, drop down
+		 * the order; or skip completely to the end of this folio.
+		 * The latter gives us a higher order to start with, with at
+		 * most 1 << order PTEs not collapsed; the former may force
+		 * us to go below order 2 and exit.
+		 */
+		if (order != HPAGE_PMD_ORDER && found_order >= order) {
+			result = SCAN_PTE_MAPPED;
+			_address += (PAGE_SIZE << found_order);
+			_pte += (1UL << found_order);
+			pte_unmap_unlock(pte, ptl);
+			goto decide_order;
+		}
+
 		/*
 		 * We treat a single page as shared if any part of the THP
 		 * is shared. "False negatives" from
@@ -1550,6 +1575,10 @@ static int hpage_collapse_scan_ptes(struct mm_struct *mm,
 		if (_address == org_address + (PAGE_SIZE << HPAGE_PMD_ORDER))
 			goto out;
 	}
+	/* A larger folio was mapped; it will be skipped in next iteration */
+	if (result == SCAN_PTE_MAPPED)
+		goto decide_order;
+
 	if (result != SCAN_SUCCEED) {
 
 		/* Go to the next order. */
@@ -1558,6 +1587,8 @@ static int hpage_collapse_scan_ptes(struct mm_struct *mm,
 			goto out;
 		goto maybe_mmap_lock;
 	} else {
+
+decide_order:
 		address = _address;
 		pte = _pte;
 
-- 
2.30.2

