linux-kernel - [RFC PATCH 4/5] mm: truncate: split huge page cache page to a non-zero order if possible.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20220321142128.2471199-5-zi.yan@sent.com>
Date:   Mon, 21 Mar 2022 10:21:27 -0400
From:   Zi Yan <zi.yan@...t.com>
To:     Matthew Wilcox <willy@...radead.org>, linux-mm@...ck.org
Cc:     Roman Gushchin <roman.gushchin@...ux.dev>,
        Shuah Khan <shuah@...nel.org>, Yang Shi <shy828301@...il.com>,
        Miaohe Lin <linmiaohe@...wei.com>,
        Hugh Dickins <hughd@...gle.com>,
        "Kirill A . Shutemov" <kirill.shutemov@...ux.intel.com>,
        linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
        linux-kselftest@...r.kernel.org, Zi Yan <ziy@...dia.com>
Subject: [RFC PATCH 4/5] mm: truncate: split huge page cache page to a non-zero order if possible.

From: Zi Yan <ziy@...dia.com>

To minimize the number of pages after a huge page truncation, we do not
need to split it all the way down to order-0. The huge page has at most
three parts, the part before offset, the part to be truncated, the part
remaining at the end. Find the greatest common power of two multiplier of
the non-zero values of them as the new order, so we can split the huge
page to this order and keep the remaining pages as large and as few as
possible.

Signed-off-by: Zi Yan <ziy@...dia.com>
---
 mm/huge_memory.c |  1 +
 mm/truncate.c    | 33 +++++++++++++++++++++++++++++++--
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3617aa3ad0b1..76db0092a1e2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2349,6 +2349,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 		prep_compound_page(page_tail, new_order);
 		prep_transhuge_page(page_tail);
 	}
+	VM_BUG_ON_PAGE(PageTail(page_tail), page_tail);
 
 	/* Finally unfreeze refcount. Additional reference from page cache. */
 	page_ref_unfreeze(page_tail, 1 + ((!PageAnon(head) ||
diff --git a/mm/truncate.c b/mm/truncate.c
index ab50d0d59a2a..4f71e67dec09 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -197,6 +197,14 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
 	return 0;
 }
 
+static unsigned int greatest_pow_of_two_multiplier(unsigned int num)
+{
+	if (num & 1)
+		return 0;
+	return min_t(unsigned int, ilog2(num),
+		ilog2(num - rounddown_pow_of_two(num)));
+}
+
 /*
  * Handle partial folios.  The folio may be entirely within the
  * range if a split has raced with us.  If not, we zero the part of the
@@ -211,7 +219,8 @@ int truncate_inode_folio(struct address_space *mapping, struct folio *folio)
 bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 {
 	loff_t pos = folio_pos(folio);
-	unsigned int offset, length;
+	unsigned int offset, length, remaining;
+	unsigned int new_order = folio_order(folio);
 
 	if (pos < start)
 		offset = start - pos;
@@ -222,6 +231,7 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 		length = length - offset;
 	else
 		length = end + 1 - pos - offset;
+	remaining = folio_size(folio) - offset - length;
 
 	folio_wait_writeback(folio);
 	if (length == folio_size(folio)) {
@@ -236,11 +246,30 @@ bool truncate_inode_partial_folio(struct folio *folio, loff_t start, loff_t end)
 	 */
 	folio_zero_range(folio, offset, length);
 
+	/*
+	 * Find the greatest common power of two multiplier of the non-zero
+	 * offset, length, and remaining as the new order. So we can truncate
+	 * a subpage as large as possible.
+	 */
+	if (offset)
+		new_order = greatest_pow_of_two_multiplier(offset / PAGE_SIZE);
+	if (length)
+		new_order = min_t(unsigned int, new_order,
+			greatest_pow_of_two_multiplier(length / PAGE_SIZE));
+	if (remaining)
+		new_order = min_t(unsigned int, new_order,
+			greatest_pow_of_two_multiplier(remaining / PAGE_SIZE));
+
+	/* order-1 THP not supported, downgrade to order-0 */
+	if (new_order == 1)
+		new_order = 0;
+
+
 	if (folio_has_private(folio))
 		folio_invalidate(folio, offset, length);
 	if (!folio_test_large(folio))
 		return true;
-	if (split_huge_page(&folio->page) == 0)
+	if (split_huge_page_to_list_to_order(&folio->page, NULL, new_order) == 0)
 		return true;
 	if (folio_test_dirty(folio))
 		return false;
-- 
2.35.1