linux-kernel - [PATCH v4 8/9] mm/shmem, swap: simplify swap entry and index calculation of large swapin

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250704181748.63181-9-ryncsn@gmail.com>
Date: Sat,  5 Jul 2025 02:17:47 +0800
From: Kairui Song <ryncsn@...il.com>
To: linux-mm@...ck.org
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Hugh Dickins <hughd@...gle.com>,
	Baolin Wang <baolin.wang@...ux.alibaba.com>,
	Matthew Wilcox <willy@...radead.org>,
	Kemeng Shi <shikemeng@...weicloud.com>,
	Chris Li <chrisl@...nel.org>,
	Nhat Pham <nphamcs@...il.com>,
	Baoquan He <bhe@...hat.com>,
	Barry Song <baohua@...nel.org>,
	linux-kernel@...r.kernel.org,
	Kairui Song <kasong@...cent.com>
Subject: [PATCH v4 8/9] mm/shmem, swap: simplify swap entry and index calculation of large swapin

From: Kairui Song <kasong@...cent.com>

Noticing large shmem swapin have already calculated the right swap value
to be used before the swap cache look up, simply rounding it down against
the size of the folio brought in the swapin is simple and effective
enough to get the swap value to be verified. A folio's swap entry is
always aligned by its size.

Any kind of parallel split or race is fine, because the final
shmem_add_to_page_cache always ensures entries covered by the folio
are all correct, and there won't be any data corruption.

This shouldn't cause any increased repeated fault too, instead, no
matter how the shmem mapping is split in parallel, as long as the
mapping still contains the right entries, the swapin will succeed.

This reduced both the final object size and stack usage:

./scripts/bloat-o-meter mm/shmem.o.old mm/shmem.o
add/remove: 0/0 grow/shrink: 1/1 up/down: 5/-214 (-209)
Function                                     old     new   delta
shmem_read_mapping_page_gfp                  143     148      +5
shmem_swapin_folio                          4020    3806    -214
Total: Before=33478, After=33269, chg -0.62%

Stack usage (Before vs After):
shmem.c:2279:12:shmem_swapin_folio     280    static
shmem.c:2279:12:shmem_swapin_folio     264    static

Signed-off-by: Kairui Song <kasong@...cent.com>
---
 mm/shmem.c | 43 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 782162c0c4e0..646b1db9501c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2267,7 +2267,7 @@ static int shmem_split_large_entry(struct inode *inode, pgoff_t index,
 	if (xas_error(&xas))
 		return xas_error(&xas);
 
-	return entry_order;
+	return 0;
 }
 
 /*
@@ -2288,7 +2288,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
 	bool skip_swapcache = false;
-	int error, nr_pages, order, split_order;
+	int error, nr_pages, order;
 	pgoff_t offset;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
@@ -2336,8 +2336,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 				folio = NULL;
 				goto failed;
 			} else {
-				if (folio_test_large(folio))
-					swap = index_entry;
 				skip_swapcache = true;
 			}
 		} else {
@@ -2349,6 +2347,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			}
 		}
 	}
+
 	if (order > folio_order(folio)) {
 		/*
 		 * Swapin may get smaller folios due to various reasons:
@@ -2358,23 +2357,25 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		 * large swap entries. In such cases, we should split the
 		 * large swap entry to prevent possible data corruption.
 		 */
-		split_order = shmem_split_large_entry(inode, index, index_entry, gfp);
-		if (split_order < 0) {
-			error = split_order;
+		error = shmem_split_large_entry(inode, index, index_entry, gfp);
+		if (error)
 			goto failed_nolock;
-		}
+	}
 
-		/*
-		 * If the large swap entry has already been split, it is
-		 * necessary to recalculate the new swap entry based on
-		 * the old order alignment.
-		 */
-		if (split_order > 0) {
-			offset = index - round_down(index, 1 << split_order);
-			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
-		}
-	} else if (order < folio_order(folio)) {
-		swap.val = round_down(swap.val, 1 << folio_order(folio));
+	/*
+	 * If the folio is large, round down swap and index by folio size.
+	 * No matter what race occurs, the swap layer ensures we either get
+	 * a valid folio that has its swap entry aligned by size, or a
+	 * temporarily invalid one which we'll abort very soon and retry.
+	 *
+	 * shmem_add_to_page_cache ensures the whole range contains expected
+	 * entries and prevents any corruption, so any race split is fine
+	 * too, it will succeed as long as the entries are still there.
+	 */
+	nr_pages = folio_nr_pages(folio);
+	if (nr_pages > 1) {
+		swap.val = round_down(swap.val, nr_pages);
+		index = round_down(index, nr_pages);
 	}
 
 	/* We have to do this with folio locked to prevent races */
@@ -2389,7 +2390,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		goto failed;
 	}
 	folio_wait_writeback(folio);
-	nr_pages = folio_nr_pages(folio);
 
 	/*
 	 * Some architectures may have to restore extra metadata to the
@@ -2403,8 +2403,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			goto failed;
 	}
 
-	error = shmem_add_to_page_cache(folio, mapping,
-					round_down(index, nr_pages),
+	error = shmem_add_to_page_cache(folio, mapping, index,
 					swp_to_radix_entry(swap), gfp);
 	if (error)
 		goto failed;
-- 
2.50.0