lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 25 Oct 2023 15:45:45 +0100
From:   Ryan Roberts <ryan.roberts@....com>
To:     Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>,
        Matthew Wilcox <willy@...radead.org>,
        Huang Ying <ying.huang@...el.com>,
        Gao Xiang <xiang@...nel.org>, Yu Zhao <yuzhao@...gle.com>,
        Yang Shi <shy828301@...il.com>, Michal Hocko <mhocko@...e.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>
Cc:     Ryan Roberts <ryan.roberts@....com>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: [PATCH v3 3/4] mm: swap: Simplify ssd behavior when scanner steals entry

When a CPU fails to reserve a cluster (due to free list exhaustion), we
revert to the scanner to find a free entry somewhere in the swap file.
This might cause an entry to be stolen from another CPU's reserved
cluster. Upon noticing this, the CPU with the stolen entry would
previously scan forward to the end of the cluster trying to find a free
entry to use. If there were none, it would try to reserve a new pre-cpu
cluster and allocate from that.

This scanning behavior does not scale well to high-order allocations,
which will be introduced in a future patch since would need to scan for
a contiguous area that was naturally aligned. Given stealing is a rare
occurrence, let's remove the scanning behavior from the ssd allocator
and simply drop the cluster and try to allocate a new one. Given the
purpose of the per-cpu cluster is to ensure a given task's pages are
sequential on disk to aid readahead, allocating a new cluster at this
point makes most sense.

Furthermore, si->max will always be greater than or equal to the end of
the last cluster because any partial cluster will never be put on the
free cluster list. Therefore we can simplify this logic too.

These changes make it simpler to generalize
scan_swap_map_try_ssd_cluster() to handle any allocation order.

Signed-off-by: Ryan Roberts <ryan.roberts@....com>
---
 mm/swapfile.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 617e34b8cdbe..94f7cc225eb9 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -639,27 +639,24 @@ static bool scan_swap_map_try_ssd_cluster(struct swap_info_struct *si,
 
 	/*
 	 * Other CPUs can use our cluster if they can't find a free cluster,
-	 * check if there is still free entry in the cluster
+	 * check if the expected entry is still free. If not, drop it and
+	 * reserve a new cluster.
 	 */
-	max = min_t(unsigned long, si->max,
-		    ALIGN_DOWN(tmp, SWAPFILE_CLUSTER) + SWAPFILE_CLUSTER);
-	if (tmp < max) {
-		ci = lock_cluster(si, tmp);
-		while (tmp < max) {
-			if (!si->swap_map[tmp])
-				break;
-			tmp++;
-		}
+	ci = lock_cluster(si, tmp);
+	if (si->swap_map[tmp]) {
 		unlock_cluster(ci);
-	}
-	if (tmp >= max) {
 		*cpu_next = SWAP_NEXT_NULL;
 		goto new_cluster;
 	}
+	unlock_cluster(ci);
+
 	*offset = tmp;
 	*scan_base = tmp;
+
+	max = ALIGN_DOWN(tmp, SWAPFILE_CLUSTER) + SWAPFILE_CLUSTER;
 	tmp += 1;
 	*cpu_next = tmp < max ? tmp : SWAP_NEXT_NULL;
+
 	return true;
 }
 
-- 
2.25.1

Powered by blists - more mailing lists