lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20230519111359.40475-1-hannes@cmpxchg.org>
Date:   Fri, 19 May 2023 13:13:59 +0200
From:   Johannes Weiner <hannes@...xchg.org>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     Mel Gorman <mgorman@...hsingularity.net>,
        Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...e.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, kernel-team@...com
Subject: [PATCH] mm: compaction: avoid GFP_NOFS ABBA deadlock

During stress testing with higher-order allocations, a deadlock
scenario was observed in compaction: One GFP_NOFS allocation was
sleeping on mm/compaction.c::too_many_isolated(), while all CPUs in
the system were busy with compactors spinning on buffer locks held by
the sleeping GFP_NOFS allocation.

Reclaim is susceptible to this same deadlock; we fixed it by granting
GFP_NOFS allocations additional LRU isolation headroom, to ensure it
makes forward progress while holding fs locks that other reclaimers
might acquire. Do the same here.

This code has been like this since compaction was initially merged,
and I only managed to trigger this with out-of-tree patches that
dramatically increase the contexts that do GFP_NOFS compaction. While
the issue is real, it seems theoretical in nature given existing
allocation sites. Worth fixing now, but no Fixes tag or stable CC.

Signed-off-by: Johannes Weiner <hannes@...xchg.org>
---
 mm/compaction.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

v2:
- clarify too_many_isolated() comment (Mel)
- split isolation deadlock from no-contiguous-anon lockups as that's
  a different scenario and deserves its own patch

diff --git a/mm/compaction.c b/mm/compaction.c
index c8bcdea15f5f..c9a4b6dffcf2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -745,8 +745,9 @@ isolate_freepages_range(struct compact_control *cc,
 }
 
 /* Similar to reclaim, but different enough that they don't share logic */
-static bool too_many_isolated(pg_data_t *pgdat)
+static bool too_many_isolated(struct compact_control *cc)
 {
+	pg_data_t *pgdat = cc->zone->zone_pgdat;
 	bool too_many;
 
 	unsigned long active, inactive, isolated;
@@ -758,6 +759,17 @@ static bool too_many_isolated(pg_data_t *pgdat)
 	isolated = node_page_state(pgdat, NR_ISOLATED_FILE) +
 			node_page_state(pgdat, NR_ISOLATED_ANON);
 
+	/*
+	 * Allow GFP_NOFS to isolate past the limit set for regular
+	 * compaction runs. This prevents an ABBA deadlock when other
+	 * compactors have already isolated to the limit, but are
+	 * blocked on filesystem locks held by the GFP_NOFS thread.
+	 */
+	if (cc->gfp_mask & __GFP_FS) {
+		inactive >>= 3;
+		active >>= 3;
+	}
+
 	too_many = isolated > (inactive + active) / 2;
 	if (!too_many)
 		wake_throttle_isolated(pgdat);
@@ -806,7 +818,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 	 * list by either parallel reclaimers or compaction. If there are,
 	 * delay for some time until fewer pages are isolated
 	 */
-	while (unlikely(too_many_isolated(pgdat))) {
+	while (unlikely(too_many_isolated(cc))) {
 		/* stop isolation if there are still pages not migrated */
 		if (cc->nr_migratepages)
 			return -EAGAIN;
-- 
2.40.0

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ