Date:   Thu,  1 Dec 2022 15:33:17 -0800
From:   Mina Almasry <almasrymina@...gle.com>
To:     Huang Ying <ying.huang@...el.com>,
        Yang Shi <yang.shi@...ux.alibaba.com>,
        Yosry Ahmed <yosryahmed@...gle.com>,
        Tim Chen <tim.c.chen@...ux.intel.com>, weixugc@...gle.com,
        shakeelb@...gle.com, gthelen@...gle.com, fvdl@...gle.com,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Mina Almasry <almasrymina@...gle.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: [PATCH v1] mm: disable top-tier fallback to reclaim on proactive reclaim

Reclaiming directly from top tier nodes breaks the aging pipeline of
memory tiers. If we have a RAM -> CXL -> storage hierarchy, we should
demote from RAM to CXL and from CXL to storage. If we reclaim a page
from RAM, we effectively 'demote' it directly from RAM to storage,
potentially bypassing a huge number of pages in CXL that are colder
than it.
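
To make the ordering concern concrete, here is a toy, compilable model
of the pipeline; the three tiers and the one-level-down demotion rule
come from the description above, everything else is invented for
illustration:

#include <stdio.h>

/* Toy three-tier hierarchy mirroring RAM -> CXL -> storage. */
enum tier { RAM, CXL, STORAGE };
static const char * const tier_name[] = { "RAM", "CXL", "storage" };

/* Demotion moves a page exactly one tier down the pipeline. */
static enum tier demote(enum tier t)
{
	return t < STORAGE ? (enum tier)(t + 1) : STORAGE;
}

int main(void)
{
	enum tier t = RAM;

	/* A RAM page ages through CXL before it reaches storage... */
	while (t != STORAGE) {
		printf("demote: %s -> %s\n", tier_name[t], tier_name[demote(t)]);
		t = demote(t);
	}
	/*
	 * ...whereas reclaiming it straight from RAM sends it to
	 * storage in one step, jumping past every colder page that
	 * still sits in CXL.
	 */
	printf("reclaim: RAM -> storage (CXL bypassed)\n");
	return 0;
}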

However, disabling reclaim from top tier nodes entirely would cause
OOMs in edge scenarios where lower tier memory is unreclaimable for
whatever reason, e.g. memory being mlocked or too hot to reclaim. In
these cases we would rather the job run with a performance regression
than OOM altogether.

We can, however, disable reclaim from top tier nodes for proactive
reclaim. That reclaim is not triggered by real memory pressure, so
there is no reason to break the aging pipeline for it.
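
For context: 'proactive reclaim' here means reclaim initiated from
userspace via the cgroup v2 memory.reclaim interface, which is the
path that sets sc->proactive, as opposed to reclaim driven by actual
memory pressure. A minimal userspace sketch of triggering it follows;
the cgroup path and the reclaim amount are assumptions for
illustration:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Assumed cgroup v2 path; adjust for your hierarchy. */
	const char *path = "/sys/fs/cgroup/workload/memory.reclaim";
	const char *amount = "1G";	/* request ~1 GiB of reclaim */
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/*
	 * Reclaim requested this way runs with sc->proactive set, so
	 * with this patch applied it may demote from top tier nodes
	 * but will not reclaim from them directly.
	 */
	if (write(fd, amount, strlen(amount)) < 0)
		perror("write");
	close(fd);
	return 0;
}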

Signed-off-by: Mina Almasry <almasrymina@...gle.com>
---
 mm/vmscan.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 23fc5b523764..6eb130e57920 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2088,10 +2088,31 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 	nr_reclaimed += demote_folio_list(&demote_folios, pgdat);
 	/* Folios that could not be demoted are still in @demote_folios */
 	if (!list_empty(&demote_folios)) {
-		/* Folios which weren't demoted go back on @folio_list for retry: */
+		/*
+		 * Folios which weren't demoted go back on @folio_list.
+		 */
 		list_splice_init(&demote_folios, folio_list);
-		do_demote_pass = false;
-		goto retry;
+
+		/*
+		 * goto retry to reclaim the undemoted folios in folio_list if
+		 * desired.
+		 *
+		 * Reclaiming directly from top tier nodes is often undesirable
+		 * because it breaks the LRU ordering: in general, memory
+		 * should be reclaimed from lower tier nodes and demoted from
+		 * top tier nodes.
+		 *
+		 * However, disabling reclaim from top tier nodes entirely
+		 * would cause OOMs in edge scenarios where lower tier memory
+		 * is unreclaimable for whatever reason, e.g. memory being
+		 * mlocked or too hot to reclaim. We can still disable
+		 * reclaim from top tier nodes for proactive reclaim, as
+		 * that is not real memory pressure.
+		 */
+		if (!sc->proactive) {
+			do_demote_pass = false;
+			goto retry;
+		}
 	}

 	pgactivate = stat->nr_activate[0] + stat->nr_activate[1];
--
2.39.0.rc0.267.gcb52ba06e7-goog
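
To see the behavioral change in isolation, here is a compilable toy
model of the retry decision above; the struct and field names echo the
kernel's scan_control, but the harness itself is invented:

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the kernel's scan_control; only the field used here. */
struct scan_control {
	bool proactive;
};

/*
 * Mirrors the patched logic in shrink_folio_list(): undemoted folios
 * are retried as direct top-tier reclaim only when the reclaim was
 * driven by real memory pressure, not when it was proactive.
 */
static bool should_retry_reclaim(const struct scan_control *sc,
				 bool have_undemoted_folios)
{
	return have_undemoted_folios && !sc->proactive;
}

int main(void)
{
	struct scan_control pressure = { .proactive = false };
	struct scan_control proactive = { .proactive = true };

	printf("pressure reclaim, undemoted folios -> retry? %d\n",
	       should_retry_reclaim(&pressure, true));
	printf("proactive reclaim, undemoted folios -> retry? %d\n",
	       should_retry_reclaim(&proactive, true));
	return 0;
}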
