Date:   Tue, 21 Nov 2023 17:06:24 +0800
From:   Liu Shixin <liushixin2@...wei.com>
To:     Yu Zhao <yuzhao@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Yosry Ahmed <yosryahmed@...gle.com>,
        Huang Ying <ying.huang@...el.com>,
        Sachin Sant <sachinp@...ux.ibm.com>,
        Michal Hocko <mhocko@...e.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Kefeng Wang <wangkefeng.wang@...wei.com>
CC:     <linux-mm@...ck.org>, <linux-kernel@...r.kernel.org>,
        Liu Shixin <liushixin2@...wei.com>
Subject: [PATCH v10] mm: vmscan: try to reclaim swapcache pages if no swap space

When the swap devices are exhausted, only file pages can be reclaimed,
but there may still be some swapcache pages on the anon LRU list.  This
can lead to a premature out-of-memory.

The problem can be reproduced with the following steps:

 First, set up a 9MB disk swap space, then create a cgroup with a 10MB
 memory limit, then run a program that allocates about 15MB of memory.

The problem occurs only occasionally; it may take about 100 runs to hit it [1].
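
For reference, a minimal sketch of the allocating program might look as
follows.  This is only illustrative, not the exact reproducer: it assumes
the 9MB swap device and the 10MB-limit cgroup (e.g. via memory.max) are
set up beforehand, outside the program, and the size and names are
placeholders.

	/* Allocate more than the cgroup limit and touch every page. */
	#include <stdio.h>
	#include <stdlib.h>

	#define ALLOC_SIZE	(15UL << 20)	/* ~15MB, above the 10MB limit */
	#define PAGE_STEP	4096		/* touch one byte per page */

	int main(void)
	{
		char *buf = malloc(ALLOC_SIZE);

		if (!buf) {
			perror("malloc");
			return 1;
		}

		/* Touch each page so it is actually charged to the cgroup. */
		for (unsigned long i = 0; i < ALLOC_SIZE; i += PAGE_STEP)
			buf[i] = 1;

		puts("allocation completed");
		free(buf);
		return 0;
	}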

Fix it by checking the number of swapcache pages in
can_reclaim_anon_pages().  If the number is not zero, return true and set
swapcache_only to 1.  When scanning the anon LRU list in swapcache_only
mode, non-swapcache pages are skipped during isolation to improve reclaim
efficiency.

However, in swapcache_only mode the scan count is still increased when
non-swapcache pages are scanned, because there may be a large number of
non-swapcache pages and only rare swapcache pages.  If the skipped
non-swapcache pages were not counted, the page scan in
isolate_lru_folios() could eventually lead to a hung task, as Sachin
reported [2].

Note that since there are enough rounds of memory reclaim before OOM,
there is no need to isolate too many swapcache pages in a single pass.

[1]. https://lore.kernel.org/lkml/CAJD7tkZAfgncV+KbKr36=eDzMnT=9dZOT0dpMWcurHLr6Do+GA@mail.gmail.com/
[2]. https://lore.kernel.org/linux-mm/CAJD7tkafz_2XAuqE8tGLPEcpLngewhUo=5US14PAtSM9tLBUQg@mail.gmail.com/

Signed-off-by: Liu Shixin <liushixin2@...wei.com>
Tested-by: Yosry Ahmed <yosryahmed@...gle.com>
Reviewed-by: "Huang, Ying" <ying.huang@...el.com>
Reviewed-by: Yosry Ahmed <yosryahmed@...gle.com>
---
v9->v10: Use per-node swapcache counts, as suggested by Yu Zhao.
v8->v9: Move the swapcache check after can_demote() and refactor
	can_reclaim_anon_pages() a bit.
v7->v8: Reset swapcache_only at the beginning of can_reclaim_anon_pages().
v6->v7: Reset swapcache_only to zero once there is swap space.
v5->v6: Fix the NULL pointer dereference and hung task problems reported by Sachin.

 mm/vmscan.c | 50 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 49 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 506f8220c5fe..1fcc94717370 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -136,6 +136,9 @@ struct scan_control {
 	/* Always discard instead of demoting to lower tier memory */
 	unsigned int no_demotion:1;
 
+	/* Swap space is exhausted, only reclaim swapcache for anon LRU */
+	unsigned int swapcache_only:1;
+
 	/* Allocation order */
 	s8 order;
 
@@ -308,10 +311,36 @@ static bool can_demote(int nid, struct scan_control *sc)
 	return true;
 }
 
+#ifdef CONFIG_SWAP
+static bool can_reclaim_swapcache(struct mem_cgroup *memcg, int nid)
+{
+	struct pglist_data *pgdat = NODE_DATA(nid);
+	unsigned long nr_swapcache;
+
+	if (!memcg) {
+		nr_swapcache = node_page_state(pgdat, NR_SWAPCACHE);
+	} else {
+		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+
+		nr_swapcache = lruvec_page_state_local(lruvec, NR_SWAPCACHE);
+	}
+
+	return nr_swapcache > 0;
+}
+#else
+static bool can_reclaim_swapcache(struct mem_cgroup *memcg, int nid)
+{
+	return false;
+}
+#endif
+
 static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
 					  int nid,
 					  struct scan_control *sc)
 {
+	if (sc)
+		sc->swapcache_only = 0;
+
 	if (memcg == NULL) {
 		/*
 		 * For non-memcg reclaim, is there
@@ -330,7 +359,17 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
 	 *
 	 * Can it be reclaimed from this node via demotion?
 	 */
-	return can_demote(nid, sc);
+	if (can_demote(nid, sc))
+		return true;
+
+	/* Are there any swapcache pages to reclaim in this node? */
+	if (can_reclaim_swapcache(memcg, nid)) {
+		if (sc)
+			sc->swapcache_only = 1;
+		return true;
+	}
+
+	return false;
 }
 
 /*
@@ -1642,6 +1681,15 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
 		 */
 		scan += nr_pages;
 
+		/*
+		 * Count non-swapcache folios too: swapcache folios may be
+		 * rare, and not counting the skipped non-swapcache folios
+		 * could make this scan loop run for too long.
+		 */
+		if (unlikely(sc->swapcache_only && !is_file_lru(lru) &&
+		    !folio_test_swapcache(folio)))
+			goto move;
+
 		if (!folio_test_lru(folio))
 			goto move;
 		if (!sc->may_unmap && folio_mapped(folio))
-- 
2.25.1
