Date:	Mon, 25 Jul 2016 16:51:59 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Mel Gorman <mgorman@...e.de>, Johannes Weiner <hannes@...xchg.org>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Minchan Kim <minchan@...nel.org>
Subject: [RFC] mm: bail out in shrink_inactive_list

With node-lru, if there are enough reclaimable pages in highmem
but nothing in lowmem, the VM can try to shrink the inactive list
even though the requested zone is lowmem.

The problem is that a direct reclaimer scans an inactive list that
is full of highmem pages to find a victim page in the requested zone
or lower zones, but ends up having to skip every page. That just
burns CPU. Worse, when lots of reclaimers run in parallel, many
direct reclaimers are stalled by too_many_isolated even though there
is no reclaimable memory in the inactive list.

I ran the experiment 4 times on a 32-bit KVM machine with 2G of
memory and 8 CPUs to measure elapsed time.

	hackbench 500 process 2

= Old =

1st: 289s 2nd: 310s 3rd: 112s 4th: 272s

= Now =

1st: 31s  2nd: 132s 3rd: 162s 4th: 50s

Signed-off-by: Minchan Kim <minchan@...nel.org>
---
I believe the proper fix is to modify get_scan_count. IOW, I think
we should introduce lruvec_reclaimable_lru_size with a proper
classzone_idx, but I don't know how we can fix it for memcg,
which doesn't have zone stats now. Should we introduce zone stats
back to memcg? Or is it okay to ignore memcg?
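
To make the intent of the check in the diff below easier to follow,
here is a minimal user-space sketch of the same zone walk. It is only
a model under stated assumptions: struct model_zone, struct model_node,
MODEL_MAX_ZONES and model_inactive_reclaimable() are hypothetical names
that do not exist in the kernel, the per-zone counter is a plain field
instead of zone_page_state_snapshot(), and the global_reclaim() test
and the file/anon selection are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for this sketch only; not kernel definitions. */
#define MODEL_MAX_ZONES        4
#define MODEL_SWAP_CLUSTER_MAX 32UL	/* mirrors the kernel's SWAP_CLUSTER_MAX */

struct model_zone {
	bool populated;			/* models populated_zone() */
	unsigned long nr_inactive;	/* inactive LRU pages in this zone */
};

struct model_node {
	struct model_zone zones[MODEL_MAX_ZONES];
};

/*
 * Walk from the requested zone index down to zone 0 and report whether
 * any populated zone holds at least one reclaim batch of inactive pages.
 * The caller must pass 0 <= reclaim_idx < MODEL_MAX_ZONES.
 */
static bool model_inactive_reclaimable(const struct model_node *node,
				       int reclaim_idx)
{
	int zid;

	for (zid = reclaim_idx; zid >= 0; zid--) {
		const struct model_zone *zone = &node->zones[zid];

		if (!zone->populated)
			continue;

		if (zone->nr_inactive >= MODEL_SWAP_CLUSTER_MAX)
			return true;
	}

	return false;
}
```

For a node whose inactive pages all sit in the highest populated zone,
a request limited to a lower zone index gets false from this model.
That is the case where the patch makes shrink_inactive_list() return 0
before it reaches the too_many_isolated loop.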

 mm/vmscan.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index e5af357..3d285cc 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1652,6 +1652,31 @@ static int current_may_throttle(void)
 		bdi_write_congested(current->backing_dev_info);
 }
 
+static inline bool inactive_reclaimable_pages(struct lruvec *lruvec,
+				struct scan_control *sc,
+				enum lru_list lru)
+{
+	int zid;
+	struct zone *zone;
+	bool file = is_file_lru(lru);
+	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+
+	if (!global_reclaim(sc))
+		return true;
+
+	for (zid = sc->reclaim_idx; zid >= 0; zid--) {
+		zone = &pgdat->node_zones[zid];
+		if (!populated_zone(zone))
+			continue;
+
+		if (zone_page_state_snapshot(zone, NR_ZONE_LRU_BASE +
+				LRU_FILE * file) >= SWAP_CLUSTER_MAX)
+			return true;
+	}
+
+	return false;
+}
+
 /*
  * shrink_inactive_list() is a helper for shrink_node().  It returns the number
  * of reclaimed pages
@@ -1674,6 +1699,9 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
 
+	if (!inactive_reclaimable_pages(lruvec, sc, lru))
+		return 0;
+
 	while (unlikely(too_many_isolated(pgdat, file, sc))) {
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
 
-- 
1.9.1
