linux-kernel - [patch 4/8] memcg: rework soft limit reclaim

Message-Id: <1306909519-7286-5-git-send-email-hannes@cmpxchg.org>
Date:	Wed,  1 Jun 2011 08:25:15 +0200
From:	Johannes Weiner <hannes@...xchg.org>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Ying Han <yinghan@...gle.com>, Michal Hocko <mhocko@...e.cz>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Minchan Kim <minchan.kim@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Mel Gorman <mgorman@...e.de>, Greg Thelen <gthelen@...gle.com>,
	Michel Lespinasse <walken@...gle.com>
Cc:	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: [patch 4/8] memcg: rework soft limit reclaim

Currently, soft limit reclaim is entered from kswapd, where it selects
the memcg with the biggest soft limit excess in absolute bytes, and
reclaims pages from it with maximum aggressiveness (priority 0).

This has the following disadvantages:

    1. because of the aggressiveness, kswapd can be stalled for a long
    time on a memcg that is hard to reclaim from, sending the rest of
    the allocators into direct reclaim in the meantime.

    2. it only considers the biggest offender (measured in absolute
    bytes, which makes it ill-suited to setups with memcgs of different
    sizes) and does not apply any pressure at all to the other memcgs
    in excess of their soft limit.

    3. because it is only invoked from kswapd, the soft limit is
    meaningful only during global memory pressure; it is not taken into
    account during hierarchical target reclaim, where it could be used
    to prioritize memcgs as well.  So while it does hierarchical
    reclaim once triggered, it is not a truly hierarchical mechanism.
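
For comparison, the old selection policy from point 2 amounts to taking
the maximum of (usage - soft limit) over all groups.  The sketch below
is a hypothetical user-space model, not the kernel's implementation;
struct toy_group and both helpers are illustrative names only.

```c
#include <stddef.h>

/* Hypothetical stand-in for a memcg, reduced to the two counters the
 * old selection policy compares (both in bytes). */
struct toy_group {
	const char *name;
	unsigned long long usage;
	unsigned long long soft_limit;
};

/* Bytes above the soft limit, or 0 when the group is within it. */
static unsigned long long soft_limit_excess(const struct toy_group *g)
{
	return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}

/*
 * Old policy, sketched: return the one group with the largest absolute
 * excess (NULL if nobody is over), leaving all other offenders alone.
 */
static const struct toy_group *biggest_offender(const struct toy_group *v,
						size_t n)
{
	const struct toy_group *best = NULL;
	unsigned long long best_excess = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long long excess = soft_limit_excess(&v[i]);

		if (excess > best_excess) {
			best_excess = excess;
			best = &v[i];
		}
	}
	return best;
}
```

With a 10 GiB group that is 100 MiB over its soft limit (about 1%) and
a 190 MiB group that is 90 MiB over (90%), this picks the big group,
and the small one sees no extra pressure at all.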

Here is a different approach.  Instead of having a soft limit reclaim
cycle separate from the rest of reclaim, this patch ensures that each
time a group of memcgs is reclaimed - be it because of global memory
pressure or because of a hard limit - memcgs that exceed their soft
limit, or contribute to the soft limit excess of one of their parents,
are reclaimed from at a higher priority than their siblings.
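
The hierarchical check that decides who gets the extra pressure can be
modelled in user space roughly as follows.  This is a simplified
sketch, not the kernel code: struct toy_memcg and
toy_soft_limit_exceeded() are hypothetical, usage is assumed to already
include the charges of all descendants (as with hierarchical
accounting), and the special-casing of the root memcg is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a memcg hierarchy node.  @usage is assumed
 * to include the charges of all descendants. */
struct toy_memcg {
	struct toy_memcg *parent;	/* NULL for a top-level group */
	unsigned long usage;		/* in pages */
	unsigned long soft_limit;	/* in pages */
};

static bool over_soft_limit(const struct toy_memcg *m)
{
	return m->usage > m->soft_limit;
}

/*
 * Does @m exceed its own soft limit, or contribute to the excess of an
 * ancestor up to and including @root?  Walks from @m towards @root and
 * stops at the first group found in excess.
 */
static bool toy_soft_limit_exceeded(const struct toy_memcg *root,
				    const struct toy_memcg *m)
{
	for (; m; m = m->parent) {
		if (over_soft_limit(m))
			return true;
		if (m == root)
			break;	/* ancestors above @root are not considered */
	}
	return false;
}
```

A child well within its own soft limit still counts as "in excess" when
reclaim is rooted at a parent that is over its limit, so its siblings
under other, well-behaved parents are reclaimed more gently.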

This results in the following:

    1. all relevant memcgs are scanned with increasing priority during
    memory pressure.  The primary goal is to free pages, not to punish
    soft limit offenders.

    2. increased pressure is applied to all memcgs in excess of their
    soft limit, not only the biggest offender.

    3. the soft limit becomes meaningful for target reclaim as well,
    where it allows prioritizing children of a hierarchy when the
    parent hits its limit.

    4. direct reclaim now applies increased soft limit pressure as well,
    not just kswapd.
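
To make "higher priority" concrete: reclaim priority counts down from
DEF_PRIORITY (12) towards 0, and the number of pages scanned from an
LRU list per pass is roughly the list size shifted right by the
priority (before anon/file balancing).  A soft limit offender reclaimed
one priority level lower therefore gets about twice the scan target of
its siblings.  The helper names below are hypothetical, and the clamp
at 0 is this sketch's own assumption, since shifting by a negative
count is undefined in C.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: one priority step more aggressive for groups in
 * soft limit excess, but never below 0, the most aggressive level.
 */
static int effective_priority(int priority, bool soft_limit_exceeded)
{
	if (soft_limit_exceeded && priority > 0)
		return priority - 1;
	return priority;
}

/* Simplified scan target: the LRU size shifted right by the priority,
 * so priority 12 scans 1/4096 of the list and priority 0 all of it. */
static unsigned long scan_target(unsigned long lru_pages, int priority)
{
	return lru_pages >> priority;
}
```

For a list of 1048576 pages at priority 12, a group within its soft
limit gets a target of 256 pages, while a group in excess is scanned at
priority 11 and gets 512.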

Signed-off-by: Johannes Weiner <hannes@...xchg.org>
---
 include/linux/memcontrol.h |    7 +++++++
 mm/memcontrol.c            |   26 ++++++++++++++++++++++++++
 mm/vmscan.c                |    8 ++++++--
 3 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 8f402b9..7d99e87 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -104,6 +104,7 @@ extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
 struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *,
 					     struct mem_cgroup *);
 void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *, struct mem_cgroup *);
+bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *, struct mem_cgroup *);
 
 /*
  * For memory reclaim.
@@ -345,6 +346,12 @@ static inline void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *r,
 {
 }
 
+static inline bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *root,
+						  struct mem_cgroup *mem)
+{
+	return false;
+}
+
 static inline void
 mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 983efe4..94f77cc3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1460,6 +1460,32 @@ void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *root,
 		css_put(&mem->css);
 }
 
+/**
+ * mem_cgroup_soft_limit_exceeded - check if a memcg (hierarchically)
+ *                                  exceeds a soft limit
+ * @root: highest ancestor of @mem to consider
+ * @mem: memcg to check for excess
+ *
+ * The function indicates whether @mem has exceeded its own soft
+ * limit, or contributes to the soft limit excess of one of its
+ * ancestors, up to and including @root.
+ */
+bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *root,
+				    struct mem_cgroup *mem)
+{
+	for (;;) {
+		if (mem == root_mem_cgroup)
+			return false;
+		if (res_counter_soft_limit_excess(&mem->res))
+			return true;
+		if (mem == root)
+			return false;
+		mem = parent_mem_cgroup(mem);
+		if (!mem)
+			return false;
+	}
+}
+
 static unsigned long mem_cgroup_reclaim(struct mem_cgroup *mem,
 					gfp_t gfp_mask,
 					unsigned long flags)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c7d4b44..0163840 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1988,9 +1988,13 @@ static void shrink_zone(int priority, struct zone *zone,
 		unsigned long reclaimed = sc->nr_reclaimed;
 		unsigned long scanned = sc->nr_scanned;
 		unsigned long nr_reclaimed;
+		int epriority = priority;
+
+		if (priority && mem_cgroup_soft_limit_exceeded(root, mem))
+			epriority -= 1;
 
 		sc->mem_cgroup = mem;
-		do_shrink_zone(priority, zone, sc);
+		do_shrink_zone(epriority, zone, sc);
 		mem_cgroup_count_reclaim(mem, current_is_kswapd(),
 					 mem != root, /* limit or hierarchy? */
 					 sc->nr_scanned - scanned,
@@ -2480,7 +2484,7 @@ loop_again:
 			 * Call soft limit reclaim before calling shrink_zone.
 			 * For now we ignore the return value
 			 */
-			mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
+			//mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
 
 			/*
 			 * We put equal pressure on every zone, unless
-- 
1.7.5.2
