[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1371557387-22434-9-git-send-email-mhocko@suse.cz>
Date: Tue, 18 Jun 2013 14:09:47 +0200
From: Michal Hocko <mhocko@...e.cz>
To: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc: linux-mm@...ck.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, Ying Han <yinghan@...gle.com>,
Hugh Dickins <hughd@...gle.com>,
Michel Lespinasse <walken@...gle.com>,
Greg Thelen <gthelen@...gle.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Tejun Heo <tj@...nel.org>,
Balbir Singh <bsingharora@...il.com>,
Glauber Costa <glommer@...il.com>
Subject: [PATCH v5 8/8] memcg, vmscan: do not fall into reclaim-all pass too quickly
shrink_zone starts with soft reclaim pass first and then falls back to
regular reclaim if nothing has been scanned. This behavior is natural
but there is a catch. Memcg iterators, when used with the reclaim
cookie, are designed to help to prevent from over reclaim by
interleaving reclaimers (per node-zone-priority) so the tree walk might
miss many (even all) nodes in the hierarchy e.g. when there are direct
reclaimers racing with each other or with kswapd in the global case or
multiple allocators reaching the limit for the target reclaim case.
To make it even more complicated, targeted reclaim doesn't do the whole
tree walk because it stops reclaiming once it reclaims sufficient pages.
As a result groups over the limit might be missed, thus nothing is
scanned, and reclaim would fall back to the reclaim all mode.
This patch checks for the incomplete tree walk in shrink_zone. If no
group has been visited and the hierarchy is soft reclaimable then we
must have missed some groups, in which case the __shrink_zone is called
again. This doesn't guarantee there will be some progress of course
because the current reclaimer might be still racing with others but it
would at least give a chance to start the walk without a big risk of
reclaim latencies.
Signed-off-by: Michal Hocko <mhocko@...e.cz>
---
mm/vmscan.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 56302da..8cbc8e5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2136,10 +2136,11 @@ static inline bool should_continue_reclaim(struct zone *zone,
}
}
-static void
+static int
__shrink_zone(struct zone *zone, struct scan_control *sc, bool soft_reclaim)
{
unsigned long nr_reclaimed, nr_scanned;
+ int groups_scanned = 0;
do {
struct mem_cgroup *root = sc->target_mem_cgroup;
@@ -2157,6 +2158,7 @@ __shrink_zone(struct zone *zone, struct scan_control *sc, bool soft_reclaim)
while ((memcg = mem_cgroup_iter_cond(root, memcg, &reclaim, filter))) {
struct lruvec *lruvec;
+ groups_scanned++;
lruvec = mem_cgroup_zone_lruvec(zone, memcg);
shrink_lruvec(lruvec, sc);
@@ -2184,6 +2186,8 @@ __shrink_zone(struct zone *zone, struct scan_control *sc, bool soft_reclaim)
} while (should_continue_reclaim(zone, sc->nr_reclaimed - nr_reclaimed,
sc->nr_scanned - nr_scanned, sc));
+
+ return groups_scanned;
}
@@ -2191,8 +2195,19 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
{
bool do_soft_reclaim = mem_cgroup_should_soft_reclaim(sc);
unsigned long nr_scanned = sc->nr_scanned;
+ int scanned_groups;
- __shrink_zone(zone, sc, do_soft_reclaim);
+ scanned_groups = __shrink_zone(zone, sc, do_soft_reclaim);
+ /*
+ * memcg iterator might race with other reclaimer or start from
+ * a incomplete tree walk so the tree walk in __shrink_zone
+ * might have missed groups that are above the soft limit. Try
+ * another loop to catch up with others. Do it just once to
+ * prevent from reclaim latencies when other reclaimers always
+ * preempt this one.
+ */
+ if (do_soft_reclaim && !scanned_groups)
+ __shrink_zone(zone, sc, do_soft_reclaim);
/*
* No group is over the soft limit or those that are do not have
--
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists