Message-ID: <20120113120406.GC17060@tiehlicka.suse.cz>
Date: Fri, 13 Jan 2012 13:04:06 +0100
From: Michal Hocko <mhocko@...e.cz>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Balbir Singh <bsingharora@...il.com>,
Ying Han <yinghan@...gle.com>, cgroups@...r.kernel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [patch 2/2] mm: memcg: hierarchical soft limit reclaim
On Tue 10-01-12 16:02:52, Johannes Weiner wrote:
> Right now, memcg soft limits are implemented by having a sorted tree
> of memcgs that are in excess of their limits. Under global memory
> pressure, kswapd first reclaims from the biggest excessor and then
> proceeds to do regular global reclaim. The result of this is that
> pages are reclaimed from all memcgs, but more scanning happens against
> those above their soft limit.
>
> With global reclaim doing memcg-aware hierarchical reclaim by default,
> this is a lot easier to implement: everytime a memcg is reclaimed
> from, scan more aggressively (per tradition with a priority of 0) if
> it's above its soft limit. With the same end result of scanning
> everybody, but soft limit excessors a bit more.
>
> Advantages:
>
> o smoother reclaim: soft limit reclaim is a separate stage before
> global reclaim, whose result is not communicated down the line and
> so overreclaim of the groups in excess is very likely. After this
> patch, soft limit reclaim is fully integrated into regular reclaim
> and each memcg is considered exactly once per cycle.
>
> o true hierarchy support: soft limits are only considered when
> kswapd does global reclaim, but after this patch, targetted
> reclaim of a memcg will mind the soft limit settings of its child
> groups.
Yes, it makes sense. At first I was thinking that the soft limit should be
considered only under global memory pressure (at least the documentation
says so) but now it makes sense.
We can push on over-soft-limit groups more because they told us they
could sacrifice something... Anyway, the documentation needs an update as
well.
But we have to be a little bit careful here. I am still quite confused
about how we should handle hierarchies vs. subtrees. See below.
>
> o code size: soft limit reclaim requires a lot of code to maintain
> the per-node per-zone rb-trees to quickly find the biggest
> offender, dedicated paths for soft limit reclaim etc. while this
> new implementation gets away without all that.
On my i386 PAE setup (with the swap extension enabled):
Before
   text    data     bss     dec     hex filename
 310086   29970   35372  375428   5ba84 mm/built-in.o
After
size mm/built-in.o
   text    data     bss     dec     hex filename
 309048   30030   35372  374450   5b6b2 mm/built-in.o
I would have expected a bigger difference (this is 1038 bytes less text,
978 bytes less in total), but it is still good.
> Test:
I will look into the results later.
[...]
> Signed-off-by: Johannes Weiner <hannes@...xchg.org>
> ---
> include/linux/memcontrol.h | 18 +--
> mm/memcontrol.c | 412 ++++----------------------------------------
> mm/vmscan.c | 80 +--------
> 3 files changed, 48 insertions(+), 462 deletions(-)
Really nice to see.
[...]
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 170dff4..d4f7ae5 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
[...]
> @@ -1318,6 +1123,36 @@ static unsigned long mem_cgroup_margin(struct mem_cgroup *memcg)
> return margin >> PAGE_SHIFT;
> }
>
> +/**
> + * mem_cgroup_over_softlimit
> + * @root: hierarchy root
> + * @memcg: child of @root to test
> + *
> + * Returns %true if @memcg exceeds its own soft limit or contributes
> + * to the soft limit excess of one of its parents up to and including
> + * @root.
> + */
> +bool mem_cgroup_over_softlimit(struct mem_cgroup *root,
> + struct mem_cgroup *memcg)
> +{
> + if (mem_cgroup_disabled())
> + return false;
> +
> + if (!root)
> + root = root_mem_cgroup;
> +
> + for (; memcg; memcg = parent_mem_cgroup(memcg)) {
> + /* root_mem_cgroup does not have a soft limit */
> + if (memcg == root_mem_cgroup)
> + break;
> + if (res_counter_soft_limit_excess(&memcg->res))
> + return true;
> + if (memcg == root)
> + break;
> + }
> + return false;
> +}
Well, this might be a little bit tricky. We do not check whether memcg and
root are in a hierarchical (in terms of use_hierarchy) relation.
If we are under global reclaim then we iterate over all memcgs and so
there is no guarantee that there is a hierarchical relation between the
given memcg and its parent. On the other hand, if we are doing
memcg reclaim then we do have this guarantee.
Why should we punish a group (subtree) which is perfectly under its soft
limit just because some other subtree contributes to the common parent's
usage and pushes it over its limit?
Should we check memcg->use_hierarchy here?
Does it even make sense to set up a soft limit on a parent group without
hierarchies?
Well, I have to admit that hierarchies give me a headache.
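To make the question concrete, here is a minimal user-space sketch (not
kernel code) of the check with the upward walk stopped wherever
hierarchical accounting ends. struct memcg_model, its fields,
soft_limit_excess() and over_softlimit_hier() are hypothetical stand-ins
invented for illustration, and the root_mem_cgroup special case is left
out. The real parent_mem_cgroup() may already stop at such boundaries, in
which case no extra test would be needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mem_cgroup. */
struct memcg_model {
	struct memcg_model *parent;	/* NULL for a top-level group */
	bool use_hierarchy;		/* do children charge this group? */
	unsigned long usage;		/* pages currently charged */
	unsigned long soft_limit;	/* soft limit in pages */
};

/* Pages above the soft limit, or 0 if the group is within it. */
static unsigned long soft_limit_excess(const struct memcg_model *m)
{
	return m->usage > m->soft_limit ? m->usage - m->soft_limit : 0;
}

/*
 * Walk from @memcg towards @root (NULL means "walk to the top").
 * Assumption: a group only contributes to its parent's excess when
 * the parent has use_hierarchy set, so the walk stops otherwise.
 */
static bool over_softlimit_hier(const struct memcg_model *root,
				const struct memcg_model *memcg)
{
	assert(memcg != NULL);

	for (; memcg; memcg = memcg->parent) {
		if (soft_limit_excess(memcg))
			return true;
		if (memcg == root)
			break;
		/* No hierarchical accounting above this point. */
		if (!memcg->parent || !memcg->parent->use_hierarchy)
			break;
	}
	return false;
}
```

With this variant, a child that is under its own soft limit is not
reported as an excessor merely because a non-hierarchical parent is over
its limit.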
> +
> int mem_cgroup_swappiness(struct mem_cgroup *memcg)
> {
> struct cgroup *cgrp = memcg->css.cgroup;
[...]
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index e3fd8a7..4279549 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2121,8 +2121,16 @@ static void shrink_zone(int priority, struct zone *zone,
> .mem_cgroup = memcg,
> .zone = zone,
> };
> + int epriority = priority;
> + /*
> + * Put more pressure on hierarchies that exceed their
> + * soft limit, to push them back harder than their
> + * well-behaving siblings.
> + */
> + if (mem_cgroup_over_softlimit(root, memcg))
> + epriority = 0;
This sounds too aggressive to me. Shouldn't we just double the pressure
or something like that?
Previously we always had nr_to_reclaim == SWAP_CLUSTER_MAX when we did
memcg reclaim but this is not the case now. For kswapd we have
nr_to_reclaim == ULONG_MAX, so we will not break out of the reclaim early
and we will have to scan a lot.
Direct reclaim (shrink or hard limit) shouldn't be affected here.
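As a rough illustration of the difference, the sketch below assumes that
the scan target for an LRU list is its size shifted right by the
priority, and that reclaim starts at a default priority of 12. All helper
names are hypothetical and this is user-space arithmetic, not the vmscan
code.

```c
#include <assert.h>

/* Assumption: mirrors the kernel's default starting priority. */
#define DEF_PRIORITY_MODEL 12

/* Assumed scan target: a smaller priority means a larger share of the list. */
static unsigned long scan_target(unsigned long lru_pages, int priority)
{
	assert(priority >= 0 && priority <= DEF_PRIORITY_MODEL);
	return lru_pages >> priority;
}

/* What the patch does: excessors are always scanned at priority 0. */
static int epriority_patch(int priority, int over_softlimit)
{
	return over_softlimit ? 0 : priority;
}

/*
 * The alternative asked about above: only double the pressure by
 * going one priority level lower, never below 0.
 */
static int epriority_doubled(int priority, int over_softlimit)
{
	if (over_softlimit && priority > 0)
		return priority - 1;
	return priority;
}
```

Under these assumptions, a list of 1048576 pages at priority 12 gets a
scan target of 256 pages normally, 512 pages with the doubled variant,
and all 1048576 pages with epriority = 0.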
>
> - shrink_mem_cgroup_zone(priority, &mz, sc);
> + shrink_mem_cgroup_zone(epriority, &mz, sc);
>
> mem_cgroup_account_reclaim(root, memcg,
> sc->nr_reclaimed - nr_reclaimed,
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic