Message-ID: <20140430090928.GC4357@dhcp22.suse.cz>
Date: Wed, 30 Apr 2014 11:09:28 +0200
From: Michal Hocko <mhocko@...e.cz>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Tejun Heo <tj@...nel.org>, linux-mm@...ck.org,
cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] vmscan: memcg: Always use swappiness of the
reclaimed memcg swappiness and oom_control
ping
On Thu 24-04-14 16:27:04, Michal Hocko wrote:
> On Thu 24-04-14 08:19:17, Johannes Weiner wrote:
> > On Fri, Apr 18, 2014 at 01:36:11PM +0200, Michal Hocko wrote:
> > > On Wed 16-04-14 17:13:18, Johannes Weiner wrote:
> > > > Per-memcg swappiness and oom killing can currently not be tweaked on a
> > > > memcg that is part of a hierarchy, but not the root of that hierarchy.
> > > > Users have complained that they can't configure this when they turned
> > > > on hierarchy mode. In fact, with hierarchy mode becoming the default,
> > > > this restriction disables the tunables entirely.
> > >
> > > Except when we would handle the first level under root differently,
> > > which is ugly.
> > >
> > > > But there is no good reason for this restriction.
> > >
> > > I had a patch for this somewhere on the think_more pile. I wasn't
> > > particularly happy about the semantics, so I haven't posted it.
> > >
> > > > The settings for
> > > > swappiness and OOM killing are taken from whatever memcg whose limit
> > > > triggered reclaim and OOM invocation, regardless of its position in
> > > > the hierarchy tree.
> > >
> > > This is OK for the OOM knob because the memory pressure cannot be
> > > handled at that level in hierarchy and that is where the OOM happens.
> > >
> > > I am not so sure about the swappiness though. The swappiness tells us
> > > how to proportionally scan anon vs. file LRUs and those are per-memcg,
> > > not per-hierarchy (unlike the charge) so it makes sense to use it
> > > per-memcg IMO.
> > >
> > > Besides that, using the reclaim target value might be quite confusing.
> > > Say somebody wants to prevent swapping in a certain group, and yet
> > > the pages find their way to swap depending on where the reclaim is
> > > triggered from.
> > > Another thing is that setting swappiness on an unlimited group has
> > > no effect, although I would argue it makes some sense in configurations
> > > where the parent is controlled by somebody else. I would like to be
> > > able to say how I should be reclaimed even when I cannot say how much
> > > memory I can have.
> > >
> > > It is true that we already have a different behavior for the global
> > > reclaim but I am not entirely happy about that. Having a different
> > > behavior for the global vs. limit reclaims just calls for trouble and
> > > should be avoided as much as possible.
> > >
> > > So let's think about what the best semantics are before we merge this.
> > > I would be more inclined to use per-memcg swappiness all the time
> > > (root using the global knob) for all reclaims.
> >
> > Yeah, we've always used the triggering group's swappiness value but at
> > the same time forced the whole hierarchy to have the same setting as
> > the root.
> >
> > I don't really feel strongly about this. If you prefer the per-memcg
> > swappiness I can send a followup patch - or you can.
>
> OK, I originally thought this would be in the same patch but now that I
> think about it some more it would be better to have it separate in case
> it turns out this will cause some issues (at least
> global_reclaim-always-use-global-vm_swappiness is a behavior change).
> So what do you think about this?
> ---
> From 3a865b7b53aed96d93bbcf865028e63fd6f582ab Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@...e.cz>
> Date: Thu, 24 Apr 2014 15:28:05 +0200
> Subject: [RFC PATCH] vmscan: memcg: Always use swappiness of the reclaimed memcg
>
> Memory reclaim currently uses the swappiness of the reclaim target memcg
> (the origin of the memory pressure), or vm_swappiness for global memory
> reclaim. This behavior was consistent (except for the difference between
> global and hard limit reclaim) because swappiness was enforced to be
> the same within each memcg hierarchy.
>
> After "mm: memcontrol: remove hierarchy restrictions for swappiness
> and oom_control" each memcg can have its own swappiness independent of
> its hierarchical parents, though, so the consistency guarantee is gone.
> This can lead to unexpected behavior. Say that a group is explicitly
> configured not to swap out via memory.swappiness=0, but its memory gets
> swapped out anyway when the memory pressure comes from its parent with a
> different swapping policy.
> It is also unexpected that the knob is meaningless without setting the
> hard limit which would trigger the reclaim and enforce the swappiness.
> There are setups where the hard limit is configured higher in the
> hierarchy by an administrator and the child groups are under the control
> of somebody else who is interested in the swapout behavior but not
> necessarily in the memory limit.
>
> From a semantic point of view, swappiness is an attribute defining
> anon vs. file proportional scanning of the LRU, which is memcg specific
> (unlike charges, which are propagated up the hierarchy), so it should be
> applied to the particular memcg's LRU regardless of where the memory
> pressure comes from.
>
> This patch removes vmscan_swappiness() and stores the swappiness into
> the scan_control structure. mem_cgroup_swappiness is then used to
> provide the correct value before shrink_lruvec is called. The global
> vm_swappiness is used for the root memcg.
>
> Signed-off-by: Michal Hocko <mhocko@...e.cz>
> ---
>  Documentation/cgroups/memory.txt | 15 +++++++--------
>  mm/vmscan.c                      | 18 ++++++++----------
>  2 files changed, 15 insertions(+), 18 deletions(-)
>
> diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
> index 4937e6fff9b4..b3429aec444c 100644
> --- a/Documentation/cgroups/memory.txt
> +++ b/Documentation/cgroups/memory.txt
> @@ -540,14 +540,13 @@ Note:
>
> 5.3 swappiness
>
> -Similar to /proc/sys/vm/swappiness, but only affecting reclaim that is
> -triggered by this cgroup's hard limit. The tunable in the root cgroup
> -corresponds to the global swappiness setting.
> -
> -Please note that unlike the global swappiness, memcg knob set to 0
> -really prevents from any swapping even if there is a swap storage
> -available. This might lead to memcg OOM killer if there are no file
> -pages to reclaim.
> +Overrides /proc/sys/vm/swappiness for the particular group. The tunable
> +in the root cgroup corresponds to the global swappiness setting.
> +
> +Please note that unlike during the global reclaim, limit reclaim
> +enforces that 0 swappiness really prevents from any swapping even if
> +there is a swap storage available. This might lead to memcg OOM killer
> +if there are no file pages to reclaim.
>
> 5.4 failcnt
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 310e1f67625e..7d2f8226cbd0 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -86,6 +86,9 @@ struct scan_control {
>  	/* Scan (total_size >> priority) pages at once */
>  	int priority;
>
> +	/* anon vs. file LRUs scanning "ratio" */
> +	int swappiness;
> +
>  	/*
>  	 * The memory cgroup that hit its limit and as a result is the
>  	 * primary target of this reclaim invocation.
> @@ -1833,13 +1836,6 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
>  	return shrink_inactive_list(nr_to_scan, lruvec, sc, lru);
>  }
>
> -static int vmscan_swappiness(struct scan_control *sc)
> -{
> -	if (global_reclaim(sc))
> -		return vm_swappiness;
> -	return mem_cgroup_swappiness(sc->target_mem_cgroup);
> -}
> -
>  enum scan_balance {
>  	SCAN_EQUAL,
>  	SCAN_FRACT,
> @@ -1900,7 +1896,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  	 * using the memory controller's swap limit feature would be
>  	 * too expensive.
>  	 */
> -	if (!global_reclaim(sc) && !vmscan_swappiness(sc)) {
> +	if (!global_reclaim(sc) && !sc->swappiness) {
>  		scan_balance = SCAN_FILE;
>  		goto out;
>  	}
> @@ -1910,7 +1906,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  	 * system is close to OOM, scan both anon and file equally
>  	 * (unless the swappiness setting disagrees with swapping).
>  	 */
> -	if (!sc->priority && vmscan_swappiness(sc)) {
> +	if (!sc->priority && sc->swappiness) {
>  		scan_balance = SCAN_EQUAL;
>  		goto out;
>  	}
> @@ -1935,7 +1931,7 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>  	 * With swappiness at 100, anonymous and file have the same priority.
>  	 * This scanning priority is essentially the inverse of IO cost.
>  	 */
> -	anon_prio = vmscan_swappiness(sc);
> +	anon_prio = sc->swappiness;
>  	file_prio = 200 - anon_prio;
>
>  	/*
> @@ -2221,6 +2217,7 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
>
>  			lruvec = mem_cgroup_zone_lruvec(zone, memcg);
>
> +			sc->swappiness = mem_cgroup_swappiness(memcg);
>  			shrink_lruvec(lruvec, sc);
>
>  			/*
> @@ -2678,6 +2675,7 @@ unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *memcg,
>  		.may_swap = !noswap,
>  		.order = 0,
>  		.priority = 0,
> +		.swappiness = mem_cgroup_swappiness(memcg),
>  		.target_mem_cgroup = memcg,
>  	};
>  	struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
> --
> 1.9.2
>
> --
> Michal Hocko
> SUSE Labs
>
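For anybody skimming the thread, the difference between the current rule
and the proposed one can be sketched in plain userspace C. This is only a
toy model, not kernel code: struct toy_memcg, toy_vm_swappiness,
toy_memcg_swappiness(), old_rule(), new_rule() and toy_scan_prio() are all
made-up names, and the value 60 for the global knob is an assumption. Only
the "file_prio = 200 - anon_prio" arithmetic and the "root uses the global
knob" rule are taken from the patch above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a memory cgroup; NOT the kernel's struct mem_cgroup. */
struct toy_memcg {
	int swappiness;			/* 0..100, like memory.swappiness */
	const struct toy_memcg *parent;	/* NULL for the root group */
};

/* Stand-in for the global vm_swappiness knob (60 is assumed here). */
static int toy_vm_swappiness = 60;

/* Plays the role of mem_cgroup_swappiness(): root uses the global knob. */
static int toy_memcg_swappiness(const struct toy_memcg *memcg)
{
	return memcg->parent ? memcg->swappiness : toy_vm_swappiness;
}

/* Current rule: the reclaim target decides; NULL means global reclaim. */
static int old_rule(const struct toy_memcg *target,
		    const struct toy_memcg *scanned)
{
	(void)scanned;	/* the group being scanned has no say */
	return target ? toy_memcg_swappiness(target) : toy_vm_swappiness;
}

/* Proposed rule: the group whose LRU is being scanned decides. */
static int new_rule(const struct toy_memcg *target,
		    const struct toy_memcg *scanned)
{
	(void)target;	/* the origin of the pressure no longer matters */
	return toy_memcg_swappiness(scanned);
}

/* Same arithmetic as in get_scan_count(): file_prio = 200 - anon_prio. */
static void toy_scan_prio(int swappiness, int *anon_prio, int *file_prio)
{
	assert(swappiness >= 0 && swappiness <= 100);
	*anon_prio = swappiness;
	*file_prio = 200 - *anon_prio;
}
```

With a parent at swappiness 60 and a child at 0, pressure coming from the
parent gives old_rule(&parent, &child) == 60 (anon_prio 60, file_prio 140),
while new_rule(&parent, &child) == 0 (anon_prio 0, file_prio 200), which is
the "configured not to swap but swapped anyway" case from the changelog.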
--
Michal Hocko
SUSE Labs