lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:	Tue, 17 May 2011 10:25:12 +0200
From:	Johannes Weiner <hannes@...xchg.org>
To:	Ying Han <yinghan@...gle.com>
Cc:	Rik van Riel <riel@...hat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Michal Hocko <mhocko@...e.cz>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Minchan Kim <minchan.kim@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Mel Gorman <mgorman@...e.de>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [rfc patch 2/6] vmscan: make distinction between memcg reclaim
 and LRU list selection

On Mon, May 16, 2011 at 11:38:07PM -0700, Ying Han wrote:
> On Thu, May 12, 2011 at 9:03 AM, Johannes Weiner <hannes@...xchg.org> wrote:
> > On Thu, May 12, 2011 at 11:33:13AM -0400, Rik van Riel wrote:
> >> On 05/12/2011 10:53 AM, Johannes Weiner wrote:
> >> >The reclaim code has a single predicate for whether it currently
> >> >reclaims on behalf of a memory cgroup, as well as whether it is
> >> >reclaiming from the global LRU list or a memory cgroup LRU list.
> >> >
> >> >Up to now, both cases always coincide, but subsequent patches will
> >> >change things such that global reclaim will scan memory cgroup lists.
> >> >
> >> >This patch adds a new predicate that tells global reclaim from memory
> >> >cgroup reclaim, and then changes all callsites that are actually about
> >> >global reclaim heuristics rather than strict LRU list selection.
> >> >
> >> >Signed-off-by: Johannes Weiner<hannes@...xchg.org>
> >> >---
> >> >  mm/vmscan.c |   96 ++++++++++++++++++++++++++++++++++------------------------
> >> >  1 files changed, 56 insertions(+), 40 deletions(-)
> >> >
> >> >diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> >index f6b435c..ceeb2a5 100644
> >> >--- a/mm/vmscan.c
> >> >+++ b/mm/vmscan.c
> >> >@@ -104,8 +104,12 @@ struct scan_control {
> >> >      */
> >> >     reclaim_mode_t reclaim_mode;
> >> >
> >> >-    /* Which cgroup do we reclaim from */
> >> >-    struct mem_cgroup *mem_cgroup;
> >> >+    /*
> >> >+     * The memory cgroup we reclaim on behalf of, and the one we
> >> >+     * are currently reclaiming from.
> >> >+     */
> >> >+    struct mem_cgroup *memcg;
> >> >+    struct mem_cgroup *current_memcg;
> >>
> >> I can't say I'm fond of these names.  I had to read the
> >> rest of the patch to figure out that the old mem_cgroup
> >> got renamed to current_memcg.
> >
> > To clarify: sc->memcg will be the memcg that hit the hard limit and is
> > the main target of this reclaim invocation.  current_memcg is the
> > iterator over the hierarchy below the target.
> 
> I would assume the new variable memcg is a renaming of the
> "mem_cgroup" which indicating which cgroup we reclaim on behalf of.

The thing is, mem_cgroup would mean both the group we are reclaiming
on behalf of AND the group we are currently reclaiming from.  Because
the hierarchy walk was implemented in memcontrol.c, vmscan.c only ever
saw one cgroup at a time.

> About the "current_memcg", i couldn't find where it is indicating to
> be the current cgroup under the hierarchy below the "memcg".

It's codified in shrink_zone().

	for each child of sc->memcg:
	  sc->current_memcg = child
	  reclaim(sc)

In the new version I named (and documented) them:

	sc->target_mem_cgroup: the entry point into the hierarchy, set
	by the functions that have the scan control structure on their
	stack.  That's the one hitting its hard limit.

	sc->mem_cgroup: the current position in the hierarchy below
	sc->target_mem_cgroup.  That's the one that actively gets its
	pages reclaimed.

> Both mem_cgroup_shrink_node_zone() and try_to_free_mem_cgroup_pages()
> are called within mem_cgroup_hierarchical_reclaim(), and the sc->memcg
> is initialized w/ the victim passed down which is already the memcg
> under hierarchy.

I changed mem_cgroup_shrink_node_zone() to use do_shrink_zone(), and
mem_cgroup_hierarchical_reclaim() no longer calls
try_to_free_mem_cgroup_pages().

So there is no hierarchy walk triggered from within a hierarchy walk.

I just noticed that there is, however, a bug in that
mem_cgroup_shrink_node_zone() does not initialize sc->current_memcg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ