Message-ID: <20110913110301.GB18886@redhat.com>
Date:	Tue, 13 Sep 2011 13:03:01 +0200
From:	Johannes Weiner <jweiner@...hat.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Balbir Singh <bsingharora@...il.com>,
	Ying Han <yinghan@...gle.com>, Michal Hocko <mhocko@...e.cz>,
	Greg Thelen <gthelen@...gle.com>,
	Michel Lespinasse <walken@...gle.com>,
	Rik van Riel <riel@...hat.com>,
	Minchan Kim <minchan.kim@...il.com>,
	Christoph Hellwig <hch@...radead.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [patch 04/11] mm: memcg: per-priority per-zone hierarchy scan
 generations

On Tue, Sep 13, 2011 at 07:27:59PM +0900, KAMEZAWA Hiroyuki wrote:
> On Mon, 12 Sep 2011 12:57:21 +0200
> Johannes Weiner <jweiner@...hat.com> wrote:
> 
> > Memory cgroup limit reclaim currently picks one memory cgroup out of
> > the target hierarchy, remembers it as the last scanned child, and
> > reclaims all zones in it with decreasing priority levels.
> > 
> > Because the new hierarchy reclaim code will pick memory cgroups from
> > the same hierarchy concurrently from different zones and priority
> > levels, it becomes necessary that hierarchy roots not only remember
> > the last scanned child, but do so for each zone and priority level.
> > 
> > Furthermore, detecting full hierarchy round-trips reliably will become
> > crucial, so instead of counting on one iterator site seeing a certain
> > memory cgroup twice, use a generation counter that is increased every
> > time the child with the highest ID has been visited.
> > 
> > Signed-off-by: Johannes Weiner <jweiner@...hat.com>
> 
> I cannot imagine how this works. Could you illustrate more with an easy example?

Previously, we did

	mem = mem_cgroup_iter(root)
	  for each priority level:
	    for each zone in zonelist:

and this would reclaim memcg-1-zone-1, memcg-1-zone-2, memcg-1-zone-3
etc.

The new code does

	for each priority level:
	  for each zone in zonelist:
	    mem = mem_cgroup_iter(root)

but with a single last_scanned_child per memcg, this would scan
memcg-1-zone-1, memcg-2-zone-2, memcg-3-zone-3, etc., which does not
make much sense.
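
To make the difference concrete, here is a toy model in Python.  It is
only a sketch, not the kernel code: SharedIter, old_order() and
new_order_shared() are names made up for this illustration, the memcg
and zone names are placeholders, and the single priority value is
arbitrary.  It assumes the iterator simply walks the children round-robin
from one shared cursor.

```python
class SharedIter:
    """Toy stand-in for a hierarchy root with ONE last_scanned_child cursor."""

    def __init__(self, memcgs):
        self.memcgs = memcgs
        self.last = -1  # index of the last scanned child, shared by everyone

    def next(self):
        # Advance the single shared cursor and return the next child.
        self.last = (self.last + 1) % len(self.memcgs)
        return self.memcgs[self.last]


MEMCGS = ["memcg-1", "memcg-2", "memcg-3"]
ZONES = ["zone-1", "zone-2", "zone-3"]


def old_order(it, zones, priorities):
    # Old loop nesting: pick one memcg, then walk priorities and zones.
    mem = it.next()
    return [(mem, zone) for _ in priorities for zone in zones]


def new_order_shared(it, zones, priorities):
    # New loop nesting, but still with the single shared cursor.
    return [(it.next(), zone) for _ in priorities for zone in zones]


old_scan = old_order(SharedIter(MEMCGS), ZONES, [12])
new_scan = new_order_shared(SharedIter(MEMCGS), ZONES, [12])

for mem, zone in old_scan:
    print("old:", mem, zone)
for mem, zone in new_scan:
    print("new:", mem, zone)
```

In this model the old nesting scans every zone of memcg-1, while the new
nesting with a shared cursor pairs each zone with a different memcg.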

Now imagine two reclaimers.  With the old code, the first reclaimer
would pick memcg-1 and scan all its zones, the second reclaimer would
pick memcg-2 and reclaim all its zones.  Without this patch, the first
reclaimer would pick memcg-1 and scan zone-1, the second reclaimer
would pick memcg-2 and scan zone-1, then the first reclaimer would
pick memcg-3 and scan zone-2.  If the reclaimers are concurrently
scanning at different priority levels, things are even worse because
one reclaimer may put much more force on the memcgs it gets from
mem_cgroup_iter() than the other reclaimer.  They must not share the
same iterator.
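
A minimal sketch of the idea of keeping one iterator position per zone
and priority level, again as a Python toy model rather than the actual
patch.  PerZonePrioIter and its dictionary keyed by (zone, priority) are
assumptions made for this illustration, and the priority values 12 and
10 are arbitrary.

```python
from collections import defaultdict

MEMCGS = ["memcg-1", "memcg-2", "memcg-3"]


class PerZonePrioIter:
    """Toy hierarchy root that remembers a position per (zone, priority)."""

    def __init__(self, memcgs):
        self.memcgs = memcgs
        # (zone, priority) -> index of the next child to hand out
        self.pos = defaultdict(int)

    def next(self, zone, priority):
        key = (zone, priority)
        mem = self.memcgs[self.pos[key] % len(self.memcgs)]
        self.pos[key] += 1
        return mem


root = PerZonePrioIter(MEMCGS)
reclaimer_a = []  # scanning zone-1 at priority 12
reclaimer_b = []  # scanning zone-1 at priority 10

# Interleave the two reclaimers, as would happen when they run concurrently.
for _ in MEMCGS:
    reclaimer_a.append(root.next("zone-1", 12))
    reclaimer_b.append(root.next("zone-1", 10))

print("A:", reclaimer_a)
print("B:", reclaimer_b)
```

Although the calls are interleaved, each reclaimer in this model walks
all children of the hierarchy for its own zone and priority level,
because neither advances the other's position.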

The generations are needed because the new algorithm relies on
detecting full round-trips, which the old one barely did.  After every
reclaim cycle, the old code checked the limit and broke out of the loop
if enough was reclaimed, no matter how many children were reclaimed
from.  The new algorithm is used for global reclaim, where the only
exit condition of the hierarchy reclaim is the full round-trip, because
equal pressure needs to be applied to all zones.
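
The generation counter can be sketched the same way.  This is a toy
model under stated assumptions, not the patch itself: GenIter and
reclaim_round() are hypothetical names, and the "child with the highest
ID" is modelled as the last element of a list.  A reclaimer remembers the
generation it started in and stops as soon as the iterator has moved on
to the next one.

```python
class GenIter:
    """Toy iterator that bumps a generation after the last child is visited."""

    def __init__(self, memcgs):
        self.memcgs = memcgs
        self.pos = 0
        self.generation = 0

    def next(self, seen_generation):
        # A caller from an older generation has completed its round-trip.
        if seen_generation != self.generation:
            return None
        mem = self.memcgs[self.pos]
        self.pos += 1
        if self.pos == len(self.memcgs):
            # The child with the highest ID was just handed out.
            self.pos = 0
            self.generation += 1
        return mem


def reclaim_round(it):
    # Visit children until the generation we started in is over.
    start = it.generation
    visited = []
    while True:
        mem = it.next(start)
        if mem is None:
            break
        visited.append(mem)
    return visited


solo = GenIter(["memcg-1", "memcg-2", "memcg-3"])
solo_visited = reclaim_round(solo)

shared = GenIter(["memcg-1", "memcg-2", "memcg-3"])
other_first = shared.next(shared.generation)  # another reclaimer took one child
shared_visited = reclaim_round(shared)

print(solo_visited, solo.generation)
print(other_first, shared_visited, shared.generation)
```

In the shared case the second reclaimer only sees the children the first
one has not taken yet, and both stop once the round-trip is complete,
without any single caller having to see the same memcg twice.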