Message-ID: <20110830084245.GC13061@redhat.com>
Date:	Tue, 30 Aug 2011 10:42:45 +0200
From:	Johannes Weiner <jweiner@...hat.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Balbir Singh <bsingharora@...il.com>,
	Andrew Brestic <abrestic@...gle.com>,
	Ying Han <yinghan@...gle.com>, Michal Hocko <mhocko@...e.cz>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [patch] Revert "memcg: add memory.vmscan_stat"

On Tue, Aug 30, 2011 at 04:20:50PM +0900, KAMEZAWA Hiroyuki wrote:
> On Tue, 30 Aug 2011 09:04:24 +0200
> Johannes Weiner <jweiner@...hat.com> wrote:
> 
> > On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
> > > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
> > >  	spin_lock(&memcg->scanstat.lock);
> > >  	__mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
> > >  	spin_unlock(&memcg->scanstat.lock);
> > > -
> > > -	memcg = rec->root;
> > > -	spin_lock(&memcg->scanstat.lock);
> > > -	__mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
> > > -	spin_unlock(&memcg->scanstat.lock);
> > > +	cgroup = memcg->css.cgroup;
> > > +	do {
> > > +		spin_lock(&memcg->scanstat.lock);
> > > +		__mem_cgroup_record_scanstat(
> > > +			memcg->scanstat.hierarchy_stats[context], rec);
> > > +		spin_unlock(&memcg->scanstat.lock);
> > > +		if (!cgroup->parent)
> > > +			break;
> > > +		cgroup = cgroup->parent;
> > > +		memcg = mem_cgroup_from_cont(cgroup);
> > > +	} while (memcg->use_hierarchy && memcg != rec->root);
> > 
> > Okay, so this looks correct, but it sums up all parents after each
> > memcg scanned, which could have a performance impact.  Usually,
> > hierarchy statistics are only summed up when a user reads them.
> > 
> Hmm. But sum-at-read doesn't work.
> 
> Assume 3 cgroups in a hierarchy.
> 
> 	A
>        /
>       B
>      /
>     C
> 
> C's scan contains 3 causes.
> 	C's scan caused by limit of A.
> 	C's scan caused by limit of B.
> 	C's scan caused by limit of C.
>
> If we make hierarchy sum at read, we think
> 	B's scan_stat = B's scan_stat + C's scan_stat
> But to be precise, this is
> 
> 	B's scan_stat = B's scan_stat caused by B +
> 			B's scan_stat caused by A +
> 			C's scan_stat caused by C +
> 			C's scan_stat caused by B +
> 			C's scan_stat caused by A.
> 
> In the original version,
> 	B's scan_stat = B's scan_stat caused by B +
> 			C's scan_stat caused by B
> 
> After this patch,
> 	B's scan_stat = B's scan_stat caused by B +
> 			B's scan_stat caused by A +
> 			C's scan_stat caused by C +
> 			C's scan_stat caused by B +
> 			C's scan_stat caused by A.
> 
> Hmm...removing hierarchy part completely seems fine to me.

I see.

You want to look at A and see whether its limit was responsible for
reclaim scans in any children.  IMO, that is asking the question
backwards.  Instead, there is a cgroup under reclaim and one wants to
find out the cause for that.  Not the other way round.

In my original proposal I suggested differentiating reclaim caused by
internal pressure (due to own limit) and reclaim caused by
external/hierarchical pressure (due to limits from parents).

If you want to find out why C is under reclaim, look at its reclaim
statistics.  If the _limit numbers are high, C's limit is the problem.
If the _hierarchical numbers are high, the problem is B, A, or
physical memory, so you check B for _limit and _hierarchical as well,
then move on to A.
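
To illustrate that walk, here is a rough userspace sketch.  It is not
kernel code: struct cg_stat, its fields, and the "high" threshold are
all made up for the example, and what counts as "high" is left to
whoever reads the numbers.

```c
#include <stddef.h>

/* Hypothetical per-cgroup reclaim counters, for illustration only. */
struct cg_stat {
	const char *name;
	const struct cg_stat *parent;		/* NULL at the top of the hierarchy */
	unsigned long scanned_limit;		/* scans caused by the group's own limit */
	unsigned long scanned_hierarchical;	/* scans caused by pressure from above */
};

/*
 * Start at the group under reclaim and walk towards the root.
 * Return the first group whose own limit accounts for a "high"
 * number of scans.  Return NULL if no group limit is to blame:
 * either the pressure from above is insignificant, or it comes
 * from beyond the top of the hierarchy (physical memory).
 */
static const struct cg_stat *find_pressure_source(const struct cg_stat *cg,
						  unsigned long high)
{
	while (cg) {
		if (cg->scanned_limit >= high)
			return cg;		/* this group's limit is the problem */
		if (cg->scanned_hierarchical < high)
			return NULL;		/* nothing significant from above */
		cg = cg->parent;		/* check the parent next */
	}
	return NULL;
}
```

With A -> B -> C and only B showing high _limit numbers, starting the
walk at C ends at B.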

Implementing this would be as easy as passing not only the memcg to
scan (victim) to the reclaim code, but also the memcg /causing/ the
reclaim (root_mem):

	root_mem == victim -> account to victim as _limit
	root_mem != victim -> account to victim as _hierarchical

This would make things much simpler and more natural, both the code
and the way of tracking down a problem, IMO.
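
A minimal sketch of that accounting rule, again as standalone C rather
than a patch.  struct memcg_sketch, enum scan_cause, and account_scan()
are names invented here; the real reclaim code would carry root_mem
alongside the victim and bump the corresponding counters.

```c
/* Hypothetical cause categories matching _limit and _hierarchical. */
enum scan_cause {
	SCAN_LIMIT,		/* victim hit its own limit */
	SCAN_HIERARCHICAL,	/* victim scanned on behalf of someone else */
	NR_SCAN_CAUSES,
};

/* Stand-in for the per-memcg statistics; not the kernel's struct. */
struct memcg_sketch {
	unsigned long scanned[NR_SCAN_CAUSES];
};

/*
 * Account nr_scanned pages to the victim only.  The cause is decided
 * by comparing the group that triggered reclaim (root_mem) with the
 * group being scanned (victim).  A NULL root_mem, assumed here to
 * stand for pressure from outside any group, is never equal to the
 * victim and therefore lands in the hierarchical bucket.
 */
static void account_scan(const struct memcg_sketch *root_mem,
			 struct memcg_sketch *victim,
			 unsigned long nr_scanned)
{
	enum scan_cause cause;

	cause = (root_mem == victim) ? SCAN_LIMIT : SCAN_HIERARCHICAL;
	victim->scanned[cause] += nr_scanned;
}
```

Note that nothing is ever added to root_mem's own counters and no
parents are walked at scan time, so the cost per scanned memcg stays
constant.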

> > I don't get why this has to be done completely different from the way
> > we usually do things, without any justification, whatsoever.
> > 
> > Why do you want to pass a recording structure down the reclaim stack?
> 
> Just for reducing number of passed variables.

It's still sitting on the bottom of the reclaim stack the whole time.

With my proposal, you would only need to pass the extra root_mem
pointer.