Message-ID: <20090306104106.GE5482@balbir.in.ibm.com>
Date: Fri, 6 Mar 2009 16:11:06 +0530
From: Balbir Singh <balbir@...ux.vnet.ibm.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc: linux-mm@...ck.org, Sudhir Kumar <skumar@...ux.vnet.ibm.com>,
YAMAMOTO Takashi <yamamoto@...inux.co.jp>,
Bharata B Rao <bharata@...ibm.com>,
Paul Menage <menage@...gle.com>, lizf@...fujitsu.com,
linux-kernel@...r.kernel.org,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
David Rientjes <rientjes@...gle.com>,
Pavel Emelianov <xemul@...nvz.org>,
Dhaval Giani <dhaval@...ux.vnet.ibm.com>,
Rik van Riel <riel@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH 4/4] Memory controller soft limit reclaim on contention (v4)
* KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> [2009-03-06 19:14:36]:
> On Fri, 6 Mar 2009 15:31:55 +0530
> Balbir Singh <balbir@...ux.vnet.ibm.com> wrote:
>
>
> > > > + if (wait)
> > > > + wait_for_completion(&mem->wait_on_soft_reclaim);
> > > > }
> > > What???? Why do we have to wait here... holding mmap->sem... This is too bad.
> > >
> >
> > Since mmap_sem is no longer used for pthread_mutex*, I was not sure.
> > That is why I added the comment asking for more review, to see what
> > people think about it. We get here only when
> >
> > 1. The memcg is over its soft limit
> > 2. Tasks/threads belonging to memcg are faulting in more pages
> >
> > The idea is to throttle them. If we did reclaim inline, like we do for
> > hard limits, we can still end up holding mmap_sem for a long time.
> >
> This "throttle" is hard to measuer the effect and IIUC, not implemneted in
> vmscan.c ...for global try_to_free_pages() yet.
> Under memory shortage. before reaching here, the thread already called
> try_to_free_pages() or check some memory shorage conditions because
> it called alloc_pages(). So, waiting here is redundant and gives it
> too much penaly.
The reason for adding it: consider the following scenario
1. Create cgroup "a", give it a soft limit of 0
2. Create cgroup "b", give it a soft limit of 3G.
With both "a" and "b" running, reclaiming from "a" makes no sense; it
goes and does a bulk allocation and increases its usage again. It does
not make sense to reclaim from "b" until it crosses 3G.
Throttling is not implemented in the main VM, but we have seen several
patches for it. This is a special case for soft limits.
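
To make the scenario concrete, here is a minimal userspace sketch of
the selection rule (pick_victim(), the struct fields, and the numbers
are all made up for illustration; this is not the patch's code). A
group at or under its soft limit is never a victim, so "b" is left
alone until it crosses 3G, while "a" with its soft limit of 0 always
wins the selection, which is exactly why its faulting tasks need the
completion-based throttle instead of yet another futile reclaim pass:

#include <stdio.h>

/* Hypothetical stand-ins for the per-memcg accounting discussed here. */
struct group {
	const char *name;
	unsigned long usage;		/* pages currently charged */
	unsigned long soft_limit;	/* pages */
};

/*
 * Pick the group that exceeds its soft limit by the largest amount.
 * Groups at or under their soft limit are never reclaim victims.
 */
static struct group *pick_victim(struct group *g, int n)
{
	struct group *best = NULL;
	unsigned long best_excess = 0;

	for (int i = 0; i < n; i++) {
		unsigned long excess;

		if (g[i].usage <= g[i].soft_limit)
			continue;	/* "b" stays untouched below 3G */
		excess = g[i].usage - g[i].soft_limit;
		if (excess > best_excess) {
			best_excess = excess;
			best = &g[i];
		}
	}
	return best;
}

int main(void)
{
	struct group groups[] = {
		{ "a", 1024, 0 },	/* soft limit 0: always over */
		{ "b", 2048, 786432 },	/* 3G in 4K pages: still under */
	};

	/* "a" is always picked; without throttling its faulting tasks,
	 * it simply refills whatever reclaim frees. */
	printf("victim: %s\n", pick_victim(groups, 2)->name);
	return 0;
}
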
>
>
> > > > + /*
> > > > + * This loop can run a while, especially if mem_cgroups continuously
> > > > + * keep exceeding their soft limits and putting the system under
> > > > + * pressure
> > > > + */
> > > > + do {
> > > > + mem = mem_cgroup_get_largest_soft_limit_exceeding_node();
> > > > + if (!mem)
> > > > + break;
> > > > + usage = mem_cgroup_get_node_zone_usage(mem, zone, nid);
> > > > + if (!usage)
> > > > + goto skip_reclaim;
> > >
> > > Why does this work well? If "mem" is the largest, it will be inserted
> > > again as the largest. Am I missing anything?
> > >
> >
> > No, that is correct, but when reclaim is initiated from a different
> > zone/node combination, we still want mem to show up.
> ....
> your logic is:
> ==
> nr_reclaimed = 0;
> do {
>     mem = select victim;
>     remove victim from the RB-tree (the largest-usage one is selected);
>     if (victim is not good)
>         goto skip_this;
>     nr_reclaimed += shrink_zone();
>
> skip_this:
>     if (mem still exceeds its soft limit)
>         insert it into the RB-tree again;
> } while (!nr_reclaimed)
> ==
> When does this loop exit?
>
This is spill-over from the main code, which had no notion of zones and
nodes; there, a mem_cgroup with zero usage could not sit on the tree as
the one with the highest usage. In practice, if we hit soft limit
reclaim, kswapd will be called for each zone, at least for one of the
node/zone combinations in which the mem_cgroup we dequeued has memory
usage. At that point, the necessary changes to the RB-tree will happen.
However, you have found a potential problem and I'll fix it in the next
iteration.
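
For reference, here is one way the loop could be bounded in the next
iteration: drain the tree once per pass and re-insert only after the
pass ends, so a mem_cgroup with zero usage in this zone/node is
dequeued at most once. This is a minimal userspace model of the idea;
every name and the reclaim arithmetic are invented, and the real fix
may well look different:

#include <stdio.h>
#include <stdbool.h>

/* Toy model of one soft-limit reclaim pass over the RB-tree. */
struct group {
	unsigned long zone_usage;	/* usage in the zone under reclaim */
	unsigned long excess;		/* pages over the soft limit */
	bool queued;			/* currently on the tree? */
};

static struct group tree[] = {
	{ .zone_usage = 0,   .excess = 500, .queued = true },	/* largest, but empty here */
	{ .zone_usage = 128, .excess = 100, .queued = true },
};
#define NGROUPS 2

/* Remove and return the queued group with the largest excess. */
static struct group *dequeue_largest(void)
{
	struct group *best = NULL;

	for (int i = 0; i < NGROUPS; i++)
		if (tree[i].queued && (!best || tree[i].excess > best->excess))
			best = &tree[i];
	if (best)
		best->queued = false;
	return best;
}

int main(void)
{
	struct group *visited[NGROUPS];
	unsigned long nr_reclaimed = 0;
	int nvisited = 0;

	/*
	 * The pass must terminate: nothing is re-inserted while it runs,
	 * so each dequeue shrinks the tree and at most NGROUPS iterations
	 * happen even when nothing at all is reclaimed.
	 */
	while (!nr_reclaimed) {
		struct group *mem = dequeue_largest();

		if (!mem)
			break;		/* every queued group visited once */
		visited[nvisited++] = mem;
		if (mem->zone_usage) {
			/* Stand-in for shrink_zone(): free half the pages. */
			unsigned long got = mem->zone_usage / 2;

			mem->zone_usage -= got;
			mem->excess -= (got < mem->excess) ? got : mem->excess;
			nr_reclaimed += got;
		}
	}

	/* Re-insert only the groups that still exceed their soft limit. */
	for (int i = 0; i < nvisited; i++)
		if (visited[i]->excess)
			visited[i]->queued = true;

	printf("reclaimed %lu page(s)\n", nr_reclaimed);
	return 0;
}
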
> Thanks,
> -Kame
>
>
--
Balbir