linux-kernel - Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension

Message-ID: <20120328154434.GN20949@tiehlicka.suse.cz>
Date:	Wed, 28 Mar 2012 17:44:34 +0200
From:	Michal Hocko <mhocko@...e.cz>
To:	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
Cc:	linux-mm@...ck.org, mgorman@...e.de,
	kamezawa.hiroyu@...fujitsu.com, dhillf@...il.com,
	aarcange@...hat.com, akpm@...ux-foundation.org, hannes@...xchg.org,
	linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH -V4 04/10] memcg: Add HugeTLB extension

On Wed 28-03-12 19:10:36, Aneesh Kumar K.V wrote:
> Michal Hocko <mhocko@...e.cz> writes:
> 
> > On Fri 16-03-12 23:09:24, Aneesh Kumar K.V wrote:
> >> From: "Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
> >> 
> >> This patch implements a memcg extension that allows us to control
> >> HugeTLB allocations via memory controller.
> >
> > And the infrastructure is not used at this stage (you forgot to
> > mention).
> > The changelog should be much more descriptive.
> 
> 
> Will update the changelog.

Thx

> 
> >
> >> 
> >> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@...ux.vnet.ibm.com>
> >> ---
> >>  include/linux/hugetlb.h    |    1 +
> >>  include/linux/memcontrol.h |   42 +++++++++++++
> >>  init/Kconfig               |    8 +++
> >>  mm/hugetlb.c               |    2 +-
> >>  mm/memcontrol.c            |  138 ++++++++++++++++++++++++++++++++++++++++++++
> >>  5 files changed, 190 insertions(+), 1 deletions(-)
> >> 
> > [...]
> >> diff --git a/init/Kconfig b/init/Kconfig
> >> index 3f42cd6..f0eb8aa 100644
> >> --- a/init/Kconfig
> >> +++ b/init/Kconfig
> >> @@ -725,6 +725,14 @@ config CGROUP_PERF
> >>  
> >>  	  Say N if unsure.
> >>  
> >> +config MEM_RES_CTLR_HUGETLB
> >> +	bool "Memory Resource Controller HugeTLB Extension (EXPERIMENTAL)"
> >> +	depends on CGROUP_MEM_RES_CTLR && HUGETLB_PAGE && EXPERIMENTAL
> >> +	default n
> >> +	help
> >> +	  Add HugeTLB management to memory resource controller. When you
> >> +	  enable this, you can put a per cgroup limit on HugeTLB usage.
> >
> > How does it interact with the hard/soft limits etc...
> 
> 
> There is no softlimit support for HugeTLB extension.

Sure, sorry for not being precise. The point was how this interacts with
the memcg hard/soft limits (they are independent) etc...

> > [...]
> >> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> >> index 6728a7a..4b36c5e 100644
> >> --- a/mm/memcontrol.c
> >> +++ b/mm/memcontrol.c
> >> @@ -235,6 +235,10 @@ struct mem_cgroup {
> >>  	 */
> >>  	struct res_counter memsw;
> >>  	/*
> >> +	 * the counter to account for hugepages from hugetlb.
> >> +	 */
> >> +	struct res_counter hugepage[HUGE_MAX_HSTATE];
> >> +	/*
> >>  	 * Per cgroup active and inactive list, similar to the
> >>  	 * per zone LRU lists.
> >>  	 */
> >> @@ -3156,6 +3160,128 @@ static inline int mem_cgroup_move_swap_account(swp_entry_t entry,
> >>  }
> >>  #endif
> >>  
> >> +#ifdef CONFIG_MEM_RES_CTLR_HUGETLB
> >> +static bool mem_cgroup_have_hugetlb_usage(struct mem_cgroup *memcg)
> >> +{
> >> +	int idx;
> >> +	for (idx = 0; idx < hugetlb_max_hstate; idx++) {
> >
> > Maybe we should expose for_each_hstate as well...
> 
> 
> That will not really help here. If we use for_each_hstate then we will
> need to use hstate_index to get the index.

Fair enough
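
For reference, a minimal userspace sketch of the two loop styles discussed
above. It is not kernel code: memcg_model, res_counter_model and the value of
HUGE_MAX_HSTATE are hypothetical stand-ins, and for_each_hstate/hstate_index
are only modeled on the hugetlb helpers. It shows that iterating by hstate
still needs an index lookup to reach the per-cgroup counter.

```c
#include <stdbool.h>

#define HUGE_MAX_HSTATE 2	/* illustrative value only */

/* Hypothetical stand-ins for the kernel structures. */
struct hstate { unsigned int order; };
struct res_counter_model { unsigned long usage; };
struct memcg_model { struct res_counter_model hugepage[HUGE_MAX_HSTATE]; };

static struct hstate hstates[HUGE_MAX_HSTATE];
int hugetlb_max_hstate;		/* number of registered hstates */

/* Walk the registered hstates by pointer. */
#define for_each_hstate(h) \
	for ((h) = hstates; (h) < &hstates[hugetlb_max_hstate]; (h)++)

/* Convert an hstate pointer back to its array index. */
static int hstate_index(struct hstate *h)
{
	return (int)(h - hstates);
}

/* Index-based loop, as in the patch. */
static bool have_usage_by_index(struct memcg_model *m)
{
	int idx;

	for (idx = 0; idx < hugetlb_max_hstate; idx++)
		if (m->hugepage[idx].usage > 0)
			return true;
	return false;
}

/* Same check via for_each_hstate: an extra hstate_index() call is needed. */
static bool have_usage_by_hstate(struct memcg_model *m)
{
	struct hstate *h;

	for_each_hstate(h)
		if (m->hugepage[hstate_index(h)].usage > 0)
			return true;
	return false;
}
```

Both functions give the same answer; the second one only adds the pointer to
index conversion.
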

> >> +		if (memcg->hugepage[idx].usage > 0)
> >> +			return 1;
> >> +	}
> >> +	return 0;
> >> +}
> >> +
> >> +int mem_cgroup_hugetlb_charge_page(int idx, unsigned long nr_pages,
> >> +				   struct mem_cgroup **ptr)
> >> +{
> >> +	int ret = 0;
> >> +	struct mem_cgroup *memcg;
> >> +	struct res_counter *fail_res;
> >> +	unsigned long csize = nr_pages * PAGE_SIZE;
> >> +
> >> +	if (mem_cgroup_disabled())
> >> +		return 0;
> >> +again:
> >> +	rcu_read_lock();
> >> +	memcg = mem_cgroup_from_task(current);
> >> +	if (!memcg)
> >> +		memcg = root_mem_cgroup;
> >> +	if (mem_cgroup_is_root(memcg)) {
> >> +		rcu_read_unlock();
> >> +		goto done;
> >> +	}
> >> +	if (!css_tryget(&memcg->css)) {
> >> +		rcu_read_unlock();
> >> +		goto again;
> >> +	}
> >> +	rcu_read_unlock();
> >> +
> >> +	ret = res_counter_charge(&memcg->hugepage[idx], csize, &fail_res);
> >> +	css_put(&memcg->css);
> >> +done:
> >> +	*ptr = memcg;
> >
> > Why do we set ptr even for the failure case after we dropped a
> > reference?
> 
> That ensures that *ptr is NULL. 

Does it? AFAICS res_counter_charge might fail and you would then use a
non-NULL memcg (with a dropped reference).
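
To make the concern concrete, here is a minimal userspace sketch, assuming
hypothetical stand-in types (struct counter, struct group) rather than the
real memcg and res_counter API. It shows one possible shape in which *ptr is
only set when the charge succeeded, so a failed charge can never hand back a
group whose reference was already dropped. It is an illustration, not the
fix that has to go into the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for res_counter and mem_cgroup. */
struct counter {
	unsigned long usage;
	unsigned long limit;
};

struct group {
	struct counter hugepage;
	int refs;
};

/* Returns 0 on success, -1 if the charge would exceed the limit. */
static int counter_charge(struct counter *c, unsigned long val)
{
	if (val > c->limit - c->usage)
		return -1;
	c->usage += val;
	return 0;
}

/*
 * Charge val to g. *ptr is set to g only when the charge succeeded;
 * on failure it stays NULL, so the caller cannot touch a group whose
 * reference has already been dropped.
 */
static int charge_page(struct group *g, unsigned long val, struct group **ptr)
{
	int ret;

	*ptr = NULL;
	g->refs++;		/* models css_tryget() */
	ret = counter_charge(&g->hugepage, val);
	g->refs--;		/* models css_put() */
	if (!ret)
		*ptr = g;
	return ret;
}
```
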

[...]
> >> +	SetPageCgroupUsed(pc);
> >> +
> >> +	unlock_page_cgroup(pc);
> >> +	return;
> >> +}
> >> +
> > [...]
> >> @@ -4887,6 +5013,7 @@ err_cleanup:
> >>  static struct cgroup_subsys_state * __ref
> >>  mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >>  {
> >> +	int idx;
> >>  	struct mem_cgroup *memcg, *parent;
> >>  	long error = -ENOMEM;
> >>  	int node;
> >> @@ -4929,9 +5056,14 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >>  		 * mem_cgroup(see mem_cgroup_put).
> >>  		 */
> >>  		mem_cgroup_get(parent);
> >> +		for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> >
> > Do we have to init all hstates or is hugetlb_max_hstate enough?
> 
> 
> Yes. We do call mem_cgroup_create for root cgroup before initializing
> hugetlb hstate.

drop a comment?
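
Something along these lines would do. This is a minimal userspace sketch with
hypothetical stand-in types (counter_model, memcg_model) and an illustrative
HUGE_MAX_HSTATE value, not the actual mem_cgroup_create() code. It only shows
where the comment goes and why the loop bound is the compile-time maximum.

```c
#include <limits.h>

#define HUGE_MAX_HSTATE 2	/* illustrative value only */

/* Hypothetical stand-ins for res_counter and mem_cgroup. */
struct counter_model {
	unsigned long usage;
	unsigned long limit;
};

struct memcg_model {
	struct counter_model hugepage[HUGE_MAX_HSTATE];
};

/* Number of registered huge page sizes; still 0 when the root group is created. */
int hugetlb_max_hstate;

static void counter_init(struct counter_model *c)
{
	c->usage = 0;
	c->limit = ULONG_MAX;	/* no limit by default */
}

static void memcg_create(struct memcg_model *m)
{
	int idx;

	/*
	 * The root group is created before hugetlb registers its hstates,
	 * so hugetlb_max_hstate may still be 0 here. Initialize every
	 * possible slot rather than only the registered ones.
	 */
	for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
		counter_init(&m->hugepage[idx]);
}
```
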


-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic