linux-kernel - Re: [PATCH -V8 11/16] hugetlb/cgroup: Add charge/uncharge routines for hugetlb cgroup


Message-ID: <4FD87332.8030805@jp.fujitsu.com>
Date:	Wed, 13 Jun 2012 20:02:10 +0900
From:	Kamezawa Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	"Aneesh Kumar K.V" <aneesh.kumar@...ux.vnet.ibm.com>
CC:	linux-mm@...ck.org, dhillf@...il.com, rientjes@...gle.com,
	mhocko@...e.cz, akpm@...ux-foundation.org, hannes@...xchg.org,
	linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH -V8 11/16] hugetlb/cgroup: Add charge/uncharge routines
 for hugetlb cgroup

(2012/06/12 19:50), Aneesh Kumar K.V wrote:
> Kamezawa Hiroyuki<kamezawa.hiroyu@...fujitsu.com>  writes:
> 
>> (2012/06/09 17:59), Aneesh Kumar K.V wrote:
>>> From: "Aneesh Kumar K.V"<aneesh.kumar@...ux.vnet.ibm.com>
>>>
>>> This patch adds the charge and uncharge routines for hugetlb cgroup.
>>> This will be used in later patches when we allocate/free HugeTLB
>>> pages.
>>>
>>> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@...ux.vnet.ibm.com>
>>
>>
>> I'm sorry if following has been already pointed out.
>>
>>> ---
>>>    mm/hugetlb_cgroup.c |   87 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 87 insertions(+)
>>>
>>> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
>>> index 20a32c5..48efd5a 100644
>>> --- a/mm/hugetlb_cgroup.c
>>> +++ b/mm/hugetlb_cgroup.c
>>> @@ -105,6 +105,93 @@ static int hugetlb_cgroup_pre_destroy(struct cgroup *cgroup)
>>>    	   return -EBUSY;
>>>    }
>>>
>>> +int hugetlb_cgroup_charge_page(int idx, unsigned long nr_pages,
>>> +			       struct hugetlb_cgroup **ptr)
>>> +{
>>> +	int ret = 0;
>>> +	struct res_counter *fail_res;
>>> +	struct hugetlb_cgroup *h_cg = NULL;
>>> +	unsigned long csize = nr_pages * PAGE_SIZE;
>>> +
>>> +	if (hugetlb_cgroup_disabled())
>>> +		goto done;
>>> +	/*
>>> +	 * We don't charge any cgroup if the compound page has less
>>> +	 * than 3 pages.
>>> +	 */
>>> +	if (hstates[idx].order < 2)
>>> +		goto done;
>>> +again:
>>> +	rcu_read_lock();
>>> +	h_cg = hugetlb_cgroup_from_task(current);
>>> +	if (!h_cg)
>>> +		h_cg = root_h_cgroup;
>>> +
>>> +	if (!css_tryget(&h_cg->css)) {
>>> +		rcu_read_unlock();
>>> +		goto again;
>>> +	}
>>> +	rcu_read_unlock();
>>> +
>>> +	ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res);
>>> +	css_put(&h_cg->css);
>>> +done:
>>> +	*ptr = h_cg;
>>> +	return ret;
>>> +}
>>> +
>>
>> The memory cgroup uses a very complicated 'charge' routine to handle
>> pageout, which can sleep.
>>
>> hugetlbfs has no sleeping path here, so you can charge in a simple way.
>> I guess get/put here is overkill.
>>
>> For example, h_cg cannot be freed while it has tasks. So, if 'current'
>> belongs to the cgroup, the cgroup cannot disappear. Then you don't need
>> get/put, which are additional atomic ops just for holding the cgroup.
>>
>> 	rcu_read_lock();
>> 	h_cg = hugetlb_cgroup_from_task(current);
>> 	ret = res_counter_charge(&h_cg->hugepage[idx], csize, &fail_res);
>> 	rcu_read_unlock();
>>
>> 	return ret;
>>
> 
> What if the task got moved out of the cgroup and the cgroup got deleted
> by an rmdir ?
> 

I think
 - yes, the task, 'current', can be moved out of the cgroup.
 - rcu_read_lock() prevents ->destroy() of the cgroup.

Then, the concern is that the cgroup may have resource usage even after
->pre_destroy() is called. We don't have any serialization between
charging <-> task_move <-> rmdir().
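
To make the ordering I am worried about concrete, here is a small userspace
model. It is only a sketch: every name in it (model_cgroup, model_lookup,
model_pre_destroy and so on) is made up for illustration and is not a kernel
interface, and the 512-page charge is an arbitrary number. It replays one
interleaving step by step: the charger looks up its cgroup, the task is then
moved away, the rmdir-time check passes because the group looks idle, and
only then does the charge land.

```c
/* Userspace model only; none of these names are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

struct model_cgroup {
	unsigned long usage;	/* stands in for the res_counter usage */
	int nr_tasks;		/* tasks currently attached */
	bool pre_destroyed;	/* set once the rmdir-time check has passed */
};

struct model_task {
	struct model_cgroup *cg;
};

/* Step 1 of a charge: find the task's cgroup, taking no reference. */
static struct model_cgroup *model_lookup(struct model_task *t)
{
	return t->cg;
}

/* Step 2 of a charge: account the pages to the cgroup found earlier. */
static void model_charge(struct model_cgroup *cg, unsigned long nr)
{
	cg->usage += nr;
}

static void model_move_task(struct model_task *t, struct model_cgroup *to)
{
	t->cg->nr_tasks--;
	to->nr_tasks++;
	t->cg = to;
}

/* rmdir-time check: passes only if the group has no tasks and no usage. */
static bool model_pre_destroy(struct model_cgroup *cg)
{
	if (cg->nr_tasks != 0 || cg->usage != 0)
		return false;
	cg->pre_destroyed = true;
	return true;
}

/* Replays the bad ordering; returns the usage left on the dying group. */
static unsigned long model_race(void)
{
	struct model_cgroup a = { 0, 1, false };
	struct model_cgroup b = { 0, 0, false };
	struct model_task t = { &a };
	struct model_cgroup *found;
	bool ok;

	found = model_lookup(&t);	/* charger sees group A */
	assert(found == &a);
	model_move_task(&t, &b);	/* task is moved to group B */
	ok = model_pre_destroy(&a);	/* A looks idle, check passes */
	model_charge(found, 512);	/* charge still lands on A */

	return ok ? a.usage : 0;
}
```

In this model, model_race() ends with usage still accounted to group A even
though the pre-destroy check already succeeded, which is the state I would
like to rule out.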

How about taking
	down_write(&mm->mmap_sem);
	up_write(&mm->mmap_sem);

when moving a task (in ->attach()) ? This will serialize task-move and
charging without any realistic performance impact. If tasks cannot move,
rmdir never happens.

Maybe you can do this later as an optimization. So, please take this as
a suggestion.

Thanks,
-Kame