linux-kernel - Re: [PATCH v3 2/7] socket: initial cgroup code.

Date:	Tue, 27 Sep 2011 17:43:29 -0300
From:	Glauber Costa <glommer@...allels.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
CC:	Balbir Singh <bsingharora@...il.com>,
	Greg Thelen <gthelen@...gle.com>,
	<linux-kernel@...r.kernel.org>, <paul@...lmenage.org>,
	<lizf@...fujitsu.com>, <ebiederm@...ssion.com>,
	<davem@...emloft.net>, <netdev@...r.kernel.org>,
	<linux-mm@...ck.org>, <kirill@...temov.name>
Subject: Re: [PATCH v3 2/7] socket: initial cgroup code.

On 09/26/2011 07:52 AM, KAMEZAWA Hiroyuki wrote:
> On Sat, 24 Sep 2011 11:45:04 -0300
> Glauber Costa<glommer@...allels.com>  wrote:
>
>> On 09/22/2011 12:09 PM, Balbir Singh wrote:
>>> On Thu, Sep 22, 2011 at 11:30 AM, Greg Thelen<gthelen@...gle.com>   wrote:
>>>> On Wed, Sep 21, 2011 at 11:59 AM, Glauber Costa<glommer@...allels.com>   wrote:
>>>>> Right now I am working under the assumption that tasks are long lived inside
>>>>> the cgroup. Migration potentially introduces some nasty locking problems in
>>>>> the mem_schedule path.
>>>>>
>>>>> Also, unless I am missing something, the memcg already has the policy of
>>>>> not carrying charges around, probably because of this very same complexity.
>>>>>
>>>>> True that at least it won't EBUSY you... But I think this is at least a way
>>>>> to guarantee that the cgroup under our nose won't disappear in the middle of
>>>>> our allocations.
>>>>
>>>> Here's the memcg user page behavior using the same pattern:
>>>>
>>>> 1. user page P is allocated by task T in memcg M1
>>>> 2. T is moved to memcg M2.  The P charge is left behind still charged
>>>> to M1 if memory.move_charge_at_immigrate=0; or the charge is moved to
>>>> M2 if memory.move_charge_at_immigrate=1.
>>>> 3. rmdir M1 will try to reclaim P (if P was left in M1).  If unable to
>>>> reclaim, then P is recharged to parent(M1).
>>>>
>>>
>>> We also have some magic in page_referenced() to remove pages
>>> referenced from different containers. What we do is try not to
>>> penalize a cgroup if another cgroup is referencing this page and the
>>> page under consideration is being reclaimed from the cgroup that
>>> touched it.
>>>
>>> Balbir Singh
>> Do you guys see it as a showstopper for this series to be merged, or can
>> we just TODO it ?
>>
>
> In my experience, 'I can't rmdir cgroup.' is always an important/difficult
> problem. The users cannot know where the accounting is leaking other than
> kmem.usage_in_bytes or memory.usage_in_bytes, and can't fix the issue.
>
> please add EXPERIMENTAL to Kconfig until this is fixed.
>
>> I can push a proposal for it, but it would be done in a separate patch
>> anyway. Also, we may be in better conditions to fix this when the slab
>> part is merged - since it will likely have the same problems...
>>
>
> Yes. Considering sockets, which can be shared between tasks (cgroups),
> you'll finally need
>    - owner task of socket
>    - account moving callback
>
> Or disallow task moving once accounted.
>
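For reference, the memcg user page behavior Greg describes above (the charge is left behind or moved depending on memory.move_charge_at_immigrate, and whatever cannot be reclaimed at rmdir is recharged to the parent) can be modeled in a few lines. This is only a userspace toy sketch in Python, not kernel code: Memcg, Task, alloc_page, move_task and rmdir are names made up for illustration, and reclaim is reduced to a set of pages the caller declares reclaimable.

```python
class Memcg:
    """Toy memory cgroup: tracks which pages are charged to it."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.pages = set()


class Task:
    """Toy task: remembers its current cgroup and the pages it allocated."""
    def __init__(self, cgroup):
        self.cgroup = cgroup
        self.pages = set()


def alloc_page(task, page):
    # Step 1: the page is charged to the cgroup the task is in right now.
    task.pages.add(page)
    task.cgroup.pages.add(page)


def move_task(task, dest, move_charge_at_immigrate):
    # Step 2: with the flag set, charges follow the task; otherwise they
    # stay behind in the source cgroup.
    src = task.cgroup
    if move_charge_at_immigrate:
        moved = task.pages & src.pages
        src.pages -= moved
        dest.pages |= moved
    task.cgroup = dest


def rmdir(cgroup, reclaimable=frozenset()):
    # Step 3: reclaim what we can; anything left is recharged to the parent.
    if cgroup.parent is None:
        raise ValueError("toy model: cannot remove the root cgroup")
    leftover = cgroup.pages - set(reclaimable)
    cgroup.pages.clear()
    cgroup.parent.pages |= leftover
    return leftover


root = Memcg("root")
m1, m2 = Memcg("M1", root), Memcg("M2", root)
t = Task(m1)
alloc_page(t, "P")
move_task(t, m2, move_charge_at_immigrate=False)
print(sorted(m1.pages), sorted(m2.pages))  # P is still charged to M1
print(sorted(rmdir(m1)), sorted(root.pages))  # P ends up charged to root
```

The model deliberately ignores locking and concurrent allocation, which is exactly where the socket case gets hard.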

So,

I tried to come up with proper task charge moving here, and the locking
quickly gets quite complicated. (But I have the feeling I am overlooking
something...) So I think I'll really need more time for that.

What do you guys think of the following patch, plus EXPERIMENTAL in Kconfig?


View attachment "foo.patch" of type "text/plain" (3233 bytes)