linux-kernel - Re: [PATCH v3 2/7] socket: initial cgroup code.


Date:	Wed, 28 Sep 2011 09:56:43 +0900
From:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
To:	Glauber Costa <glommer@...allels.com>
Cc:	Balbir Singh <bsingharora@...il.com>,
	Greg Thelen <gthelen@...gle.com>,
	<linux-kernel@...r.kernel.org>, <paul@...lmenage.org>,
	<lizf@...fujitsu.com>, <ebiederm@...ssion.com>,
	<davem@...emloft.net>, <netdev@...r.kernel.org>,
	<linux-mm@...ck.org>, <kirill@...temov.name>
Subject: Re: [PATCH v3 2/7] socket: initial cgroup code.

On Mon, 26 Sep 2011 19:47:24 -0300
Glauber Costa <glommer@...allels.com> wrote:

> On 09/26/2011 07:52 AM, KAMEZAWA Hiroyuki wrote:
> > On Sat, 24 Sep 2011 11:45:04 -0300
> > Glauber Costa<glommer@...allels.com>  wrote:
> >
> >> On 09/22/2011 12:09 PM, Balbir Singh wrote:
> >>> On Thu, Sep 22, 2011 at 11:30 AM, Greg Thelen<gthelen@...gle.com>   wrote:
> >>>> On Wed, Sep 21, 2011 at 11:59 AM, Glauber Costa<glommer@...allels.com>   wrote:
> >>>>> Right now I am working under the assumption that tasks are long lived inside
> >>>>> the cgroup. Migration potentially introduces some nasty locking problems in
> >>>>> the mem_schedule path.
> >>>>>
> >>>>> Also, unless I am missing something, the memcg already has the policy of
> >>>>> not carrying charges around, probably because of this very same complexity.
> >>>>>
> >>>>> True that at least it won't EBUSY you... But I think this is at least a way
> >>>>> to guarantee that the cgroup under our nose won't disappear in the middle of
> >>>>> our allocations.
> >>>>
> >>>> Here's the memcg user page behavior using the same pattern:
> >>>>
> >>>> 1. user page P is allocated by task T in memcg M1
> >>>> 2. T is moved to memcg M2.  The P charge is left behind still charged
> >>>> to M1 if memory.move_charge_at_immigrate=0; or the charge is moved to
> >>>> M2 if memory.move_charge_at_immigrate=1.
> >>>> 3. rmdir M1 will try to reclaim P (if P was left in M1).  If unable to
> >>>> reclaim, then P is recharged to parent(M1).
> >>>>
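
[Illustration, not part of the original thread: a minimal Python sketch of
the three steps quoted above. It is a toy model, not kernel code. The names
Memcg, Task, move_task and rmdir are hypothetical, and the
move_charge_at_immigrate argument merely stands in for the
memory.move_charge_at_immigrate setting.]

```python
class Memcg:
    """Toy memory cgroup: tracks which pages are charged to it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # None for the root group
        self.charged = set()      # pages currently charged here


class Task:
    """Toy task: remembers its cgroup and the pages it allocated."""

    def __init__(self, memcg):
        self.memcg = memcg
        self.pages = set()

    def alloc(self, page):
        # Step 1: a page is charged to the task's current cgroup.
        self.pages.add(page)
        self.memcg.charged.add(page)


def move_task(task, dst, move_charge_at_immigrate):
    # Step 2: the charge follows the task only when the setting is 1.
    src = task.memcg
    if move_charge_at_immigrate:
        moved = task.pages & src.charged
        src.charged -= moved
        dst.charged |= moved
    task.memcg = dst


def rmdir(memcg, reclaimable):
    # Step 3: reclaim what can be reclaimed; whatever is left is
    # recharged to the parent (assumes memcg is not the root).
    for page in list(memcg.charged):
        memcg.charged.discard(page)
        if page not in reclaimable:
            memcg.parent.charged.add(page)
```

[With the setting at 0, page P stays charged to M1 after T moves to M2, and
removing M1 with P unreclaimable leaves P charged to M1's parent.]
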
> >>>
> >>> We also have some magic in page_referenced() to remove pages
> >>> referenced from different containers. What we do is try not to
> >>> penalize a cgroup if another cgroup is referencing this page and the
> >>> page under consideration is being reclaimed from the cgroup that
> >>> touched it.
> >>>
> >>> Balbir Singh
> >> Do you guys see it as a showstopper for this series to be merged, or can
> >> we just TODO it?
> >>
> >
> > In my experience, 'I can't rmdir cgroup.' is always an important/difficult
> > problem. Users have nothing but kmem.usage_in_bytes and
> > memory.usage_in_bytes to tell them where the accounting is leaking,
> > so they can't fix the issue.
> >
> > Please add EXPERIMENTAL to Kconfig until this is fixed.
> 
> I am working on something here that may allow it.
> But I think it is independent of the rest, and I can repost the series 
> fixing the problems raised here without it, + EXPERIMENTAL.
> 
> Btw, using EXPERIMENTAL here is a very good idea. I think that we should
> keep EXPERIMENTAL on even if a fix for that exists, for at least a couple
> of months, until we see how this thing really evolves.
> 
> What do you think?
> 

Yes, I think so. IIRC, SWAP accounting was EXPERIMENTAL for a year.

> >> I can push a proposal for it, but it would be done in a separate patch
> >> anyway. Also, we may be in a better position to fix this when the slab
> >> part is merged - since it will likely have the same problems...
> >>
> >
> > Yes. Considering that sockets can be shared between tasks (cgroups),
> > you'll eventually need
> >    - owner task of socket
> >    - account moving callback
> >
> > Or disallow task moving once accounted.
> 
> I personally think disallowing task movement once accounted is 
> reasonable. At least for starters.
> 

Hmm. I'm OK with that, but I'm not sure how much trouble it will cause.
So please make it easy to debug why a task cannot be moved.
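
[Illustration, not part of the original thread: a minimal Python sketch of
a check that refuses to move a task once it has been accounted and also
says why. It is a toy model, not kernel code. Cgroup, accounted_sockets and
can_attach are hypothetical names, and returning EBUSY is an assumption
borrowed from the EBUSY mention earlier in the thread.]

```python
import errno


class Cgroup:
    """Toy cgroup that counts accounted sockets per task id."""

    def __init__(self, name):
        self.name = name
        self.accounted_sockets = {}   # task id -> number of accounted sockets


def can_attach(task_id, src):
    """Return (error_code, reason); error_code 0 means the move is allowed."""
    count = src.accounted_sockets.get(task_id, 0)
    if count:
        # Refuse the move and report which task and group block it.
        reason = (f"task {task_id} has {count} accounted socket(s) "
                  f"in cgroup {src.name}")
        return errno.EBUSY, reason
    return 0, ""
```

[The point of the sketch is the second return value: the refusal carries a
human-readable reason rather than only an error code.]
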

Thanks,
-Kame






