linux-kernel - Re: [RFD] Merge task counter into memcg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4F870D4D.6020405@parallels.com>
Date:	Thu, 12 Apr 2012 14:13:49 -0300
From:	Glauber Costa <glommer@...allels.com>
To:	Tejun Heo <tj@...nel.org>
CC:	Johannes Weiner <hannes@...xchg.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Hugh Dickins <hughd@...gle.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Daniel Walsh <dwalsh@...hat.com>,
	"Daniel P. Berrange" <berrange@...hat.com>,
	Li Zefan <lizf@...fujitsu.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Cgroups <cgroups@...r.kernel.org>,
	Containers <containers@...ts.linux-foundation.org>
Subject: Re: [RFD] Merge task counter into memcg

>
> The reason why I asked Frederic whether it would make more sense as
> part of memcg wasn't about flexibility but mostly about the type of
> the resource.  I'll continue below.
>
>>> Agree. Even people aiming for unified hierarchies are okay with an
>>> opt-in/out system, I believe. So the controllers need not to be
>>> active at all times. One way of doing this is what I suggested to
>>> Frederic: If you don't limit, don't account.
>>
>> I don't agree, it's a valid usecase to monitor a workload without
>> limiting it in any way.  I do it all the time.
>
> AFAICS, this seems to be the most valid use case for different
> controllers seeing different part of the hierarchy, even if the
> hierarchies aren't completely separate.  Accounting and control being
> in separate controllers is pretty sucky too as it ends up accounting
> things multiple times.  Maybe all controllers should learn how to do
> accounting w/o applying limits?  Not sure yet.

Well...

* I don't know how blkcgrp applies limits
* the cpu cgroup, is limiting by nature, in the sense that it divides 
shares in proportion to the number of cgroups in a hierarchy
* memcg has a RESOURCE_MAX default limit that is bigger than anything 
you can possibly count.

So one of the problems, is that "limiting" may mean different thing to 
each controller.

I am mostly talking about memory cgroup here. And there. "Accounting 
without limiting" can trivially be done by setting limit to 
RESOURCE_MAX-delta. This won't work when we start having machines with 
2^64 physical memory, but I guess we have some time until it happens.

The way I see, it's just a technicality over a way to runtime disable 
the accounting of a resource without filling the hierarchy with flags.


>> To reraise a point from my other email that was ignored: do users
>> actually really care about the number of tasks when they want to
>> prevent forkbombs?  If a task would use neither CPU nor memory, you
>> would not be interested in limiting the number of tasks.
>>
>> Because the number of tasks is not a resource.  CPU and memory are.
>>
>> So again, if we would include the memory impact of tasks properly
>> (structures, kernel stack pages) in the kernel memory counters which
>> we allow to limit, shouldn't this solve our problem?
>
> The task counter is trying to control the *number* of tasks, which is
> purely memory overhead.

No, it is not. As we talk, it is becoming increasingly clear that given 
the use case, the correct term is "translating task *back* into the 
actual amount of memory".

> Translating #tasks into the actual amount of
> memory isn't too trivial tho - the task stack isn't the only
> allocation and the numbers should somehow make sense to the userland
> in consistent way.  Also, I'm not sure whether this particular limit
> should live in its silo or should be summed up together as part of
> kmem (kmem itself is in its own silo after all apart from user memory,
> right?).


It is accounted together, but limited separately. Setting 
memory.kmem.limit > memory.limit is a trivial way to say "Don't limit 
kmem". (and yet account it)

Same thing would go for a stack limit (Well, assuming it won't be merged 
into kmem itself as well)

> So, if those can be settled, I think protecting against fork
> bombs could fit memcg better in the sense that the whole thing makes
> more sense.

I myself will advise against merging anything not byte-based to memcg.
"task counter" is not byte-based.
"fork bomb preventer" might be.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/