netdev - Re: [RFC] per-containers tcp buffer limitation

Message-ID: <m14o16qlq1.fsf@fess.ebiederm.org>
Date:	Wed, 24 Aug 2011 19:16:38 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Cc:	Glauber Costa <glommer@...allels.com>,
	Linux Containers <containers@...ts.osdl.org>,
	netdev@...r.kernel.org, David Miller <davem@...emloft.net>,
	Pavel Emelyanov <xemul@...allels.com>
Subject: Re: [RFC] per-containers tcp buffer limitation

KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com> writes:

> On Wed, 24 Aug 2011 22:28:59 -0300
> Glauber Costa <glommer@...allels.com> wrote:
>
>> On 08/24/2011 09:35 PM, Eric W. Biederman wrote:
>> > Glauber Costa <glommer@...allels.com> writes:
>> Hi Eric,
>> 
>> Thanks for your attention.
>> 
>> So, what you propose was my first implementation. I ended up
>> throwing it away after playing with it for a while.
>> 
>> One of the first problems that arises from that is that the sysctls
>> are tunables visible from inside the container. Those limits, however,
>> are to be set from the outside world. The code is not much better than
>> that either, and instead of creating new cgroup structures and linking
>> them to the protocol, we end up doing it for net ns. We end up
>> increasing structures just the same...

You don't need to add a netns member to sockets.

But I do agree that there are odd permission issues with using the
existing sysctls and making them per namespace.

However, almost everything I have seen with memory limits I have found
very strange.  They all seem like a very bad version of disabling memory
overcommit.

>> Also, since we're doing resource control, it seems more natural to use
>> cgroups. Now, the fact that there is no correlation whatsoever between
>> cgroups and namespaces does bother me. But that's another story, much
>> broader and more general than this patch.
>> 
>
> I think using cgroups makes sense. A question I have in mind is whether
> it is better to integrate this kind of 'memory usage' control into memcg
> or not.

Maybe.  When sockets start getting a cgroup member, I start wondering
how many cgroups a socket will potentially belong to.

> What do you think? IMHO, having a cgroup per class of object is messy.
> ...
> How about adding
> 	memory.tcp_mem
> to memcg?
>
> Or adding a kmem cgroup?
>
>> About overhead, since this is the first RFC, I did not care about
>> measuring it. However, it seems trivial to me to guarantee at least
>> that it won't impose a significant performance penalty when it is
>> compiled out. If we're moving forward with this implementation, I will
>> include data in the next release so we can discuss on this basis.
>> 
>
> IMHO, you should show performance numbers even for an RFC. Then people
> will look at the patch with more interest.

Also, "compiled out" doesn't really count.  Cgroups are something you
want people to compile into distributions for the common case, and you
don't want to impose a noticeable performance penalty for the common
case.

Eric
