Message-ID: <4E558137.5020900@parallels.com>
Date:	Wed, 24 Aug 2011 19:54:47 -0300
From:	Glauber Costa <glommer@...allels.com>
To:	<netdev@...r.kernel.org>
CC:	David Miller <davem@...emloft.net>,
	Linux Containers <containers@...ts.osdl.org>,
	<ebiederm@...ssion.com>, Pavel Emelyanov <xemul@...allels.com>
Subject: [RFC] per-containers tcp buffer limitation

Hello,

This is a proof of concept of some code I have here to limit tcp send
and receive buffers per-container (in our case). At this phase, I am
more concerned with discussing my approach, so please curse my family
no further than the 3rd generation.

The problem we're trying to attack here is that buffers can grow and
fill non-reclaimable kernel memory. With containers, we can't afford
to have a malicious container pinning kernel memory at will and
thereby starving all the others.

So here a container will be seen in the host system as a group of
tasks, grouped in a cgroup. This cgroup will have files allowing us to
specify buffer limits that apply to the cgroup as a whole. For that
purpose, I created a new sockets cgroup, since I didn't really think
any of the existing ones would do here.

As for the network code per se, I tried to keep the existing code that
deals with memory scheduling as a basis and make it per-cgroup.
You will notice that struct proto now takes function pointers to the
values controlling memory pressure, and these return per-cgroup data
instead of global ones. So the current behavior is maintained: after
the first threshold is hit, we enter memory pressure. After that,
allocations are suppressed.
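
To make the idea above concrete, here is a minimal userspace sketch of
the accessor approach. It is not taken from the attached patch: every
name in it (cg_sock_mem, sketch_sock, sketch_proto, sketch_mem_schedule)
is hypothetical, and it assumes just two limits per group, a pressure
threshold and a hard threshold past which allocations are refused,
which is only one reading of the description.

```c
#include <stdbool.h>

/* Hypothetical per-cgroup accounting state (illustrative names only). */
struct cg_sock_mem {
    long allocated;       /* pages currently charged to this group */
    long pressure_limit;  /* above this, the group is under memory pressure */
    long hard_limit;      /* above this, new allocations are refused */
    bool under_pressure;
};

/* Stand-in for a socket that knows which group it belongs to. */
struct sketch_sock {
    struct cg_sock_mem *cg;
};

/*
 * Stand-in for struct proto: instead of pointing at global counters,
 * it holds accessors that resolve to the data of the socket's group.
 */
struct sketch_proto {
    long *(*memory_allocated)(struct sketch_sock *sk);
    bool *(*memory_pressure)(struct sketch_sock *sk);
    long (*pressure_limit)(struct sketch_sock *sk);
    long (*hard_limit)(struct sketch_sock *sk);
};

static long *cg_memory_allocated(struct sketch_sock *sk)
{
    return &sk->cg->allocated;
}

static bool *cg_memory_pressure(struct sketch_sock *sk)
{
    return &sk->cg->under_pressure;
}

static long cg_pressure_limit(struct sketch_sock *sk)
{
    return sk->cg->pressure_limit;
}

static long cg_hard_limit(struct sketch_sock *sk)
{
    return sk->cg->hard_limit;
}

static const struct sketch_proto sketch_tcp_proto = {
    .memory_allocated = cg_memory_allocated,
    .memory_pressure  = cg_memory_pressure,
    .pressure_limit   = cg_pressure_limit,
    .hard_limit       = cg_hard_limit,
};

/*
 * Try to charge `pages` to the socket's group. Returns false when the
 * charge would exceed the hard limit; otherwise records the charge and
 * flags memory pressure once the first threshold has been crossed.
 */
static bool sketch_mem_schedule(const struct sketch_proto *prot,
                                struct sketch_sock *sk, long pages)
{
    long *allocated = prot->memory_allocated(sk);
    long wanted = *allocated + pages;

    if (wanted > prot->hard_limit(sk))
        return false;              /* allocation suppressed */

    *allocated = wanted;
    if (wanted > prot->pressure_limit(sk))
        *prot->memory_pressure(sk) = true;
    return true;
}
```

Because the scheduling function only goes through the accessors, two
sockets in different groups are charged against separate counters, and
one group hitting its limit does not affect the other.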

Only the tcp code was really touched here. udp had the pointers filled
in, but we're not really controlling anything there. But the fact that
this lives in generic code makes it easier to do the same for other
protocols in the future.

For this patch specifically, I am not touching the rmem- and
wmem-specific knobs, just provisioning for them. I should also #ifdef
a lot of this, but hey, remember: rfc...

One drawback I found with this approach is that cgroups do not really
work well with modules. A lot of the network code is modularized, so
this would have to be fixed somehow.

Let me know what you think.

View attachment "patch-rfc-sndbuf.patch" of type "text/plain" (27968 bytes)
