Date:	Wed, 31 Aug 2011 23:43:27 -0300
From:	Glauber Costa <glommer@...allels.com>
To:	<netdev@...r.kernel.org>
CC:	Linux Containers <containers@...ts.osdl.org>, <linux-mm@...ck.org>,
	Pavel Emelyanov <xemul@...allels.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	David Miller <davem@...emloft.net>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Stephen Hemminger <shemminger@...tta.com>, <penberg@...nel.org>
Subject: [RFMC] per-container tcp buffer limitation

Hello People,

[ For the ones in linux-mm that are receiving this for the first time,
   this is a follow up of
   http://thread.gmane.org/gmane.linux.kernel.containers/21295 ]

Here is a new, somewhat more mature version of my previous RFC. I now
Request For More Comments from you on this new version of the patch.

Highlights:

* Although I do intend to experiment with more scenarios (suggestions
welcome), there does not seem to be a (huge) performance hit with this
patch applied, at least in a basic latency benchmark. That indicates
that even if we can demonstrate a performance hit, it won't be too hard
to optimize it away (famous last words?).

Since the patch touches both rcv and snd sides, I benchmarked it with 
netperf against localhost. Command line: netperf -t TCP_RR -H localhost.

Without the patch
=================

Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    26996.35
16384  87380

With the patch
===============

Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    27291.86
16384  87380


As you can see, the rate is a bit higher, but still within a one percent
range, meaning it is basically unchanged. I will benchmark with various
levels of cgroup nesting in my next submission so we can get a better
idea of its impact when enabled.

* As nicely pointed out by Kamezawa, I dropped the sockets cgroup and
introduced a kmem cgroup. After careful consideration, I decided not to
reuse the memcg. Basically, my impression is that memcg is concerned
with user objects, with page granularity and swap semantics. Because
kernel objects are entirely different, I prefer to group them here.
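
To make that distinction concrete, here is a minimal sketch of the kind
of accounting a kernel-memory controller would do. None of these names
come from the attached patch; they are only illustrative. The idea is a
per-group counter of kernel pages, charged and uncharged as TCP buffers
grow and shrink, independent of memcg's user-page accounting:

#include <linux/atomic.h>
#include <linux/errno.h>

/* Illustrative per-group state: kernel pages in use vs. allowed. */
struct kmem_group {
	atomic_long_t	usage;	/* pages currently charged to this group */
	long		limit;	/* maximum pages the group may use       */
};

/* Charge @pages to @g; back out and fail if that exceeds the limit. */
static int kmem_charge(struct kmem_group *g, long pages)
{
	long new_usage = atomic_long_add_return(pages, &g->usage);

	if (new_usage > g->limit) {
		atomic_long_sub(pages, &g->usage);
		return -ENOMEM;
	}
	return 0;
}

/* Give back @pages previously charged with kmem_charge(). */
static void kmem_uncharge(struct kmem_group *g, long pages)
{
	atomic_long_sub(pages, &g->usage);
}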

* Only TCP over IPv4 is converted, because it is basically the one in
which the memory pressure thresholds are really put to use. I plan to
touch the other protocols in the next submission.
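
For readers less familiar with those thresholds: net.ipv4.tcp_mem holds
three page counts (low, pressure, high). The decision the stack makes
when growing socket buffers looks roughly like the simplified C below;
this is only an illustration, not the actual in-kernel code, which has
more special cases:

enum tcp_mem_verdict { TCP_MEM_OK, TCP_MEM_PRESSURE, TCP_MEM_FAIL };

/* Simplified check of allocated TCP buffer pages against tcp_mem[]. */
static enum tcp_mem_verdict
tcp_mem_check(long allocated_pages, const long tcp_mem[3])
{
	if (allocated_pages <= tcp_mem[0])
		return TCP_MEM_OK;		/* below the low threshold       */
	if (allocated_pages > tcp_mem[2])
		return TCP_MEM_FAIL;		/* over the hard limit: back off */
	if (allocated_pages > tcp_mem[1])
		return TCP_MEM_PRESSURE;	/* throttle further growth       */
	return TCP_MEM_OK;			/* between low and pressure      */
}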

* As with other sysctls, the sysctl controlling tcp memory pressure
behaviour was made per-netns, but it will show the cgroup data for the
current cgroup. The cgroup control file, however, will only set a
maximum value. The pressure thresholds are not the business of the box
administrator, but rather of the container's - anything goes, provided
none of the 3 values goes over the maximum.
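
In other words, the box administrator writes a single maximum into the
cgroup control file, and the container administrator is free to tune
the per-netns tcp_mem triplet underneath it. A sketch of that
acceptance rule, with names invented for illustration rather than taken
from the patch:

/* Accept a container's tcp_mem triplet only if every value stays at or
 * below the maximum imposed through the cgroup control file. */
static int tcp_mem_within_cgroup_max(const long tcp_mem[3], long cgroup_max)
{
	int i;

	for (i = 0; i < 3; i++)
		if (tcp_mem[i] > cgroup_max)
			return 0;	/* rejected: exceeds the box-wide cap */
	return 1;			/* accepted: container's own policy   */
}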

Comments welcome

[ Attachment: tcp-membuf.patch (text/plain, 33844 bytes) ]
