Message-ID: <4E5EF14F.3040300@parallels.com>
Date: Wed, 31 Aug 2011 23:43:27 -0300
From: Glauber Costa <glommer@...allels.com>
To: <netdev@...r.kernel.org>
CC: Linux Containers <containers@...ts.osdl.org>, <linux-mm@...ck.org>,
Pavel Emelyanov <xemul@...allels.com>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
David Miller <davem@...emloft.net>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
Stephen Hemminger <shemminger@...tta.com>, <penberg@...nel.org>
Subject: [RFMC] per-container tcp buffer limitation
Hello People,
[ For the ones in linux-mm that are receiving this for the first time,
this is a follow up of
http://thread.gmane.org/gmane.linux.kernel.containers/21295 ]
Here is a new, somewhat more mature version of my previous RFC, and I now
Request For More Comments from you guys on this new version of the patch.
Highlights:
* Although I do intend to experiment with more scenarios (suggestions
welcome), there does not seem to be a (huge) performance hit with this
patch applied, at least in a basic latency benchmark. That indicates
that even if we can demonstrate a performance hit, it won't be too hard
to optimize it away (famous last words?)
Since the patch touches both rcv and snd sides, I benchmarked it with
netperf against localhost. Command line: netperf -t TCP_RR -H localhost.
Without the patch
=================
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    26996.35
16384  87380
With the patch
===============
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec

16384  87380  1        1       10.00    27291.86
16384  87380
As you can see, the rate is slightly higher with the patch, but the
difference is only around one percent (27291.86 vs. 26996.35 transactions
per second), meaning it is basically unchanged. I will benchmark it with
various levels of cgroup nesting in my next submission, so we can have a
better idea of its impact when enabled.
* As nicely pointed out by Kamezawa, I dropped the sockets cgroup and
introduced a kmem cgroup. After careful consideration, I decided not to
reuse the memcg. Basically, my impression is that memcg is concerned
with user objects, with page granularity and swap attributes. Because
kernel objects are entirely different, I prefer to group them here (see
the first sketch after this list for the accounting model I have in mind).
* Only tcp ipv4 is converted, because it is basically the protocol in
which memory pressure thresholds are really put to use. I plan to touch
the other protocols in the next submission.
* As with other sysctls, the sysctl controlling tcp memory pressure
behaviour was made per-netns, but it shows the data for the current
cgroup. The cgroup control file, however, only sets a maximum value. The
pressure thresholds are not the business of the box administrator, but
rather of the container's - anything goes, provided none of the 3 values
goes over the maximum (the second sketch after this list illustrates
that rule).
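
To make the kmem-cgroup point a bit more concrete, here is a toy
userspace sketch of the accounting model described above. It is not code
from the attached patch, and all names in it (kmem_group, kmem_charge,
...) are made up for illustration: kernel-object memory (here, socket
buffer bytes) is charged against a per-group counter with a hard
maximum, separately from memcg's user-page accounting.

/* Toy model of per-group kernel-memory accounting, illustration only.
 * The names are made up and are not the ones used by the attached patch. */
#include <stdio.h>
#include <stdbool.h>

struct kmem_group {
	long usage;	/* bytes of kernel memory currently charged */
	long max;	/* hard limit, set via the cgroup control file */
};

/* Charge 'bytes' of socket buffer memory to the group; refuse the
 * allocation if it would push the group over its maximum. */
static bool kmem_charge(struct kmem_group *g, long bytes)
{
	if (g->usage + bytes > g->max)
		return false;
	g->usage += bytes;
	return true;
}

static void kmem_uncharge(struct kmem_group *g, long bytes)
{
	g->usage -= bytes;
}

int main(void)
{
	struct kmem_group container = { .usage = 0, .max = 1 << 20 };

	if (kmem_charge(&container, 64 * 1024))
		printf("charged 64KB, usage now %ld bytes\n", container.usage);
	if (!kmem_charge(&container, 2 << 20))
		printf("2MB charge rejected, over the group maximum\n");
	kmem_uncharge(&container, 64 * 1024);
	return 0;
}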
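
And here is a similarly made-up sketch of the "three thresholds under
one maximum" rule for the per-netns tcp memory pressure sysctl: the
container administrator can set any min/pressure/max triple, as long as
no value exceeds the limit the box administrator wrote into the cgroup
control file. Again, names and numbers are illustrative only, not taken
from the patch.

/* Toy model of the rule above: the per-cgroup maximum caps whatever
 * tcp memory pressure thresholds the container administrator sets.
 * Illustration only; not the code in the attached patch. */
#include <stdio.h>
#include <stdbool.h>

struct tcp_mem_cg {
	long thresh[3];	/* min, pressure, max thresholds (pages) */
	long limit;	/* maximum imposed via the cgroup control file */
};

/* Accept the new triple only if none of the 3 values goes over the
 * cgroup maximum. */
static bool tcp_mem_set(struct tcp_mem_cg *cg, const long vals[3])
{
	int i;

	for (i = 0; i < 3; i++)
		if (vals[i] > cg->limit)
			return false;
	for (i = 0; i < 3; i++)
		cg->thresh[i] = vals[i];
	return true;
}

int main(void)
{
	struct tcp_mem_cg cg = { .thresh = { 0, 0, 0 }, .limit = 196608 };
	long ok[3]  = { 85632, 114176, 171264 };
	long bad[3] = { 85632, 114176, 262144 };

	printf("ok  triple: %s\n", tcp_mem_set(&cg, ok) ? "accepted" : "rejected");
	printf("bad triple: %s\n", tcp_mem_set(&cg, bad) ? "accepted" : "rejected");
	return 0;
}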
Comments welcome
[Attachment: "tcp-membuf.patch" (text/plain, 33844 bytes)]