Message-ID: <m1vckcdoey.fsf@fess.ebiederm.org>
Date:	Fri, 04 May 2012 07:13:57 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Mike Galbraith <efault@....de>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Pavel Emelyanov <xemul@...allels.com>,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Louis Rilling <louis.rilling@...labs.com>
Subject: Re: [PATCH]  Re: [RFC PATCH] namespaces: fix leak on fork() failure

Mike Galbraith <efault@....de> writes:

> On Fri, 2012-05-04 at 00:55 -0700, Eric W. Biederman wrote:
>
>> CLONE_NEWUSER?  I presume you have applied my latest user namespace
>> patches?  Otherwise you are running completely half baked code.
>
> I Removed CLONE_NEWUSER flag.
>
>> hackbench?  Which kernel are you running.  Hackbench in some kernels is
>> really good at triggering cache ping-pong effects with pids, and creds.
>
> Not when pinned.  3.0 kernel without the debug stuff enabled in 3.4.git.
> 
> marge:/usr/local/tmp/starvation # taskset -c 3 ./hackbench
> Running with 10*40 (== 400) tasks.
> Time: 0.868
> marge:/usr/local/tmp/starvation # taskset -c 3 ./hackbench -namespace
> Running with 10*40 (== 400) tasks.
> Time: 7.582
> marge:/usr/local/tmp/starvation # taskset -c 3 ./hackbench -namespace -all
> Running with 10*40 (== 400) tasks.
> Time: 29.677

Interesting.  I guess what truly puzzles me is what serializes all of
the processes.  Even synchronize_rcu should sleep and thus let other
synchronize_rcu calls run in parallel.

Did you have HZ=100 in that kernel?  400 tasks at 100Hz all serialized
somehow and then doing synchronize_rcu at a jiffy each would account
for 4 seconds.  And the nsproxy certainly has a synchronize_rcu call.
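
As a rough check of that estimate, here is a minimal sketch of the
arithmetic.  It assumes the tasks really are fully serialized and that
each synchronize_rcu costs exactly one jiffy; the function name and the
extra HZ values are only illustrative and do not come from the kernel.

```python
def serialized_wait_seconds(tasks: int, hz: int, jiffies_per_wait: int = 1) -> float:
    """Wall-clock time if every task waits in turn, one after another.

    Assumption: each wait lasts exactly `jiffies_per_wait` timer ticks and
    no two waits overlap (the fully serialized case described above).
    """
    total_jiffies = tasks * jiffies_per_wait
    return total_jiffies / hz  # one jiffy lasts 1/HZ seconds


if __name__ == "__main__":
    # 400 hackbench tasks, as in the runs quoted above.
    for hz in (100, 250, 1000):
        print(f"HZ={hz}: {serialized_wait_seconds(400, hz):.1f} s")
```

With HZ=100 this gives the 4 seconds mentioned above; a higher HZ would
shrink the figure proportionally, which is why the HZ setting matters.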

The network namespace is comparatively heavy weight, at least in the
amount of code and other things it has to go through, so that would be
my prime suspect for those 29 seconds.  There are 2-4 synchronize_rcu
calls needed to put the loopback device.  Still, we use
synchronize_rcu_expedited, that work should be out of line, and all of
those calls should batch.

Mike, is this something you are looking at pursuing further?

I want to guess the serialization comes from waiting on children to be
reaped, but the namespaces are all cleaned up in exit_notify() called
from do_exit(), so that theory doesn't hold water.  The worst case
I can see is detach_pid from exit_signal running under the task list lock,
but nothing sleeps under that lock.  :(

So I am very puzzled why the code serializes itself in a way that leads
to those long delays.  Shrug.

Eric
