Date:	Fri, 04 May 2012 08:36:08 -0700
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Mike Galbraith <efault@....de>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Pavel Emelyanov <xemul@...allels.com>,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Louis Rilling <louis.rilling@...labs.com>
Subject: Re: [PATCH]  Re: [RFC PATCH] namespaces: fix leak on fork() failure

Mike Galbraith <efault@....de> writes:

> On Fri, 2012-05-04 at 07:13 -0700, Eric W. Biederman wrote: 
>> Mike Galbraith <efault@....de> writes:

>> Did you have HZ=100 in that kernel?  400 tasks at 100Hz all serialized
>> somehow and then doing synchronize_rcu at a jiffy each would account
>> for 4 seconds.  And the nsproxy certainly has a synchronize_rcu call.
>
> HZ=250

Rats.  Then none of my theories even approaches holding water.
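
Back-of-the-envelope, assuming one serialized grace period per task:

    HZ=100: 400 tasks * 10 ms/jiffy = 4.0 s  (would have fit)
    HZ=250: 400 tasks *  4 ms/jiffy = 1.6 s  (does not fit)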

>> The network namespace is comparatively heavyweight, at least in the
>> amount of code and other things it has to go through, so that would be
>> my prime suspect for those 29 seconds.  There are 2-4 synchronize_rcu
>> calls needed to put the loopback device.  Still we use
>> synchronize_rcu_expedited and that work should be out of line and all of
>> those calls should batch.
>> 
>> Mike, is this something you are looking at or pursuing further?
>
> Not really, but I can put it on my good intentions list.

About what I expected.  I just wanted to make certain I understood the
situation.

I will remember this as something weird, and when I have time perhaps
I will investigate and track it down.
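
A toy userspace model (not kernel code) of why the batching matters, with
usleep() standing in for a grace period and made-up numbers:

#include <stdio.h>
#include <unistd.h>

#define GRACE_US 10000          /* pretend one grace period costs 10 ms */
#define NR_NS    400            /* namespaces being torn down */

int main(void)
{
	int i;

	/* Unbatched: each teardown waits out its own grace period. */
	for (i = 0; i < NR_NS; i++)
		usleep(GRACE_US);        /* ~NR_NS * 10 ms total */

	/* Batched: queue all teardowns, wait once, then free them all. */
	usleep(GRACE_US);                /* ~10 ms total */

	printf("unbatched ~ %d grace periods, batched ~ 1\n", NR_NS);
	return 0;
}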

>> I want to guess the serialization comes from waiting on children to be
>> reaped, but the namespaces are all cleaned up in exit_notify() called
>> from do_exit(), so that theory doesn't hold water.  The worst case
>> I can see is detach_pid from exit_signal running under the tasklist lock,
>> but nothing sleeps under that lock.  :(
>
> I'm up to my ears in zombies with several instances of the testcase
> running in parallel, so I imagine it's the same with hackbench.

Oh interesting.

> marge:/usr/local/tmp/starvation # taskset -c 3 ./hackbench -namespace& for i in 1 2 3 4 5 6 7 ; do ps ax|grep defunct|wc -l;sleep 1; done
> [1] 29985
> Running with 10*40 (== 400) tasks.
> 1
> 397
> 327
> 261
> 199
> 135
> 72
> marge:/usr/local/tmp/starvation # Time: 7.675

So if I read your output right, the first second is spent running the
code and the rest of the time is spent reaping zombies.

So if this is all in reaping zombies, it should be possible to add
go-faster stripes by setting exit_signal to -1 on these guys.  I know
you can do that for threads, and I seem to remember hackbench using
threads, so that might be interesting.
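
Roughly what I mean, as a userspace sketch (not hackbench's actual code):
ignoring SIGCHLD makes exited children auto-reap instead of piling up as
zombies, which is about the closest userspace analogue to exit_signal == -1.

#include <signal.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int i;

	signal(SIGCHLD, SIG_IGN);       /* children auto-reap on exit */

	for (i = 0; i < 400; i++) {
		if (fork() == 0)
			_exit(0);       /* child exits; no zombie remains */
	}

	sleep(1);
	puts("ps ax | grep defunct should find nothing from this program");
	return 0;
}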

I wonder if it might be userspace scheduling madness.

What changes the speed of a waitpid loop?  Weird.  Very weird.
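
(For reference, a minimal sketch of the kind of reaping loop in question,
not hackbench's actual code; the function name here is made up.)

#include <sys/types.h>
#include <sys/wait.h>

static void reap_all_children(void)
{
	while (waitpid(-1, NULL, 0) > 0)
		;       /* one zombie reaped per iteration */
}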

Eric
