lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 21 Jun 2010 16:15:18 +0200
From:	Louis Rilling <Louis.Rilling@...labs.com>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	Linux Containers <containers@...ts.osdl.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Pavel Emelyanov <xemul@...nvz.org>,
	linux-kernel@...r.kernel.org, Pavel Emelyanov <xemul@...allels.com>
Subject: Re: [PATCH] procfs: Do not release pid_ns->proc_mnt too early

On 21/06/10  5:58 -0700, Eric W. Biederman wrote:
> Louis Rilling <Louis.Rilling@...labs.com> writes:
> 
> > On 18/06/10 18:27 +0200, Oleg Nesterov wrote:
> >> On 06/18, Louis Rilling wrote:
> >> >
> >> > On 17/06/10 23:36 +0200, Oleg Nesterov wrote:
> >> > > On 06/17, Eric W. Biederman wrote:
> >> > > >
> >> > > > The task->children isn't changed until __unhash_process() which runs
> >> > > > after flush_proc_task().
> >> > >
> >> > > Yes. But this is only the current implementation detail.
> >> > > It would be nice to cleanup the code so that EXIT_DEAD tasks are
> >> > > never sit in ->children list.
> >> > >
> >> > > > So we should be able to come up with
> >> > > > a variant of do_wait() that zap_pid_ns_processes can use that does
> >> > > > what we need.
> >> > >
> >> > > See above...
> >> > >
> >> > > Even if we modify do_wait() or add the new variant, how the caller
> >> > > can wait for EXIT_DEAD tasks? I don't think we want to modify
> >> > > release_task() to do __wake_up_parent() or something similar.
> >> >
> >> > Indeed, I was thinking about calling __wake_up_parent() from release_task()
> >> > once parent->children becomes empty.
> >> >
> >> > Not sure about the performance impact though. Maybe some WAIT_NO_CHILDREN flag
> >> > in parent->signal could limit it. But if EXIT_DEAD children are removed from
> >> > ->children before release_task(), I'm afraid that this becomes impossible.
> >> 
> >> Thinking more, even the current do_wait() from zap_pid_ns_processes()
> >> is not really good. Suppose that some none-init thread is ptraced, then
> >> zap_pid_ns_processes() will hange until the tracer does do_wait() or
> >> exits.
> >
> > Is this really a bad thing? If somebody ptraces a task in a pid namespace, that
> > sounds reasonable to have this namespace (and it's init task) pinned.
> 
> Louis.  Have you seen this problem hit without my setns patch?

Yes. I hit it with Kerrighed patches. I also have an ugly reproducer on
2.6.35-rc3 (see attachments). Ugly because I introduced artifical delays
in release_task(). I couldn't trigger the bug without it, probably because the
scheduler is too kind :)

I'm using memory poisoining (SLAB and DEBUG_SLAB) to make it easy to observe the
bug.

Example:
# ./proc_flush_task-bug-reproducer 1

> 
> I'm pretty certain that this hits because there are processes do_wait
> does not wait for, in particular processes in a disjoint process tree.

Indeed do_wait() misses EXIT_DEAD children.

> 
> So at this point I am really favoring killing the do_wait and making
> this all asynchronous.

Any idea about how to do it?

Thanks,

Louis

-- 
Dr Louis Rilling			Kerlabs
Skype: louis.rilling			Batiment Germanium
Phone: (+33|0) 6 80 89 08 23		80 avenue des Buttes de Coesmes
http://www.kerlabs.com/			35700 Rennes

View attachment "proc_flush_task-bug-reproducer.c" of type "text/x-csrc" (1502 bytes)

View attachment "reproducer.patch" of type "text/x-diff" (691 bytes)

Download attachment "signature.asc" of type "application/pgp-signature" (198 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ