Date:	Tue, 25 Nov 2014 11:50:59 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Aaron Tomlin <atomlin@...hat.com>,
	Pavel Emelyanov <xemul@...allels.com>,
	Serge Hallyn <serge.hallyn@...ntu.com>,
	Sterling Alexander <stalexan@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting

Oleg Nesterov <oleg@...hat.com> writes:

> On 11/24, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@...hat.com> writes:
>>
>> > --- a/kernel/pid.c
>> > +++ b/kernel/pid.c
>> > @@ -320,7 +320,6 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>> >  			goto out_free;
>> >  	}
>> >
>> > -	get_pid_ns(ns);
>> >  	atomic_set(&pid->count, 1);
>> >  	for (type = 0; type < PIDTYPE_MAX; ++type)
>> >  		INIT_HLIST_HEAD(&pid->tasks[type]);
>> > @@ -336,7 +335,7 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>> >  	}
>> >  	spin_unlock_irq(&pidmap_lock);
>> >
>> > -out:
>> > +	get_pid_ns(ns);
>>
>> Moving the label and changing the goto out logic is gratuitously
>> confusing, and I think it probably even generates worse code.
>>
>> Furthermore multiple exits make adding debugging code more difficult.
>
> Oh, I strongly disagree but I am not going to argue ;) cleanups are
> always subjective, and I do believe in "maintainer is always right"
> mantra. I can make v2 without this change.

Fair enough.  My primary complaint was that you were changing the logic
and fixing a bug at the same time.  That added noise and made analysis
of what was really going on much more difficult.

>> Moving get_pid_ns down does close a leak in the error handling path.
>
> OK, good.
>
>> However, at the moment I can't figure out if it is safe to move
>> get_pid_ns below hlist_add_head_rcu.  Because once we are on the rcu list
>> the pid is findable, and being publicly visible with a bad refcount could cause
>> problems.
>
> The caller has a reference, this ns can't go away. Obviously, otherwise
> get_pid_ns(ns) is not safe.
>
> We need this get_pid_ns() to balance put_pid()->put_pid_ns() which obviously
> won't be called until we return this pid, otherwise everything is wrong.
>
> So I think this should be safe?

My concern is exposing a half-initialized struct pid to the world via an
rcu data structure.  In particular, could one of the rcu users get into
trouble because we haven't called get_pid_ns yet?  That is unclear to me.

That is one of those weird nasty races I would rather not have to
consider and moving the get_pid_ns after hlist_add requires that we
think about it.
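
The ordering concern can be illustrated with a small userspace model.
This is only a sketch, not kernel code: struct ns_model, struct
pid_model, publish_pid() and lookup_pid() are hypothetical names, and a
C11 release store stands in for the publication that
hlist_add_head_rcu performs.  The point is simply that the reference is
taken before the object becomes findable, so no reader can ever observe
the object without it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for pid_namespace and struct pid. */
struct ns_model {
	int count;		/* models the namespace reference count */
};

struct pid_model {
	struct ns_model *ns;
};

/* Models the publicly searchable structure (a single slot here). */
static _Atomic(struct pid_model *) published = NULL;

/* Finish all initialization, including the reference, then publish. */
static void publish_pid(struct pid_model *pid, struct ns_model *ns)
{
	assert(pid != NULL && ns != NULL);
	pid->ns = ns;
	ns->count++;		/* models get_pid_ns(), done before publication */
	/* Release store: everything written above is visible to any
	 * reader that observes the pointer. */
	atomic_store_explicit(&published, pid, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above. */
static struct pid_model *lookup_pid(void)
{
	return atomic_load_explicit(&published, memory_order_acquire);
}
```

With this ordering a reader that gets a non-NULL pointer from
lookup_pid() always sees the incremented count.  Whether the kernel's
real rcu readers care about that is exactly the question left open
above.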

To fix the error handling and avoid thinking about the races we have two
choices:
- In the error path that is currently called out_unlock we can drop the
  extra references.
- Immediately after we perform the test that on error jumps to out_unlock
  we call get_pid_ns.

My preference would be the first, as it is a trivially correct one line
change.

In other words, I think this is the obviously correct, trivial fix:

 out_unlock:
 	spin_unlock_irq(&pidmap_lock);
+	put_pid_ns(ns);
 out_free:
 	while (++i <= ns->level)
 		free_pidmap(pid->numbers + i);
 


Eric