linux-kernel - Re: [PATCH 2/2] exit: pidns: alloc_pid() leaks pid_namespace if child

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 25 Nov 2014 12:43:15 -0600
From:	ebiederm@...ssion.com (Eric W. Biederman)
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Aaron Tomlin <atomlin@...hat.com>,
	Pavel Emelyanov <xemul@...allels.com>,
	Serge Hallyn <serge.hallyn@...ntu.com>,
	Sterling Alexander <stalexan@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting

Oleg Nesterov <oleg@...hat.com> writes:

> On 11/25, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@...hat.com> writes:
>>
>> > On 11/24, Eric W. Biederman wrote:
>> >>
>> >> However at the moment my I can't figure out if it is safe to move
>> >> get_pid_ns elow hlist_add_head_rcu.  Because once we are on the rcu list
>> >> the pid is findable, and being publicly visible with a bad refcount could cause
>> >> problems.
>> >
>> > The caller has a reference, this ns can't go away. Obviously, otherwise
>> > get_pid_ns(ns) is not safe.
>> >
>> > We need this get_pid_ns() to balance put_pid()->put_pid_ns() which obviously
>> > won't be called until we return this pid, otherwise everything is wrong.
>> >
>> > So I think this should be safe?
>>
>> My concern is exposing a half initialized struct pid to the world via an
>> rcu data structure.  In particular could one of the rcu users get into
>> trouble because we haven't called get_pid_ns yet?  That is unclear to me.
>
> They can't. This pid was fully initialized, in particular
> pid->numbers[pid->level].ns == ns has a reference.
>
> Just it is not ready for put_pid() which will be called by the "owner" of
> this pid, the caller or the new child. So in this sense it doesn't matter
> when we call get_pid_ns(), just we need to do this before return.

Or by someone calling find_get_pid() ... put_pid().

Now the reference count should not hit zero in that case but I hate to
think of that case separately.

>> That is one of those weird nasty races I would rather not have to
>> consider and moving the get_pid_ns after hlist_add requires that we
>> think about it.
>>
>> To fix the error handling and avoid thinking about the races we have two
>> choices:
>> - In the error path that is currently called out_unlock we can drop the
>>   extra references.
>> - Immediately after we perform the test that on error jumps to out_unlock
>>   we call get_pid_ns.
>>
>> My preference would be the first, as it is a trivially correct one line
>> change.
>>
>> Aka I think this is the obviously correct trivial fix.
>>
>>  out_unlock:
>>  	spin_unlock_irq(&pidmap_lock);
>> +	put_pid_ns(ns);
>
> Sure, initially I was going to do this. But this is sub-optimal imo, I mainly
> mean less clear (imho).
>
> But again, I won't argue. I'll send V2 once we finish the discussion
> about 2/2.

At this point, and especially since we need to Cc stable and get this
fix backported to who knows how many kernel releases having something
that is trivial to validate is correct is important.

If you prefer to call get_pid_ns() immedately after:

	if (!(ns->nr_hashed & PIDNS_HASH_ADDING))
		goto out_unlock;

That would be fine with me as well.

Anything else to too clever for my brain to verify is correct today.

Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/