Message-ID: <87lhmzb24c.fsf@x220.int.ebiederm.org>
Date: Tue, 25 Nov 2014 11:50:59 -0600
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Oleg Nesterov <oleg@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Aaron Tomlin <atomlin@...hat.com>,
Pavel Emelyanov <xemul@...allels.com>,
Serge Hallyn <serge.hallyn@...ntu.com>,
Sterling Alexander <stalexan@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting
Oleg Nesterov <oleg@...hat.com> writes:
> On 11/24, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@...hat.com> writes:
>>
>> > --- a/kernel/pid.c
>> > +++ b/kernel/pid.c
>> > @@ -320,7 +320,6 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>> > goto out_free;
>> > }
>> >
>> > - get_pid_ns(ns);
>> > atomic_set(&pid->count, 1);
>> > for (type = 0; type < PIDTYPE_MAX; ++type)
>> > INIT_HLIST_HEAD(&pid->tasks[type]);
>> > @@ -336,7 +335,7 @@ struct pid *alloc_pid(struct pid_namespace *ns)
>> > }
>> > spin_unlock_irq(&pidmap_lock);
>> >
>> > -out:
>> > + get_pid_ns(ns);
>>
>> Moving the label and changing the goto out logic is gratuitous and
>> confusing, and I think it probably even generates worse code.
>>
>> Furthermore multiple exits make adding debugging code more difficult.
>
> Oh, I strongly disagree but I am not going to argue ;) cleanups are
> always subjective, and I do believe in "maintainer is always right"
> mantra. I can make v2 without this change.
Fair enough. My primary complaint was that you were changing the logic
and fixing a bug at the same time. That added noise and made analysis
of what was really going on much more difficult.
>> Moving get_pid_ns down does close a leak in the error handling path.
>
> OK, good.
>
>> However at the moment I can't figure out if it is safe to move
>> get_pid_ns below hlist_add_head_rcu. Because once we are on the rcu list
>> the pid is findable, and being publicly visible with a bad refcount could cause
>> problems.
>
> The caller has a reference, this ns can't go away. Obviously, otherwise
> get_pid_ns(ns) is not safe.
>
> We need this get_pid_ns() to balance put_pid()->put_pid_ns() which obviously
> won't be called until we return this pid, otherwise everything is wrong.
>
> So I think this should be safe?
My concern is exposing a half initialized struct pid to the world via an
rcu data structure. In particular could one of the rcu users get into
trouble because we haven't called get_pid_ns yet? That is unclear to me.
That is one of those weird nasty races I would rather not have to
consider and moving the get_pid_ns after hlist_add requires that we
think about it.
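
To make the ordering concern concrete, here is a minimal userspace sketch. It is not kernel code: struct model_pid, publish_fully_initialized() and lookup() are hypothetical names, and a C11 release/acquire pair stands in for RCU publication. It only shows the "finish initializing, then publish" ordering that keeping get_pid_ns before hlist_add_head_rcu preserves.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pid; only two fields are modelled. */
struct model_pid {
    int nr;            /* the numeric id */
    int ns_ref_taken;  /* 1 once the namespace reference has been taken */
};

/* Models the publicly visible lookup structure (the rcu hlist in the kernel). */
static _Atomic(struct model_pid *) published = NULL;

/*
 * Finish every piece of initialization, including the step that models
 * get_pid_ns(), and only then make the object findable. The release
 * store orders the earlier plain writes before the pointer store.
 */
static void publish_fully_initialized(struct model_pid *pid, int nr)
{
    pid->nr = nr;
    pid->ns_ref_taken = 1;
    atomic_store_explicit(&published, pid, memory_order_release);
}

/*
 * A reader that observes the pointer via an acquire load also observes
 * the fields written before the release store.
 */
static const struct model_pid *lookup(void)
{
    return atomic_load_explicit(&published, memory_order_acquire);
}
```

With this ordering a reader either sees no object at all or sees one whose ns_ref_taken is already 1. Whether any real rcu user of struct pid depends on that is exactly the open question above; the sketch does not answer it.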

To fix the error handling and avoid thinking about the races we have two
choices:

- In the error path that is currently called out_unlock we can drop the
  extra references.
- Immediately after we perform the test that on error jumps to out_unlock
  we call get_pid_ns.

My preference would be the first, as it is a trivially correct one line
change.

Aka I think this is the obviously correct trivial fix.

 out_unlock:
 	spin_unlock_irq(&pidmap_lock);
+	put_pid_ns(ns);
 out_free:
 	while (++i <= ns->level)
 		free_pidmap(pid->numbers + i);

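
For anyone who wants to see the refcount imbalance in isolation, a small userspace model follows. It is a sketch under stated assumptions, not the kernel implementation: struct model_ns, model_get_ns(), model_put_ns() and model_alloc() are hypothetical names, and the reaper_alive flag stands in for the test that jumps to out_unlock on failure.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pid_namespace; only the count matters. */
struct model_ns {
    int refcount;       /* references currently held on the namespace */
    bool reaper_alive;  /* models the test that fails into out_unlock */
};

static void model_get_ns(struct model_ns *ns) { ns->refcount++; }
static void model_put_ns(struct model_ns *ns) { ns->refcount--; }

/*
 * Models the shape of alloc_pid(): the namespace reference is taken
 * early, and a later check may fail. Returns 0 on success, -1 on error.
 * drop_ref_on_error selects between the leaking behaviour (false) and
 * the one-line fix that puts the reference in the error path (true).
 */
static int model_alloc(struct model_ns *ns, bool drop_ref_on_error)
{
    model_get_ns(ns);           /* models get_pid_ns(ns) */

    if (!ns->reaper_alive)
        goto out_unlock;

    return 0;                   /* success: the caller's pid owns the reference */

out_unlock:
    if (drop_ref_on_error)
        model_put_ns(ns);       /* models the added put_pid_ns(ns) */
    return -1;
}
```

Starting from a count of 1 held by the caller, a failing model_alloc(&ns, false) leaves the count at 2, which is the leak. The same failing call with drop_ref_on_error set leaves it at 1, and a successful call leaves it at 2 in both variants because the new pid legitimately holds the reference.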
Eric