[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141125181521.GA31963@redhat.com>
Date: Tue, 25 Nov 2014 19:15:21 +0100
From: Oleg Nesterov <oleg@...hat.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Aaron Tomlin <atomlin@...hat.com>,
Pavel Emelyanov <xemul@...allels.com>,
Serge Hallyn <serge.hallyn@...ntu.com>,
Sterling Alexander <stalexan@...hat.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] exit: pidns: alloc_pid() leaks pid_namespace if
child_reaper is exiting
On 11/25, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@...hat.com> writes:
>
> > On 11/24, Eric W. Biederman wrote:
> >>
> >> However at the moment my I can't figure out if it is safe to move
> >> get_pid_ns elow hlist_add_head_rcu. Because once we are on the rcu list
> >> the pid is findable, and being publicly visible with a bad refcount could cause
> >> problems.
> >
> > The caller has a reference, this ns can't go away. Obviously, otherwise
> > get_pid_ns(ns) is not safe.
> >
> > We need this get_pid_ns() to balance put_pid()->put_pid_ns() which obviously
> > won't be called until we return this pid, otherwise everything is wrong.
> >
> > So I think this should be safe?
>
> My concern is exposing a half initialized struct pid to the world via an
> rcu data structure. In particular could one of the rcu users get into
> trouble because we haven't called get_pid_ns yet? That is unclear to me.
They can't. This pid was fully initialized, in particular
pid->numbers[pid->level].ns == ns has a reference.
Just it is not ready for put_pid() which will be called by the "owner" of
this pid, the caller or the new child. So in this sense it doesn't matter
when we call get_pid_ns(), just we need to do this before return.
> That is one of those weird nasty races I would rather not have to
> consider and moving the get_pid_ns after hlist_add requires that we
> think about it.
>
> To fix the error handling and avoid thinking about the races we have two
> choices:
> - In the error path that is currently called out_unlock we can drop the
> extra references.
> - Immediately after we perform the test that on error jumps to out_unlock
> we call get_pid_ns.
>
> My preference would be the first, as it is a trivially correct one line
> change.
>
> Aka I think this is the obviously correct trivial fix.
>
> out_unlock:
> spin_unlock_irq(&pidmap_lock);
> + put_pid_ns(ns);
Sure, initially I was going to do this. But this is sub-optimal imo, I mainly
mean less clear (imho).
But again, I won't argue. I'll send V2 once we finish the discussion about 2/2.
Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists