Message-ID: <20181025155503.GF3725@redhat.com>
Date: Thu, 25 Oct 2018 17:55:04 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc: serge@...lyn.com,
syzbot <syzbot+a9ac39bf55329e206219@...kaller.appspotmail.com>,
jmorris@...ei.org, keescook@...omium.org,
linux-kernel@...r.kernel.org,
linux-security-module@...r.kernel.org,
syzkaller-bugs@...glegroups.com
Subject: Re: KASAN: use-after-free Read in task_is_descendant
On 10/25, Tetsuo Handa wrote:
>
> On 2018/10/25 21:17, Oleg Nesterov wrote:
> >>> And yes, task_is_descendant() can hit the dead child, if nothing else it can
> >>> be killed. This can explain the kasan report.
> >>
> >> The kasan is reporting that child->real_parent (or maybe child->real_parent->real_parent
> >> or child->real_parent->real_parent->real_parent ...) was pointing to already freed memory,
> >> isn't it?
> >
> > Yes. and you know, I am all confused. I no longer can understand you :/
>
> Why don't we need to check every time like shown below?
> Why checking only once is sufficient?
Why do you think it is not sufficient?
Again, I can be easily wrong, rcu is not simple, but so far I think we need
a single check at the start.
> --- a/security/yama/yama_lsm.c
> +++ b/security/yama/yama_lsm.c
> @@ -285,7 +285,7 @@ static int task_is_descendant(struct task_struct *parent,
>  	rcu_read_lock();
>  	if (!thread_group_leader(parent))
>  		parent = rcu_dereference(parent->group_leader);
> -	while (walker->pid > 0) {
> +	while (pid_alive(walker) && walker->pid > 0) {
OK. To simplify, lets suppose that task_is_descendant() is called with tasklist
lock held. And lets suppose that all tasks are single-threaded.
Then we obviously need a single check at the start, we need to ensure that the
child was not removed from its ->real_parent->children list. The latter means
that if ->real_parent exits, the child will be re-parented and its ->real_parent
will be updated.
So we could do
	read_lock(&tasklist_lock);
	if (list_empty(&child->sibling))
		// it is dead, removed from ->children list, we can't trust
		// child->real_parent
		return -EWHATEVER;
	task_is_descendant(current, child);
But note that we can safely use pid_alive(child) instead, detach_pid() and
list_del_init(&p->sibling) happen "at the same time" since we hold tasklist.
(And btw, I suggested several times to rename it, or add another helper with
a better name. Note also that we could check, say, ->sighand != NULL with
the same effect.)
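To make the locked case concrete, here is a userspace toy model, not kernel code. struct toy_task, its ->alive flag (standing in for pid_alive()) and toy_is_descendant() are names invented for this illustration, and the caller is assumed to hold whatever lock keeps the tree stable:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace stand-in for task_struct; NOT the kernel structure. */
struct toy_task {
	int pid;			/* 0 marks the root of the tree */
	bool alive;			/* stands in for pid_alive() */
	struct toy_task *real_parent;
};

/*
 * Returns 1 if child descends from parent, 0 if it does not, and -1 if
 * child is already dead, so its ->real_parent can't be trusted.
 */
static int toy_is_descendant(const struct toy_task *parent,
			     const struct toy_task *child)
{
	const struct toy_task *walker = child;

	assert(parent != NULL && child != NULL);

	if (!walker->alive)	/* the single check, done once at the start */
		return -1;

	while (walker->pid > 0) {
		if (walker == parent)
			return 1;
		walker = walker->real_parent;
	}
	return 0;
}
```

The loop itself never looks at ->alive again; in this model the only thing that can make ->real_parent untrustworthy is the child being dead before the walk starts.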
Now. Why do you think rcu_read_lock() differs in that we need to check
pid_alive() at every step?
Suppose that one of the grand parents exits, and it is going to be freed. Again,
to (over)simplify the things, lets suppose that release_task() does
	synchronize_rcu();
	free_task(p);
at the end. Now, can
	rcu_read_lock();
	if (pid_alive(child)) {
		while (child->pid)
			child = child->real_parent;
	}
	rcu_read_unlock();
hit the already freed ->real_parent ? Say, the freed child->real_parent->real_parent.
Lets denote P1 = child->real_parent, P2 = P1->real_parent. Can P2 be already freed?
This is only possible if synchronize_rcu() above was called before rcu_read_lock(),
see the last sentence below.
If P1->real_parent is still P2, then P1 has already exited too. And we still observe
that child->real_parent == P1, this too is only possible if child has exited, so we
must see pid_alive() == F.
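The reparenting argument can be sketched as another userspace toy model, again not kernel code. toy_exit() and toy_live_parents_ok() are hypothetical helpers; the model folds exit and release into one step and always reparents to a single fixed reaper, which is a simplification assumed only for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for task_struct; NOT the kernel structure. */
struct toy_task {
	bool alive;			/* stands in for pid_alive() */
	struct toy_task *real_parent;
};

/*
 * Model of "p exits" with the tree locked: live children of p are
 * reparented to reaper, then p is unlinked. Dead tasks are no longer
 * on any ->children list, so they keep their stale ->real_parent.
 */
static void toy_exit(struct toy_task *tasks, size_t n,
		     struct toy_task *p, struct toy_task *reaper)
{
	size_t i;

	assert(p != reaper);
	for (i = 0; i < n; i++) {
		if (tasks[i].alive && tasks[i].real_parent == p)
			tasks[i].real_parent = reaper;
	}
	p->alive = false;	/* p keeps its own, now stale, ->real_parent */
}

/* Invariant: every live task points to a live ->real_parent. */
static bool toy_live_parents_ok(const struct toy_task *tasks, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tasks[i].alive && tasks[i].real_parent &&
		    !tasks[i].real_parent->alive)
			return false;
	}
	return true;
}
```

In this model a task whose ->real_parent points to a dead task must itself be dead, which is the claim above: observing child->real_parent == P1 after P1 has exited implies the child has exited too.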
Why must we see pid_alive() == F without tasklist? It must be true, release_task()
is serialized by tasklist_lock, but why can't we get the stale value under
rcu_read_lock() ?
Because our rcu read-lock critical section extends beyond the return from
synchronize_rcu(), and thus we must have a full memory barrier _between_
that synchronize_rcu() and our rcu_read_lock(). We must see all memory updates,
including thread_pid = NULL which makes pid_alive() == F.
Do you see any hole?
Oleg.