linux-kernel - Re: [PATCH] sched/pid fix use-after free in task_tgid

Date:   Tue, 13 Dec 2016 08:10:26 +1300
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     EunTaik Lee <eun.taik.lee@...sung.com>,
        "mingo\@redhat.com" <mingo@...hat.com>,
        "peterz\@infradead.org" <peterz@...radead.org>,
        "linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched/pid fix use-after free in task_tgid_vnr

Oleg Nesterov <oleg@...hat.com> writes:

> On 12/10, Eric W. Biederman wrote:
>>
>> Oleg Nesterov <oleg@...hat.com> writes:
>>
>> > On 12/09, EunTaik Lee wrote:
>> >>
>> >> There is a use-after-free case with the call stack below.
>> >>
>> >> pid_nr_ns+0x10/0x38
>> >> cgroup_pidlist_start+0x144/0x400
>> >> cgroup_seqfile_start+0x1c/0x24
>> >> kernfs_seq_start+0x54/0x90
>> >> seq_read+0x15c/0x3a8
>> >> kernfs_fop_read+0x38/0x160
>> >> __vfs_read+0x28/0xc8
>> >> vfs_read+0x84/0xfc
>>
>> How is this a use-after-free?  Shouldn't pid_nr_ns() accept a NULL pointer
>> as input and return 0?
>
> No, the task (task_struct) itself can't go away, but task->group_leader
> can be a dangling pointer to an already freed task_struct.
>
>> Certainly, if the addition of pid_alive() fixes it, then pid_vnr(task_tgid(tsk))
>> is fine.  Are we perhaps missing RCU locking?
>
> rcu_read_lock() is not enough in this case; see below.
>
>> Or is the problem simply that in task_tgid() we are accessing
>> task->group_leader, which may already be dead?
>
> Yes. Let's forget about the callchain above; I didn't even bother to verify
> that it can actually hit the problem. That said, I think EunTaik is right:
> css_task_iter_next() does get_task_struct() and drops css_set_lock, so the
> task can exit after that. But forget it.
>
> Just suppose that a task simply does
>
> 	pid = task_tgid_vnr(current);
>
> after it has already called exit_notify(). This is exactly what perf_event_pid()
> does; perhaps we have more buggy users.
>
> In this case current->group_leader or parent/real_parent can point to
> exited/freed tasks. I have already said this many times: we really need to
> nullify them in __unhash_process(), but this needs a lot of (mostly simple)
> cleanups.
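A toy userspace model of the hazard described in the quoted text, with a pid_alive()-style guard of the kind the original patch appears to add. The types, the `model_*` names and the exact form of the guard are assumptions (the patch itself is not quoted here). Nothing is actually freed, and as a single-threaded toy it says nothing about races between the check and the dereference.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for struct pid and struct task_struct (hypothetical). */
struct model_pid {
	int nr;
};

struct model_task {
	struct model_pid *pid;            /* like pids[PIDTYPE_PID].pid: NULL once unhashed */
	struct model_task *group_leader;  /* may go stale once this task is unhashed */
};

/* Same idea as pid_alive(): the task still has its own pid attached. */
static int model_pid_alive(const struct model_task *t)
{
	return t->pid != NULL;
}

/* Models __unhash_process() detaching the task's own pid.  group_leader is
 * deliberately left untouched, which is the problem under discussion. */
static void model_unhash(struct model_task *t)
{
	t->pid = NULL;
}

/* task_tgid_vnr()-like helper (hypothetical) that refuses to follow
 * group_leader once the task itself no longer has a pid attached. */
static int model_tgid_vnr_guarded(const struct model_task *t)
{
	const struct model_pid *tgid;

	if (!model_pid_alive(t))
		return 0;
	tgid = t->group_leader->pid;      /* the dereference task_tgid() performs */
	return tgid ? tgid->nr : 0;       /* a NULL pid yields 0, as in pid_nr_ns() */
}
```

After model_unhash(), the guarded helper returns 0 instead of reading through group_leader.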

Is there anything wrong with starting with the patch below?

diff --git a/kernel/exit.c b/kernel/exit.c
index 9d68c45ebbe3..03daeecc335d 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -200,6 +200,7 @@ void release_task(struct task_struct *p)
                if (zap_leader)
                        leader->exit_state = EXIT_DEAD;
        }
+       p->group_leader = NULL;
 
        write_unlock_irq(&tasklist_lock);
        release_thread(p);


That seems to cut to the heart of the matter.  Failures will be clearer,
and so will any code introduced to handle the situation.  Then we
don't need pid_alive() or any other magic, just a simple:
	rcu_read_lock();
	leader = READ_ONCE(task->group_leader);
	if (leader) {
		/* Do stuff */
	}
	rcu_read_unlock();
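A userspace model of this proposal, assuming C11 atomics as a stand-in for the kernel primitives: the writer clears group_leader when the task is released, and the reader loads it once and checks for NULL before using it. The `model_*` names and the choice to return 0 when the leader is gone are illustrative, not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for struct task_struct (hypothetical, userspace only). */
struct model_task {
	int nr;                                   /* own pid number; a leader's nr is the tgid */
	struct model_task *_Atomic group_leader;  /* cleared on release, as in the patch */
};

/* Models "p->group_leader = NULL;" in release_task().  The kernel does this
 * under write_lock_irq(&tasklist_lock); a release store stands in for that. */
static void model_release_task(struct model_task *p)
{
	atomic_store_explicit(&p->group_leader, NULL, memory_order_release);
}

/* Models the reader pattern above.  An acquire load is a conservative
 * stand-in for READ_ONCE().  RCU is not modeled: nothing is freed here,
 * so there is no grace period to wait for. */
static int model_read_tgid(struct model_task *task)
{
	struct model_task *leader =
		atomic_load_explicit(&task->group_leader, memory_order_acquire);

	if (!leader)
		return 0;           /* leader already released: report "no tgid" */
	return leader->nr;
}
```

After model_release_task(&thread), model_read_tgid(&thread) returns 0 instead of chasing a stale pointer; a real kernel reader would still need the rcu_read_lock()/rcu_read_unlock() pair.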

>> If so, the fix needs to be
>> in task_tgid().
>
> Yes, task_tgid() should probably return NULL in this case, but this connects
> to "a lot of cleanups" above.

But that is important, because task_tgid() is where things go wrong in the
specific case under discussion.  pid_nr_ns() handles all of the other
cases; it is task_tgid() that went wrong.
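The layering argued for here can be sketched in userspace. model_pid_nr_ns() follows the shape of the kernel's pid_nr_ns() (return 0 for a NULL pid, or for a pid not visible in the requested namespace); model_task_tgid() returning NULL once group_leader has been cleared is the hypothetical behaviour discussed in this thread, not current kernel code. All types, names and the MODEL_MAX_LEVEL limit are toy assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_LEVEL 3               /* arbitrary nesting depth for this toy */

struct model_ns {
	unsigned int level;
};

struct model_upid {
	int nr;                         /* pid number inside ns */
	const struct model_ns *ns;
};

struct model_pid {
	unsigned int level;             /* deepest namespace this pid lives in */
	struct model_upid numbers[MODEL_MAX_LEVEL + 1];
};

struct model_task {
	struct model_pid *pid;
	struct model_task *group_leader;  /* NULL once released (proposed) */
};

/* Same shape as pid_nr_ns(): 0 for a NULL pid or a pid not visible in ns. */
static int model_pid_nr_ns(const struct model_pid *pid, const struct model_ns *ns)
{
	int nr = 0;

	if (pid && ns->level <= pid->level) {
		const struct model_upid *upid = &pid->numbers[ns->level];

		if (upid->ns == ns)
			nr = upid->nr;
	}
	return nr;
}

/* Hypothetical task_tgid() that returns NULL instead of following a
 * cleared group_leader. */
static struct model_pid *model_task_tgid(const struct model_task *t)
{
	const struct model_task *leader = t->group_leader;

	return leader ? leader->pid : NULL;
}

/* task_tgid_nr_ns()-like composition: the NULL check lives in task_tgid(),
 * and pid_nr_ns() turns the NULL into 0. */
static int model_task_tgid_nr_ns(const struct model_task *t, const struct model_ns *ns)
{
	return model_pid_nr_ns(model_task_tgid(t), ns);
}
```

With this split, callers such as task_tgid_vnr() need no extra checks of their own; they simply see 0 once the leader is gone.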

Eric