linux-kernel - Re: [PATCH v3 2/6] posix-cpu-timers: Use PIDTYPE_TGID to simplify the logic in lookup


Message-ID: <878sihjgec.fsf@x220.int.ebiederm.org>
Date:   Mon, 27 Apr 2020 06:51:23 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Linux FS Devel <linux-fsdevel@...r.kernel.org>,
        Alexey Dobriyan <adobriyan@...il.com>,
        Alexey Gladkov <legion@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Alexey Gladkov <gladkov.alexey@...il.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH v3 2/6] posix-cpu-timers: Use PIDTYPE_TGID to simplify the logic in lookup_task

Oleg Nesterov <oleg@...hat.com> writes:

> Eric,
>
> I am sick today and can't read the code, but I feel this patch is not
> right ... please correct me.


> So, iiuc when posix_cpu_timer_create() is called and CPUCLOCK_PERTHREAD
> is false we roughly have
>
> 	task = pid_task(pid, PIDTYPE_TGID);			// lookup_task()
>
> 	/* WINDOW */
>
> 	timer->it.cpu.pid = get_task_pid(task, PIDTYPE_TGID)	// posix_cpu_timer_create()
>
> Now suppose that we race with mt-exec and this "task" is the old leader;
> it can be release_task()'ed in the WINDOW above and then get_task_pid()
> will return NULL.

Except it is asking for PIDTYPE_TGID.

Even if task->signal were freed (which it won't be in a mt-exec), it
stays valid until after an RCU grace period.

release_task()
   put_task_struct_rcu_user()
      call_rcu(..., delayed_put_task_struct())
... rcu delay ...
delayed_put_task_struct()
   put_task_struct()
      __put_task_struct()
         put_signal_struct()
            free_signal_struct()

Which means that task->signal->pids[PIDTYPE_TGID] will remain valid even
across mt-exec.
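
A toy userspace sketch of that lifetime argument, for illustration only.
It assumes a single queued callback stands in for call_rcu() and a flag
stands in for the actual free; every toy_* name is hypothetical and none
of this is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; these types are hypothetical, not the kernel's. */
struct toy_signal {
	int tgid_nr;       /* stands in for signal->pids[PIDTYPE_TGID] */
	bool freed;        /* set once the deferred free has run */
};

struct toy_task {
	struct toy_signal *signal;
	bool free_queued;  /* stands in for a pending call_rcu() callback */
};

/* Models release_task(): the free is only queued, not performed. */
static void toy_release_task(struct toy_task *t)
{
	assert(t != NULL);
	t->free_queued = true;
}

/* Models the end of the grace period: the queued callback runs now. */
static void toy_grace_period_end(struct toy_task *t)
{
	assert(t != NULL);
	if (t->free_queued) {
		t->signal->freed = true;
		t->free_queued = false;
	}
}

/* What a reader inside the modelled read-side section would observe. */
static bool toy_signal_usable(const struct toy_task *t)
{
	assert(t != NULL);
	return !t->signal->freed;
}
```

In this model a reader that runs between toy_release_task() and
toy_grace_period_end() still sees a usable signal structure, which is
the property the call chain above relies on.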

Further, the only change I have introduced is to perform this work under
rcu_read_lock() instead of taking a reference to task_struct.  As the
reference to task_struct does not prevent release_task(), the situation
with respect to races in the rest of the code does not change.

Hmm....

If the case instead is:
> 	timer->it.cpu.pid = get_task_pid(task, PIDTYPE_PID)	// posix_cpu_timer_create()

Which can also happen for threads in the same thread group.
I have to agree that we can wind up with a NULL pid.

And that is a brand new bug, because we didn't use to use pids.
Sigh.
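
A toy userspace sketch of the difference between the two lookups, for
illustration only.  It assumes the per-thread pid pointer is cleared
when the task is released, while the TGID entry lives in the structure
shared by the thread group; all toy_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; these types are hypothetical, not the kernel's. */
enum toy_pid_type { TOY_PIDTYPE_PID, TOY_PIDTYPE_TGID };

struct toy_pid { int nr; };

struct toy_signal {
	struct toy_pid *tgid;        /* models signal->pids[PIDTYPE_TGID] */
};

struct toy_task {
	struct toy_pid *thread_pid;  /* per-thread, cleared on release */
	struct toy_signal *signal;   /* shared by the whole thread group */
};

/* Models get_task_pid(): PID is per-thread, TGID comes from signal. */
static struct toy_pid *toy_get_task_pid(const struct toy_task *t,
					enum toy_pid_type type)
{
	assert(t != NULL);
	return type == TOY_PIDTYPE_PID ? t->thread_pid : t->signal->tgid;
}

/* Models the part of release_task() that detaches the per-thread pid. */
static void toy_release_task(struct toy_task *t)
{
	assert(t != NULL);
	t->thread_pid = NULL;
}
```

After toy_release_task() on the old leader, the TGID lookup in this
model still returns the group's pid, while the PID lookup returns NULL,
which is the window described above.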


> That is why I suggested to change lookup_task() to return "struct pid*"
> to eliminate the pid -> task -> pid transition.

Yes.  I have to agree.  Getting rid of the pid -> task -> pid transition
looks important to close bugs like that.

> Apart from the same_thread_group() check for the "thread" case we do not
> need task_struct at all, lookup_task() can do
>
> 	if (thread) {
> 		p = pid_task(pid, PIDTYPE_PID);
> 		if (p && !same_thread_group(p, current))
> 			pid = NULL;
> 	} else {
> 		... gettime check ...
>
> 		if (!pid_has_task(pid, PIDTYPE_TGID))
> 			pid = NULL;
> 	}
>
> 	return pid;
>
> No?
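
A toy userspace sketch of the shape suggested above, for illustration
only: the lookup hands back the pid it was given, or NULL, so the caller
never goes pid -> task -> pid.  The toy_* types are hypothetical, the
thread-group test is reduced to comparing a tgid number, and the gettime
check is left out because the quoted sketch elides it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; these types are hypothetical, not the kernel's. */
struct toy_task { int tgid; };

struct toy_pid {
	struct toy_task *pid_task;   /* task attached as PIDTYPE_PID, or NULL */
	struct toy_task *tgid_task;  /* task attached as PIDTYPE_TGID, or NULL */
};

/* Returns the pid itself when it is acceptable, NULL otherwise. */
static struct toy_pid *toy_lookup(struct toy_pid *pid, bool thread,
				  const struct toy_task *cur)
{
	assert(pid != NULL && cur != NULL);

	if (thread) {
		const struct toy_task *p = pid->pid_task;

		/*
		 * Mirrors the quoted sketch: only a task from another
		 * thread group is rejected here.
		 */
		if (p && p->tgid != cur->tgid)
			return NULL;
	} else {
		/* Stands in for pid_has_task(pid, PIDTYPE_TGID). */
		if (!pid->tgid_task)
			return NULL;
	}
	return pid;
}
```

The task pointer is only consulted inside the lookup; what the caller
stores is the pid that was validated, not one derived from a task again.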

There is also posix_cpu_clock_get(), where we immediately use the
clock instead of creating something we can use later.

I want to say the gettime case is another reason to go through the whole
transition, but the code can just as easily say "pid = task_tgid(current)"
as it can "p = current".

Eric