linux-kernel - Re: [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks


Message-ID: <20160405091954.GI3448@twins.programming.kicks-ass.net>
Date:	Tue, 5 Apr 2016 11:19:54 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	xlpang@...hat.com
Cc:	linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@....com>,
	Ingo Molnar <mingo@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH] sched/deadline/rtmutex: Fix a PI crash for deadline tasks

On Tue, Apr 05, 2016 at 04:38:12PM +0800, Xunlei Pang wrote:
> On 2016/04/02 at 05:51, Peter Zijlstra wrote:
> > On Fri, Apr 01, 2016 at 09:34:24PM +0800, Xunlei Pang wrote:
> >
> >>>> I checked the code, currently only deadline accesses the
> >>>> pi_waiters/pi_waiters_leftmost
> >>>> without pi_lock held via rt_mutex_get_top_task(), other cases all have
> >>>> pi_lock held.
> >> Any better ideas are welcome.
> > Something like the below _might_ work; but it's late and I haven't looked
> > at the PI code in a while. This basically caches a pointer to the top
> > waiter task in the running task_struct, under pi_lock and rq->lock, and
> > therefore we can use it with only rq->lock held.
> >
> > Since the task is blocked, and cannot unblock without taking itself from
> > the block chain -- which would cause rt_mutex_setprio() to set another
> > top waiter task, the lifetime rules should be good.
> 
> In rt_mutex_slowunlock(), we release pi_lock and wait_lock first, then
> wake up the top waiter, then call rt_mutex_adjust_prio(), so there is a small
> window without any lock or irq disabled between the top waiter wake up
> and rt_mutex_adjust_prio(), which can cause problems.

That is rt_mutex_fastunlock()'s:

	bool deboost = slowfn(lock, &wake_q); /* -> rt_mutex_slowunlock() */

	wake_up_q(&wake_q);

	if (deboost)
		rt_mutex_adjust_prio(current);


(and whether IRQs are enabled is irrelevant; SMP can race regardless)

> For example, before rt_mutex_adjust_prio() is called to adjust the cached
> pointer, current may be preempted and the woken top waiter may exit. When the
> task runs again, it may enter enqueue_task_dl() before rt_mutex_adjust_prio()
> has updated the cached pointer, so it will access a stale cached pointer.

Hmm, so I would argue that that is a bug in any case. It's an effective
priority 'leak'; we should deboost before letting the booster run again.

But it looks like a simple fix, simply call wake_up_q() after the
deboost. The wake_q has a reference on the task so it cannot go away,
which ensures any dereferences from within the DL code must still be
valid.
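
To make the ordering argument concrete, here is a small userspace toy model;
it is a sketch, not kernel code. Every toy_* name is hypothetical, pi_top
stands in for the cached top-waiter pointer from the earlier proposal, and the
woken task is assumed to exit immediately (the worst case). It only relies on
wake_q_add() taking a task reference and wake_up_q() dropping it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for task_struct; all names here are hypothetical. */
struct toy_task {
	int usage;               /* reference count */
	bool freed;              /* set once the last reference is dropped */
	struct toy_task *pi_top; /* cached top waiter (assumed field) */
};

static void toy_get(struct toy_task *t)
{
	t->usage++;
}

static void toy_put(struct toy_task *t)
{
	assert(t->usage > 0);
	if (--t->usage == 0)
		t->freed = true;
}

/* Queueing a wakeup pins the task, like wake_q_add(). */
static struct toy_task *toy_wake_q_add(struct toy_task *t)
{
	toy_get(t);
	return t;
}

/* Waking drops the queue's reference; the woken task then exits at once. */
static void toy_wake_up_q(struct toy_task *queued)
{
	toy_put(queued); /* reference held by the wake queue */
	toy_put(queued); /* the task's own reference, dropped on exit */
}

/* Deboost: the owner stops caching the former top waiter. */
static void toy_adjust_prio(struct toy_task *owner)
{
	owner->pi_top = NULL;
}

/* Returns true if the owner ever held a pointer to a freed task. */
static bool toy_unlock(bool deboost_first)
{
	struct toy_task waiter = { .usage = 1 };
	struct toy_task owner = { .usage = 1, .pi_top = &waiter };
	struct toy_task *q = toy_wake_q_add(&waiter);
	bool stale;

	if (deboost_first) {
		toy_adjust_prio(&owner);
		stale = owner.pi_top && owner.pi_top->freed;
		toy_wake_up_q(q);
		stale = stale || (owner.pi_top && owner.pi_top->freed);
	} else {
		toy_wake_up_q(q);
		/* Preemption window: the scheduler may look at pi_top here. */
		stale = owner.pi_top && owner.pi_top->freed;
		toy_adjust_prio(&owner);
	}
	return stale;
}
```

In this model toy_unlock(false), the current order, leaves a window where
pi_top points at a freed task, while toy_unlock(true), the proposed order,
does not. It says nothing about other paths that might touch the pointer.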

Or did I miss something (again) ? :-)

---
 kernel/locking/rtmutex.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 3e746607abe5..36eb232bd29f 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1390,11 +1390,11 @@ rt_mutex_fastunlock(struct rt_mutex *lock,
 	} else {
 		bool deboost = slowfn(lock, &wake_q);
 
-		wake_up_q(&wake_q);
-
 		/* Undo pi boosting if necessary: */
 		if (deboost)
 			rt_mutex_adjust_prio(current);
+
+		wake_up_q(&wake_q);
 	}
 }