Message-ID: <20160627122335.GB30154@twins.programming.kicks-ass.net>
Date:	Mon, 27 Jun 2016 14:23:35 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Juri Lelli <juri.lelli@....com>
Cc:	mingo@...nel.org, tglx@...utronix.de, rostedt@...dmis.org,
	xlpang@...hat.com, linux-kernel@...r.kernel.org,
	mathieu.desnoyers@...icios.com, jdesfossez@...icios.com,
	bristot@...hat.com
Subject: Re: [RFC][PATCH 8/8] rtmutex: Fix PI chain order integrity

On Wed, Jun 15, 2016 at 08:25:07AM +0100, Juri Lelli wrote:
> I guess it's not that likely, but yes, it could potentially happen
> that a waiter is optimistically spinning, depletes its runtime, gets
> throttled and is then replenished while still spinning. Maybe it
> doesn't really make sense to continue spinning in this situation, but
> I guess things get really complicated. :-/
> 
> Anyway, as said, I think this patch is OK. Maybe we want to add a
> comment just to record which situation can cause an issue if we don't
> do this? The patch changelog would be OK for such a comment as well,
> IMHO.


OK, so I went to write a simple comment and ended up with the below :/

While writing the comment I noticed two issues:

 - we update the waiter order fields while the entry is still enqueued
   on the pi_waiters tree, which is also sorted by these exact fields
   (a toy sketch of why that is a problem follows below).

 - another one of these pure ->prio comparisons, which ignores the
   deadline ordering of DL tasks.

Please double check, there be dragons here.
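
To make the first issue concrete, here is a throwaway userspace sketch
(toy sorted list, all names made up; nothing to do with the kernel
rbtree code): mutating the sort key while the node is still linked in
silently breaks the ordering, while dequeue/update/enqueue does not,
and, like rb_erase(), the unlink never inspects the key's value.

#include <stdio.h>

struct node {
	int prio;			/* sort key: smaller is higher priority */
	struct node *next;
};

/* Insert @n into *head, keeping ascending ->prio order. */
static void insert_sorted(struct node **head, struct node *n)
{
	struct node **pp = head;

	while (*pp && (*pp)->prio <= n->prio)
		pp = &(*pp)->next;
	n->next = *pp;
	*pp = n;
}

/* Unlink @n; like rb_erase(), only pointers matter, never the key. */
static void remove_node(struct node **head, struct node *n)
{
	struct node **pp = head;

	while (*pp != n)
		pp = &(*pp)->next;
	*pp = n->next;
}

int main(void)
{
	struct node a = { .prio = 10 }, b = { .prio = 20 }, c = { .prio = 30 };
	struct node *head = NULL, *it;

	insert_sorted(&head, &a);
	insert_sorted(&head, &b);
	insert_sorted(&head, &c);

	/*
	 * BROKEN: bump the key while @a is still enqueued; the list now
	 * physically reads 25, 20, 30 and the sorted invariant is gone.
	 */
	a.prio = 25;

	/*
	 * SAFE: dequeue, update, re-enqueue -- the ordering the patch
	 * enforces around step [7] for the lock waiter tree.
	 */
	remove_node(&head, &a);
	insert_sorted(&head, &a);

	for (it = head; it; it = it->next)
		printf("%d ", it->prio);	/* 20 25 30 */
	printf("\n");
	return 0;
}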

---

--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -158,6 +158,9 @@ static inline bool unlock_rt_mutex_safe(
 }
 #endif
 
+#define cmp_task(p)	\
+	&(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }
+
 static inline int
 rt_mutex_waiter_less(struct rt_mutex_waiter *left,
 		     struct rt_mutex_waiter *right)
@@ -583,8 +586,26 @@ static int rt_mutex_adjust_prio_chain(st
 
 	/* [7] Requeue the waiter in the lock waiter tree. */
 	rt_mutex_dequeue(lock, waiter);
+
+	/*
+	 * Update the waiter prio fields now that we're dequeued.
+	 *
+	 * These values can have changed through either:
+	 *
+	 *   sys_sched_set_scheduler() / sys_sched_setattr()
+	 *
+	 * or
+	 *
+	 *   DL CBS enforcement advancing the effective deadline.
+	 *
+	 * Even though pi_waiters also uses these fields, and that tree is only
+	 * updated in [11], we can do this here, since we hold [L], which
+	 * serializes all pi_waiters access and rb_erase() does not care about
+	 * the values of the node being removed.
+	 */
 	waiter->prio = task->prio;
 	waiter->deadline = task->dl.deadline;
+
 	rt_mutex_enqueue(lock, waiter);
 
 	/* [8] Release the task */
@@ -711,6 +732,8 @@ static int rt_mutex_adjust_prio_chain(st
 static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task,
 				struct rt_mutex_waiter *waiter)
 {
+	lockdep_assert_held(&lock->wait_lock);
+
 	/*
 	 * Before testing whether we can acquire @lock, we set the
 	 * RT_MUTEX_HAS_WAITERS bit in @lock->owner. This forces all
@@ -770,7 +793,8 @@ static int try_to_take_rt_mutex(struct r
 			 * the top waiter priority (kernel view),
 			 * @task lost.
 			 */
-			if (task->prio >= rt_mutex_top_waiter(lock)->prio)
+			if (!rt_mutex_waiter_less(cmp_task(task),
+						  rt_mutex_top_waiter(lock)))
 				return 0;
 
 			/*
@@ -838,6 +862,8 @@ static int task_blocks_on_rt_mutex(struc
 	struct rt_mutex *next_lock;
 	int chain_walk = 0, res;
 
+	lockdep_assert_held(&lock->wait_lock);
+
 	/*
 	 * Early deadlock detection. We really don't want the task to
 	 * enqueue on itself just to untangle the mess later. It's not
@@ -973,6 +999,8 @@ static void remove_waiter(struct rt_mute
 	struct task_struct *owner = rt_mutex_owner(lock);
 	struct rt_mutex *next_lock;
 
+	lockdep_assert_held(&lock->wait_lock);
+
 	raw_spin_lock(&current->pi_lock);
 	rt_mutex_dequeue(lock, waiter);
 	current->pi_blocked_on = NULL;
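
For reference, this is what the cmp_task() compound-literal trick above
boils down to in isolation (userspace sketch, simplified stand-in
types; the real rt_mutex_waiter_less() only compares deadlines when
both sides are dl_prio()): the macro conjures an anonymous
rt_mutex_waiter on the stack carrying just the ordering fields, so a
task can be fed straight into the waiter comparator, as
try_to_take_rt_mutex() now does.

#include <stdbool.h>
#include <stdio.h>

struct rt_mutex_waiter {
	int prio;			/* lower value means higher priority */
	unsigned long long deadline;
};

struct task_struct {
	int prio;
	struct {
		unsigned long long deadline;
	} dl;
};

/* Anonymous stack-lifetime waiter carrying just the comparison fields. */
#define cmp_task(p)	\
	&(struct rt_mutex_waiter){ .prio = (p)->prio, .deadline = (p)->dl.deadline }

static bool rt_mutex_waiter_less(struct rt_mutex_waiter *left,
				 struct rt_mutex_waiter *right)
{
	if (left->prio < right->prio)
		return true;
	/*
	 * Simplified tie-break: the kernel only compares deadlines when
	 * both sides are deadline-class (dl_prio()); here equal prio
	 * stands in for that case.
	 */
	if (left->prio == right->prio)
		return left->deadline < right->deadline;
	return false;
}

int main(void)
{
	struct task_struct task = { .prio = -1, .dl.deadline = 100 };
	struct rt_mutex_waiter top = { .prio = -1, .deadline = 200 };

	/* Same prio, earlier deadline: @task beats the current top waiter. */
	printf("task wins: %d\n", rt_mutex_waiter_less(cmp_task(&task), &top));
	return 0;
}

The compound literal lives until the end of the enclosing block, so
taking its address for the duration of the call is fine as long as
nobody stores the pointer.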
