[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1475270782-51720-9-git-send-email-Waiman.Long@hpe.com>
Date: Fri, 30 Sep 2016 17:26:17 -0400
From: Waiman Long <Waiman.Long@....com>
To: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Jonathan Corbet <corbet@....net>
Cc: linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Davidlohr Bueso <dave@...olabs.net>,
Mike Galbraith <umgwanakikbuti@...il.com>,
Jason Low <jason.low2@....com>,
Scott J Norton <scott.norton@....com>,
Douglas Hatch <doug.hatch@....com>,
Waiman Long <Waiman.Long@....com>
Subject: [PATCH v3 08/13] futex: Enable robust handling of TP futexes
The TP futexes don't have code to handle the death of futex
owners. There are 2 different cases that need to be considered.
As top waiter gets a reference to the task structure of the futex
owner, the task structure will never go away even if the owner dies.
When the futex owner died while the top waiter is spinning, the task
structure will be marked dead or the pid won't have a matching task
structure if the task died before a reference is taken. Alternatively,
if robust futex attribute is enabled, the FUTEX_OWNER_DIED bit of
the futex word may be set. In all those cases, what the top waiter
need to do is to grab the futex directly. An informational message
will also printed to highlight this event.
If the futex owner died while the top waiter is sleeping, we need to
make the exit processing code to wake up the top waiter. This is done
by chaining the futex state object into the pi_state_list of the futex
owner before the top waiter sleeps so that if exit_pi_state_list()
is called, the wakeup will happen. The top waiter needs to remove
its futex state object from the pi_state_list of the old owner if
the ownership changes hand.
Signed-off-by: Waiman Long <Waiman.Long@....com>
---
kernel/futex.c | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 76 insertions(+), 6 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index bfe17a9..5fa9a10 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -973,7 +973,7 @@ static struct task_struct * futex_find_get_task(pid_t pid)
}
/*
- * This task is holding PI mutexes at exit time => bad.
+ * This task is holding PI or TP mutexes at exit time => bad.
* Kernel cleans up PI-state, but userspace is likely hosed.
* (Robust-futex cleanup is separate and might save the day for userspace.)
*/
@@ -990,12 +990,29 @@ void exit_pi_state_list(struct task_struct *curr)
* We are a ZOMBIE and nobody can enqueue itself on
* pi_state_list anymore, but we have to be careful
* versus waiters unqueueing themselves:
+ *
+ * For TP futexes, the only purpose of showing up in the
+ * pi_state_list is for this function to wake up the serialization
+ * mutex owner (state->owner). We don't actually need to take the
+ * HB lock. The futex state and task struct won't go away as long
+ * as we hold the pi_lock.
*/
raw_spin_lock_irq(&curr->pi_lock);
while (!list_empty(head)) {
next = head->next;
pi_state = list_entry(next, struct futex_state, list);
+
+ if (pi_state->type == TYPE_TP) {
+ struct task_struct *owner = READ_ONCE(pi_state->owner);
+
+ WARN_ON(list_empty(&pi_state->list));
+ list_del_init(&pi_state->list);
+ if (owner)
+ wake_up_process(owner);
+ continue;
+ }
+
key = pi_state->key;
hb = hash_futex(&key);
raw_spin_unlock_irq(&curr->pi_lock);
@@ -3152,8 +3169,8 @@ retry:
goto retry;
/*
- * Wake robust non-PI futexes here. The wakeup of
- * PI futexes happens in exit_pi_state():
+ * Wake robust wait-wake futexes here. The wakeup of
+ * PI and TP futexes happens in exit_pi_state():
*/
if (!pi && (uval & FUTEX_WAITERS))
futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
@@ -3295,6 +3312,12 @@ void exit_robust_list(struct task_struct *curr)
* Unlike the other futexes, the futex_q structures aren't used. Instead,
* they will queue up in the serialization mutex of the futex state container
* queued in the hash bucket.
+ *
+ * To handle the exceptional case that the futex owner died, the robust
+ * futexes list mechanism is used to for waking up sleeping top waiter.
+ * Checks are also made in the futex_spin_on_owner() loop for dead task
+ * structure or invalid pid. In both cases, the top waiter will take over
+ * the ownership of the futex.
*/
/**
@@ -3459,6 +3482,7 @@ static int futex_spin_on_owner(u32 __user *uaddr, u32 vpid,
u32 uval;
u32 owner_pid = 0;
struct task_struct *owner_task = NULL;
+ bool on_owner_pi_list = false;
WRITE_ONCE(state->owner, current);
preempt_disable();
@@ -3468,13 +3492,47 @@ static int futex_spin_on_owner(u32 __user *uaddr, u32 vpid,
break;
if ((uval & FUTEX_TID_MASK) != owner_pid) {
- if (owner_task)
+ if (owner_task) {
+ /*
+ * task_pi_list_del() should always be
+ * done before put_task_struct(). The futex
+ * state may have been dequeued if the task
+ * is dead.
+ */
+ if (on_owner_pi_list) {
+ task_pi_list_del(owner_task, state,
+ false);
+ on_owner_pi_list = false;
+ }
put_task_struct(owner_task);
+ }
owner_pid = uval & FUTEX_TID_MASK;
owner_task = futex_find_get_task(owner_pid);
}
+ if (unlikely(!owner_task ||
+ (owner_task->flags & PF_EXITING) ||
+ (uval & FUTEX_OWNER_DIED))) {
+ /*
+ * PID invalid or exiting/dead task, we can directly
+ * grab the lock now.
+ */
+ u32 curval;
+
+ ret = cmpxchg_futex_value(&curval, uaddr, uval, vpid);
+ if (unlikely(ret))
+ goto efault;
+ if (curval != uval)
+ continue;
+ pr_info("futex: owner pid %d of futex 0x%lx was %s, lock is now acquired by pid %d!\n",
+ owner_pid, (long)uaddr,
+ (owner_task || (uval & FUTEX_OWNER_DIED))
+ ? "dead" : "invalid", vpid);
+
+ break;
+ }
+
if (need_resched()) {
__set_current_state(TASK_RUNNING);
schedule_preempt_disabled();
@@ -3493,11 +3551,18 @@ static int futex_spin_on_owner(u32 __user *uaddr, u32 vpid,
/*
* If the owner isn't active, we need to go to sleep after
- * making sure that the FUTEX_WAITERS bit is set.
+ * making sure that the FUTEX_WAITERS bit is set. We also
+ * need to put the futex state into the futex owner's
+ * pi_state_list to prevent deadlock when the owner dies.
*/
if (futex_set_waiters_bit(uaddr, &uval) < 0)
goto efault;
+ if (!on_owner_pi_list) {
+ task_pi_list_add(owner_task, state, false);
+ on_owner_pi_list = true;
+ }
+
/*
* Do a trylock after setting the task state to make
* sure we won't miss a wakeup.
@@ -3528,8 +3593,13 @@ static int futex_spin_on_owner(u32 __user *uaddr, u32 vpid,
}
out:
preempt_enable();
- if (owner_task)
+ if (owner_task) {
+ if (on_owner_pi_list)
+ task_pi_list_del(owner_task, state, false);
put_task_struct(owner_task);
+ } else {
+ WARN_ON(on_owner_pi_list);
+ }
/*
* Cleanup futex state.
--
1.7.1
Powered by blists - more mailing lists