Message-ID: <4A7A1D48.7040309@us.ibm.com>
Date:	Wed, 05 Aug 2009 17:01:12 -0700
From:	Darren Hart <dvhltc@...ibm.com>
To:	"lkml, " <linux-kernel@...r.kernel.org>,
	linux-rt-users <linux-rt-users@...r.kernel.org>
CC:	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...e.hu>, John Kacur <jkacur@...hat.com>,
	Dinakar Guniguntala <dino@...ibm.com>,
	John Stultz <johnstul@...ux.vnet.ibm.com>
Subject: [RFC][PATCH] fixup pi_state in futex_requeue on lock steal

NOT FOR INCLUSION

Fix up the uval and pi_state owner in futex_requeue(requeue_pi=1) in the event
of a lock steal or an owner-died case.  I had hoped to leave it up to the new
owner to fix up the userspace value, since we can't really handle a fault here
gracefully.  That should be safe because the lock is contended, which forces all
userspace attempts to lock or unlock into the kernel, where they'll block on the
hb lock.  However, when I don't update the uaddr, I hit the expected WARN_ON(pid !=
pi_state->owner->pid), and the userspace testcase deadlocks.
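
For reference (not part of the patch): a minimal sketch of the glibc-style
userspace PI-lock fast path that reasoning relies on; names and details are
illustrative only.  While the futex word is non-zero, the cmpxchg from 0 fails,
so every lock attempt takes the futex() syscall and serializes on the hb lock
in the kernel.

#include <unistd.h>
#include <sys/syscall.h>
#include <linux/futex.h>

/* Illustrative only: a stripped-down PI-mutex lock fast path. */
static int pi_lock(unsigned int *uaddr, unsigned int my_tid)
{
	/* Fast path: only succeeds while the futex word is 0 (unowned). */
	if (__sync_bool_compare_and_swap(uaddr, 0, my_tid))
		return 0;

	/*
	 * Contended (owner TID and/or FUTEX_WAITERS set): fall into the
	 * kernel, which serializes on the hash bucket lock and takes care
	 * of the pi_state and ownership handoff.
	 */
	return syscall(SYS_futex, uaddr, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
}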

I need to better understand what is hanging userspace.  In the meantime I
thought I'd share what I'm working with at the moment.  This is a complete HACK
and is ugly, non-modular, etc.  However, it currently works.  It would explode
in a most impressive fashion should we happen to fault.  So the remaining
questions are:

o Why does userspace deadlock if we leave updating the uval to the new owner
  as it wakes up in futex_wait_requeue_pi()?

o If we have to handle a fault in futex_requeue(), how can we best clean up the
  proxy lock acquisition and get things back into a sane state?  We faulted, so
  determinism is out the window anyway; we just need to recover gracefully.

Thoughts welcome.

Darren Hart

Index: 2.6.31-rc4-rt1/kernel/futex.c
===================================================================
--- 2.6.31-rc4-rt1.orig/kernel/futex.c	2009-08-05 13:27:11.000000000 -0700
+++ 2.6.31-rc4-rt1/kernel/futex.c	2009-08-05 16:19:08.000000000 -0700
@@ -1174,7 +1174,7 @@ static int futex_requeue(u32 __user *uad
 	struct plist_head *head1;
 	struct futex_q *this, *next;
 	struct task_struct *wake_list = &init_task;
-	u32 curval2;
+	u32 curval, curval2, newval;
 
 	if (requeue_pi) {
 		/*
@@ -1329,6 +1329,45 @@ retry_private:
 			if (ret == 1) {
 				/* We got the lock. */
 				requeue_pi_wake_futex(this, &key2, hb2);
+
+				while (1) {
+					newval = (curval2 & FUTEX_OWNER_DIED) |
+						 task_pid_vnr(this->task) |
+						 FUTEX_WAITERS;
+
+					curval = cmpxchg_futex_value_locked(uaddr2, curval2, newval);
+
+					if (curval == -EFAULT)
+						goto handle_fault;
+					if (curval == curval2)
+						break;
+					curval2 = curval;
+				}
+				/*
+				 * Fixup the pi_state owner to ensure pi
+				 * boosting happens in the event of a race
+				 * between us dropping the lock and the new
+				 * owner waking and grabbing the lock.  Since
+				 * the lock is contended, we leave it to the new
+				 * owner to fix up the userspace value since we
+				 * can't handle a fault here gracefully.
+				 */
+				/*
+				 * FIXME: we duplicate this in 3 places now.  If
+				 * we keep this solution, let's refactor this
+				 * pi_state reparenting thing.
+				 */
+				BUG_ON(!pi_state->owner);
+				atomic_spin_lock_irq(&pi_state->owner->pi_lock);
+				WARN_ON(list_empty(&pi_state->list));
+				list_del_init(&pi_state->list);
+				atomic_spin_unlock_irq(&pi_state->owner->pi_lock);
+
+				atomic_spin_lock_irq(&this->task->pi_lock);
+				WARN_ON(!list_empty(&pi_state->list));
+				list_add(&pi_state->list, &this->task->pi_state_list);
+				pi_state->owner = this->task;
+				atomic_spin_unlock_irq(&this->task->pi_lock);
 				continue;
 			} else if (ret) {
 				/* -EDEADLK */
@@ -1363,6 +1402,17 @@ out:
 	if (pi_state != NULL)
 		free_pi_state(pi_state);
 	return ret ? ret : task_count;
+
+handle_fault:
+	/*
+	 * We can't handle a fault here, so give up the lock and requeue the
+	 * would-have-been-new-owner instead.
+	 */
+	printk("ERROR: faulted during futex_requeue loop, trying to fixup pi_state owner\n");
+	/* FIXME: write the unlock and requeue code ... */
+	ret = -EFAULT;
+	goto out_unlock;
+
 }
 
 /* The key must be already stored in q->key. */
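
As the FIXME in the hunk above notes, the pi_state reparenting sequence is now
open-coded in three places.  If this approach survives, it could be pulled into
a small helper along these lines (pi_state_update_owner is just a placeholder
name, not a function that exists in this tree); this simply mirrors the
sequence used in the patch:

static void pi_state_update_owner(struct futex_pi_state *pi_state,
				  struct task_struct *new_owner)
{
	struct task_struct *old_owner = pi_state->owner;

	/* Unhook the pi_state from the old owner's pi_state_list. */
	atomic_spin_lock_irq(&old_owner->pi_lock);
	WARN_ON(list_empty(&pi_state->list));
	list_del_init(&pi_state->list);
	atomic_spin_unlock_irq(&old_owner->pi_lock);

	/* ... and hang it off the new owner. */
	atomic_spin_lock_irq(&new_owner->pi_lock);
	WARN_ON(!list_empty(&pi_state->list));
	list_add(&pi_state->list, &new_owner->pi_state_list);
	pi_state->owner = new_owner;
	atomic_spin_unlock_irq(&new_owner->pi_lock);
}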
-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team