linux-kernel - Re: [PATCH] futex: add FUTEX_SET

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <4B1442B0.6000402@us.ibm.com>
Date:	Mon, 30 Nov 2009 14:09:52 -0800
From:	Darren Hart <dvhltc@...ibm.com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	Michel Lespinasse <walken@...gle.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] futex: add FUTEX_SET_WAIT operation

Peter Zijlstra wrote:

Hey Peter,

Some thoughts on adaptive futexes ...

> Subject: futex: implement adaptive spin
> From: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> Date: Tue Jan 20 14:40:36 CET 2009
> 
> Implement kernel side adaptive spining for futexes.
> 
> This is implemented as a new futex op: FUTEX_WAIT_ADAPTIVE, because it
> requires the futex lock field to contain a TID and regular futexes do
> not have that restriction.
> 
> FUTEX_WAIT_ADAPTIVE will spin while the lock owner is running (on a 
> different cpu) or until the task gets preempted. After that it behaves
> like FUTEX_WAIT.
> 
> The spin loop tries to acquire the lock by cmpxchg(lock, 0, tid) == tid
> on the lower 30 bits (TID_MASK) of the lock field -- leaving the
> WAITERS and OWNER_DIED flags in tact.
> 
> NOTE: we could fold mutex_spin_on_owner() and futex_spin_on_owner() by
> adding a lock_owner function argument.
> 
>   void lock_spin_on_owner(struct thread_info *(*lock_owner)(void *lock),
>                           void *lock, struct thread_info *owner);
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@...llo.nl>
> ...

> Index: linux-2.6/kernel/futex.c
> ===================================================================
> --- linux-2.6.orig/kernel/futex.c
> +++ linux-2.6/kernel/futex.c
> @@ -1158,10 +1158,40 @@ handle_fault:
>   */
>  #define FLAGS_SHARED		0x01
>  #define FLAGS_CLOCKRT		0x02
> +#define FLAGS_ADAPTIVE		0x03
> 
>  static long futex_wait_restart(struct restart_block *restart);
> 
> -static int futex_wait(u32 __user *uaddr, int fshared,
> +#ifdef CONFIG_SMP
> +struct thread_info *futex_owner(u32 __user *uaddr)
> +{
> +	struct thread_info *ti = NULL;
> +	struct task_struct *p;
> +	pid_t pid;
> +	u32 uval;
> +
> +	if (get_futex_value_locked(&uval, uaddr))
> +		return NULL;

Just give up if it would cause a fault?

> +
> +	pid = uval & FUTEX_TID_MASK;
> +	rcu_read_lock();
> +	p = find_taks_by_vpid(pid);

I'm impressed that you can create such a solid patch without compiling it!

> +	if (p) {
> +		const struct cred *cred, *pcred;
> +
> +		cread = current_cred();
> +		pcred = __task_cred(p);
> +		if (cred->euid == pcred->euid ||
> +		    cred->euid == pcred->uid)
> +			ti = task_thread_info(p);
> +	}
> +	rcu_read_unlock();
> +
> +	return ti;
> +}
> +#endif
> +
> +static int futex_wait(u32 __user *uaddr, int flags,
>  		      u32 val, ktime_t *abs_time, u32 bitset, int clockrt)
>  {
>  	struct task_struct *curr = current;
> @@ -1176,11 +1206,43 @@ static int futex_wait(u32 __user *uaddr,
>  	if (!bitset)
>  		return -EINVAL;
> 
> +#ifdef CONFIG_SMP
> +	if (!futex_cmpxchg_enabled || !(flags & FLAGS_ADAPTIVE))
> +		goto skip_adaptive;
> +
> +	preempt_disable();
> +	for (;;) {
> +		struct thread_info *owner;
> +		u32 curval = 0, newval = task_pid_vnr(curr);

Do we need to lookup newval every iteration?

> +
> +		owner = futex_owner(uaddr);
> +		if (owner && futex_spin_on_owner(uaddr, owner))
> +			break;
> +
> +		if (get_futex_value_locked(&uval, uaddr))
> +			break;
> +
> +		curval |= uval & ~FUTEX_TID_MASK;
> +		newval |= uval & ~FUTEX_TID_MASK;
> +
> +		if (cmpxchg_futex_value_locked(uaddr, curval, newval)
> +				== newval)

"== curval" isn't it? futex_atomic_cmpxchg_inatomic() returns the 
oldval, not the newval.

> +			return 0;
> +
> +		if (!owner && (need_resched() || rt_task(curr)))
> +			break;

Hrm... why go through the loop at all for an rt_task if we bail on the 
first iteration?

> +
> +		cpu_relax();
> +	}
> +	preempt_enable();
> +skip_adaptive:

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/