Date:	Wed, 07 Apr 2010 20:25:06 -0700
From:	Darren Hart <dvhltc@...ibm.com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	linux-kernel@...r.kernel.org,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...e.hu>,
	Eric Dumazet <eric.dumazet@...il.com>,
	"Peter W. Morreale" <pmorreale@...ell.com>,
	Rik van Riel <riel@...hat.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Gregory Haskins <ghaskins@...ell.com>,
	Sven-Thorsten Dietrich <sdietrich@...ell.com>,
	Chris Mason <chris.mason@...cle.com>,
	John Cooper <john.cooper@...rd-harmonic.com>,
	Chris Wright <chrisw@...s-sol.org>,
	Avi Kivity <avi@...hat.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH 4/6] futex: Add FUTEX_LOCK with optional adaptive spinning

Thomas Gleixner wrote:
> On Wed, 7 Apr 2010, Darren Hart wrote:
>> Thomas Gleixner wrote:
>>> On Mon, 5 Apr 2010, Darren Hart wrote:
>>> Hmm. The order is weird. Why don't you do it more simply?
>>>
>>> Get the uval, the tid and the thread_info pointer outside of the
>>> loop. Also, task_pid_vnr(current) needs only a one-time lookup.
>> Eeek. Having the owner lookup inside the loop is a good way to negate
>> the benefits of adaptive spinning by spinning forever (unlikely, but it
>> could certainly spin across multiple owners). Nice catch.
>>
>> As for the uval.... I'm not sure what you mean. You get curval below
>> inside the loop, and there is no "uval" in the my version of the code.
> 
> Well, you need a first-time lookup of owner and ownertid, for which you
> need the user space value (uval). But thinking more about it, it's not
> even necessary. Just initialize ownertid to 0 so it will drop into the
> lookup code when we did not acquire the futex in the cmpxchg.

No need for ownertid at all, really. The cmpxchg always tries to go from
0 to curtid. I've pushed the futex_owner() call outside the loop for a
one-time lookup.
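
Roughly, the loop then looks like the sketch below (illustrative only,
not the actual patch -- the futex_owner() and futex_spin_on_owner()
signatures and return conventions here are assumptions):

	u32 curval;
	pid_t curtid = task_pid_vnr(current);		/* one-time lookup */
	struct task_struct *owner = futex_owner(uaddr);	/* hoisted out of the loop */

	for (;;) {
		/* Always try to go from 0 (unowned) straight to our TID. */
		curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
		if (!curval)
			return 1;			/* acquired */

		/*
		 * Someone else holds it: spin while that owner is still
		 * running. If the spin aborts (owner changed, descheduled,
		 * or we need to reschedule), fall back to blocking (not
		 * shown here).
		 */
		if (!futex_spin_on_owner(uaddr, curtid, owner))
			break;
	}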

> 
>> As for the order, I had put the initial spin prior to the cmpxchg to
>> avoid doing too many cmpxchgs in a row, as they are rather expensive.
>> However, since this is (now) the first opportunity to try to acquire
>> the lock atomically after entering the futex syscall, I think you're
>> right, it should be the first thing in the loop.
>>
>>> change the loop to do:
>>>
>>>        for (;;) {
>>>                curval = cmpxchg_futex_value_locked(uaddr, 0, curtid);
>>>                if (!curval)
>>>                        return 1;
>> Single return point makes instrumentation so much easier. Unless folks
>> _really_ object, I'll leave it as is until we're closer to merging.
> 
> I don't care either way. That was just example code.
> 
>>>                if ((curval & FUTEX_TID_MASK) != ownertid) {
>>>                        ownertid = curval & FUTEX_TID_MASK;
>>>                        owner = update_owner(ownertid);
>>>                }
>>
>> Hrm... at this point the owner has changed... so we should break and go
>> to sleep, not update the owner and start spinning again. The
>> futex_spin_on_owner() will detect this and abort, so I'm not seeing the
>> purpose of the above if() block.
> 
> Why? If the owner has changed and the new owner is running on another
> CPU, then why not spin further?

That's an interesting question, and I'm not sure what the right answer
is. The current approach to adaptive spinning in the kernel is to spin
until the owner changes or deschedules, then stop and block. The idea is
that if you didn't get the lock before the owner changed, you aren't
going to get it in a very short period of time (you have at least an
entire critical section to wait through, plus whatever time you've
already spent spinning). However, blocking just so another task can spin
doesn't really make sense either, and it makes the lock less fair than
it could otherwise be.

My goal in starting this is to provide a more intelligent mechanism than
sched_yield() for userspace to use when deciding whether to spin or to
sleep. The current implementation allows spinning until the owner
changes, deschedules, or the timeslice expires. I believe these criteria
are much better than spinning for some fixed number of cycles and then
yielding for some unpredictable amount of time until CFS decides to
schedule you back in.

Still, the criteria for breaking the spin need more eyes, and more
numbers, before I can be confident in any approach.
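
For reference, the termination criteria look roughly like this (a
sketch only -- owner_running() and futex_owner_changed() are
placeholder names, not the helpers from the patch):

	/*
	 * Keep spinning only while the owner we started watching is
	 * still on a CPU and we still have our timeslice.
	 */
	while (owner_running(owner) && !need_resched()) {
		cpu_relax();
		if (futex_owner_changed(uaddr, owner))
			break;	/* new owner: stop spinning and go block */
	}
	/* fell out of the loop: queue up and block */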

> 
>>>> +		hrtimer_init_sleeper(to, current);
>>>> +		hrtimer_set_expires(&to->timer, *time);
>>>> +	}
>>>   Why set up all this _before_ trying the adaptive spin?
>>
>> I placed the retry: label above the adaptive spin loop. This way, if we wake
>> a task and the lock is "stolen", it doesn't just go right back to sleep. This
>> should aid fairness, and also performance in less contended cases. I didn't
>> think it was worth an "if (first_time_through && time)" sort of block to be
>> able to set up the timer after the spin loop.
> 
> Hmm, ok.
> 
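
(For reference, the structure being described is roughly the sketch
below -- control flow only, with the queueing and blocking elided and
lock_stolen standing in for the real re-check of the futex value:)

	if (time) {
		hrtimer_init_sleeper(to, current);
		hrtimer_set_expires(&to->timer, *time);
	}

retry:
	/* 1) adaptive spin / atomic acquisition attempts */

	/* 2) spinning gave up: queue ourselves and block, honoring 'to' */

	/* 3) woken, but the lock was grabbed by a spinner in the meantime?
	 *    Go back to spinning instead of immediately sleeping again. */
	if (lock_stolen)
		goto retry;
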
>>> Do we really need all this code? A simple owner->on_cpu (owner needs
>>> to be the task_struct then) would be sufficient to figure that out,
>>> wouldn't it?
>> As Peter pointed out in IRC, p->oncpu isn't generic. I'll go trolling through
>> the mutex_spin_on_owner() discussions to see if I can determine why that's the
>> case.
> 
> AFAICT p->oncpu is the correct thing to use when CONFIG_SMP=y. All it
> needs is a simple accessor function and you can keep all the futex
> cruft in futex.c where it belongs.

Noted.
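
Something along these lines, presumably (a sketch only -- the accessor
name is made up, and per the above the 'oncpu' field only exists with
CONFIG_SMP=y):

#ifdef CONFIG_SMP
static inline int task_is_oncpu(struct task_struct *p)
{
	return p->oncpu;
}
#else
static inline int task_is_oncpu(struct task_struct *p)
{
	return 0;	/* UP: the owner can never be running on another CPU */
}
#endif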

Thanks,
-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
