Message-ID: <4BBBA610.3090200@us.ibm.com>
Date: Tue, 06 Apr 2010 14:22:24 -0700
From: Darren Hart <dvhltc@...ibm.com>
To: Avi Kivity <avi@...hat.com>
CC: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...e.hu>,
Eric Dumazet <eric.dumazet@...il.com>,
"Peter W. Morreale" <pmorreale@...ell.com>,
Rik van Riel <riel@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
Gregory Haskins <ghaskins@...ell.com>,
Sven-Thorsten Dietrich <sdietrich@...ell.com>,
Chris Mason <chris.mason@...cle.com>,
John Cooper <john.cooper@...rd-harmonic.com>,
Chris Wright <chrisw@...s-sol.org>
Subject: Re: [PATCH V2 0/6][RFC] futex: FUTEX_LOCK with optional adaptive
spinning

Darren Hart wrote:
> Avi Kivity wrote:
>
>>>> At 10%
>>>> duty cycle you have 25 waiters behind the lock on average. I don't
>>>> think this is realistic, and it means that spinning is invoked only
>>>> rarely.
>>>
>>> Perhaps some instrumentation is in order; it seems to get invoked often
>>> enough to achieve some 20% increase in lock/unlock iterations.
>>> Perhaps another metric would be of more value - such as average wait
>>> time?
>>
>> Why measure an unrealistic workload?
>
> No argument there, thus my proposal for an alternate configuration below.
>
>>>> I'd be interested in seeing runs where the average number of waiters
>>>> is 0.2, 0.5, 1, and 2, corresponding to moderate-to-bad contention.
>>>> 25 average waiters on compute bound code means the application needs
>>>> to be rewritten, no amount of mutex tweaking will help it.
>>>
>>> Perhaps something with NR_CPUS threads would be of more interest?
>>
>> That seems artificial.
>
> How so? Several real world applications use one thread per CPU to
> dispatch work to, wait for events, etc.
>
>>
>>> At 10% that's about 0.8 and at 25% it's 2, your upper limit. I could
>>> add a few more duty-cycle points and make 25% the max. I'll kick that
>>> off and post the results... probably tomorrow, 10M iterations takes a
>>> while, but makes the results relatively stable.
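
For reference, those waiter counts just assume the average number of
waiters scales roughly as nthreads * duty_cycle - a crude model, not
something the benchmark reports directly. A trivial illustration:

#include <stdio.h>

/*
 * Crude model (an assumption, not a measurement): with nthreads
 * threads each holding the lock for duty_cycle of the time, the
 * average number of waiters behind the lock is roughly
 * nthreads * duty_cycle.
 */
static double expected_waiters(int nthreads, double duty_cycle)
{
	return nthreads * duty_cycle;
}

int main(void)
{
	printf("%.1f\n", expected_waiters(256, 0.10));	/* ~25 waiters  */
	printf("%.1f\n", expected_waiters(8, 0.10));	/* ~0.8 waiters */
	printf("%.1f\n", expected_waiters(8, 0.25));	/* ~2 waiters   */
	return 0;
}
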
>>
>> Thanks. But why not vary the number of threads as well?
>
> Absolutely, I don't disagree that all the variables should vary in order
> to get a complete picture. I'm starting with 8 threads - it takes
> several hours to collect the data.

While this might be of less interest after today's discussion, I
promised to share the results of a run with 8 threads and a wider
selection of lower duty-cycles. The results are very poor for adaptive
and worse for aas (multiple spinners) compared to normal FUTEX_LOCK. As
Thomas and Peter have pointed out, the implementation is sub-optimal.
Before abandoning this approach I will see if I can find the bottlenecks
and simplify the kernel side of things. My impression is that I am doing
a lot more work in the kernel, especially in the adaptive loop, than is
really necessary.
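
To give a feel for what that adaptive loop is doing, here is a
stripped-down sketch - not the actual patch code, and
futex_owner_running() and futex_trylock_atomic() below are just
stand-ins for the real owner-on-cpu check and atomic acquire:

/*
 * Simplified view of kernel-side adaptive spinning: keep spinning
 * only while the lock owner is running on another CPU. If the lock
 * is dropped while we spin, try to take it; if the owner schedules
 * out, give up and block in the kernel as usual.
 */
static int futex_spin_on_owner(u32 __user *uaddr, struct task_struct *owner)
{
	u32 uval;

	while (owner && futex_owner_running(owner)) {
		if (get_user(uval, uaddr))
			return -EFAULT;

		/* Lock released while we spun? Try to grab it. */
		if (!(uval & FUTEX_TID_MASK))
			return futex_trylock_atomic(uaddr, uval);

		cpu_relax();
	}

	return 0;	/* owner not on cpu: fall back to blocking */
}

Each pass through a loop like this does a userspace read and possibly
an atomic acquire attempt, on top of the owner tracking, which is
where I suspect the extra kernel-side overhead is coming from.
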
Both the 8- and 256-thread plots can be viewed here:
http://www.kernel.org/pub/linux/kernel/people/dvhart/adaptive_futex/v4/
--
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team