Message-ID: <52aee7c3-b8aa-a617-c1f2-34bc99c72474@redhat.com>
Date: Thu, 18 Apr 2019 11:15:33 -0400
From: Waiman Long <longman@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Will Deacon <will.deacon@....com>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, x86@...nel.org,
Davidlohr Bueso <dave@...olabs.net>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
huang ying <huang.ying.caritas@...il.com>
Subject: Re: [PATCH v4 12/16] locking/rwsem: Enable time-based spinning on
reader-owned rwsem
On 04/18/2019 09:06 AM, Peter Zijlstra wrote:
> So I really dislike time based spinning, and we've always rejected it
> before.
>
> On Sat, Apr 13, 2019 at 01:22:55PM -0400, Waiman Long wrote:
>
>> +static inline u64 rwsem_rspin_threshold(struct rw_semaphore *sem)
>> +{
>> +	long count = atomic_long_read(&sem->count);
>> +	int reader_cnt = atomic_long_read(&sem->count) >> RWSEM_READER_SHIFT;
>> +
>> +	if (reader_cnt > 30)
>> +		reader_cnt = 30;
>> +	return sched_clock() + ((count & RWSEM_FLAG_WAITERS)
>> +		? 10 * NSEC_PER_USEC + reader_cnt * NSEC_PER_USEC/2
>> +		: 25 * NSEC_PER_USEC);
>> +}
> Urgh, why do you _have_ to write unreadable code :-(
I guess my coding style is less readable to others. I will try to write
simpler, more readable code in the future :-)
>
> static inline u64 rwsem_rspin_threshold(struct rw_semaphore *sem)
> {
> 	long count = atomic_long_read(&sem->count);
> 	u64 delta = 25 * NSEC_PER_USEC;
>
> 	if (count & RWSEM_FLAG_WAITERS) {
> 		int readers = count >> RWSEM_READER_SHIFT;
>
> 		if (readers > 30)
> 			readers = 30;
>
> 		delta = (20 + readers) * NSEC_PER_USEC / 2;
> 	}
>
> 	return sched_clock() + delta;
> }
>
> I don't get it though; the number of current read-owners is independent
> of WAITERS, while the hold time does correspond to it.
>
> So why do we have that WAITERS check in there?
It is not a waiter check; it is checking the number of readers that are
holding the lock. My thinking was that in the wakeup process done by
__rwsem_mark_wake(), the wakeup is done one by one. So the more readers
you have, the longer it takes for the last reader to actually wake up
and run its critical section. That is the main reason for that logic.
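
For concreteness, here is that arithmetic as a standalone userspace
sketch (the cap of 30 readers and the per-reader scaling come from the
patch above; the sample reader counts are just illustrative):

#include <stdio.h>
#include <stdint.h>

#define NSEC_PER_USEC	1000ULL

/*
 * Spin budget as in the patch: a flat 25us when no waiters are queued,
 * otherwise 10us plus 0.5us per read-owner, capped at 30 readers.
 */
static uint64_t spin_delta_ns(int readers, int waiters)
{
	if (readers > 30)
		readers = 30;
	return waiters ? 10 * NSEC_PER_USEC + readers * NSEC_PER_USEC / 2
		       : 25 * NSEC_PER_USEC;
}

int main(void)
{
	/* With waiters queued: 1 reader -> 10.5us, 8 -> 14us, 30+ -> 25us. */
	for (int r = 1; r <= 32; r *= 2)
		printf("%2d readers -> %llu ns\n", r,
		       (unsigned long long)spin_delta_ns(r, 1));
	return 0;
}

Note that the capped waiter case (30+ readers) ends up at the same 25us
as the no-waiters budget.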
>
>> @@ -616,6 +678,35 @@ static bool rwsem_optimistic_spin(struct rw_semaphore *sem, bool wlock)
>> 		if (taken)
>> 			break;
>>
>> +		/*
>> +		 * Time-based reader-owned rwsem optimistic spinning
>> +		 */
> This relies on rwsem_spin_on_owner() not actually spinning for
> read-owned.
>
Yes, because with multiple read-owners there is no single task structure
to spin on.
>> +		if (wlock && (owner_state == OWNER_READER)) {
>> +			/*
>> +			 * Initialize rspin_threshold when the owner
>> +			 * state changes from non-reader to reader.
>> +			 */
>> +			if (prev_owner_state != OWNER_READER) {
>> +				if (!is_rwsem_spinnable(sem))
>> +					break;
>> +				rspin_threshold = rwsem_rspin_threshold(sem);
>> +				loop = 0;
>> +			}
> This seems fragile; why not do the rspin_threshold thing _once_ at the
> start of this function?
>
> This way it can be reset.
You can have a situation as follows:

Lock owner: R [W] R

So a writer comes in and gets the lock before the spinner. You can then
actually spin on that writer. After that, a reader comes in and steals
the lock, and the code above allows us to reset the timeout period for
the new reader phase.
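
Here is a toy userspace sketch of that reset rule, using the R [W] R
sequence above (the enum names mirror the patch; the owner sequence is
hypothetical):

#include <stdio.h>

enum owner_state { OWNER_NULL, OWNER_WRITER, OWNER_READER };

int main(void)
{
	/* R [W] R: readers, then a writer steals the lock, then readers. */
	enum owner_state seq[] = {
		OWNER_READER, OWNER_READER, OWNER_WRITER,
		OWNER_READER, OWNER_READER
	};
	enum owner_state prev = OWNER_NULL;

	for (int i = 0; i < 5; i++) {
		/* Re-arm the deadline on each non-reader -> reader edge. */
		if (seq[i] == OWNER_READER && prev != OWNER_READER)
			printf("step %d: re-arm rspin_threshold\n", i);
		prev = seq[i];
	}
	return 0;
}

It fires at steps 0 and 3, i.e. each reader phase gets a fresh timeout
budget instead of sharing a single one set at function entry.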
>> +			/*
>> +			 * Check time threshold every 16 iterations to
>> +			 * avoid calling sched_clock() too frequently.
>> +			 * This will make the actual spinning time a
>> +			 * bit more than that specified in the threshold.
>> +			 */
>> +			else if (!(++loop & 0xf) &&
>> +				 (sched_clock() > rspin_threshold)) {
> Why is calling sched_clock() lots a problem?
Actually I am more concerned about the latency introduced by the
sched_clock() call. BTW, I haven't done any measurements myself. Do you
know how costly a sched_clock() call is?

If the cost is relatively high, it will increase the average latency
between the lock becoming free and the spinner being ready to do a
trylock.
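
FWIW, here is a crude userspace way to ballpark the cost of a clock
read, using clock_gettime(CLOCK_MONOTONIC) as a stand-in for
sched_clock() (in-kernel numbers will differ, so treat it as an
order-of-magnitude probe only):

#include <stdio.h>
#include <time.h>

int main(void)
{
	enum { N = 1000000 };
	struct timespec t0, t1, tmp;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < N; i++)
		clock_gettime(CLOCK_MONOTONIC, &tmp);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double ns = (t1.tv_sec - t0.tv_sec) * 1e9 +
		    (t1.tv_nsec - t0.tv_nsec);
	printf("~%.1f ns per clock_gettime() call\n", ns / N);
	return 0;
}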
Cheers,
Longman