Message-ID: <20190410184429.GX4038@hirez.programming.kicks-ass.net>
Date: Wed, 10 Apr 2019 20:44:29 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Waiman Long <longman@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>, Will Deacon <will.deacon@....com>,
Thomas Gleixner <tglx@...utronix.de>,
linux-kernel@...r.kernel.org, x86@...nel.org,
Davidlohr Bueso <dave@...olabs.net>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Tim Chen <tim.c.chen@...ux.intel.com>
Subject: Re: [PATCH-tip v2 02/12] locking/rwsem: Implement lock handoff to
prevent lock starvation
On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote:
> Because of writer lock stealing, it is possible that a constant
> stream of incoming writers will cause a waiting writer or reader to
> wait indefinitely leading to lock starvation.
>
> The mutex code has a lock handoff mechanism to prevent lock starvation.
> This patch implements a similar lock handoff mechanism to disable
> lock stealing and force lock handoff to the first waiter in the queue
> after at least a 5ms waiting period. The waiting period is used to
> avoid discouraging lock stealing too much to affect performance.
I would say the handoff is not at all similar to the mutex code. It is
in fact radically different.
> @@ -131,6 +138,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>  		adjustment = RWSEM_READER_BIAS;
>  		oldcount = atomic_long_fetch_add(adjustment, &sem->count);
>  		if (unlikely(oldcount & RWSEM_WRITER_MASK)) {
> +			/*
> +			 * Initiate handoff to reader, if applicable.
> +			 */
> +			if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
> +			    time_after(jiffies, waiter->timeout)) {
> +				adjustment -= RWSEM_FLAG_HANDOFF;
> +				lockevent_inc(rwsem_rlock_handoff);
> +			}
> +
>  			atomic_long_sub(adjustment, &sem->count);
>  			return;
>  		}
That confuses the heck out of me...
The above seems to rely on __rwsem_mark_wake() to be fully serialized
(and it is, by ->wait_lock, but that isn't spelled out anywhere) such
that we don't get double increment of FLAG_HANDOFF.
So there is NO __rwsem_mark_wake() vs __rwsem_mark_wake() race like:

  CPU0                                  CPU1

  oldcount = atomic_long_fetch_add(adjustment, &sem->count)

                                        oldcount = atomic_long_fetch_add(adjustment, &sem->count)

  if (!(oldcount & HANDOFF))
    adjustment -= HANDOFF;

                                        if (!(oldcount & HANDOFF))
                                          adjustment -= HANDOFF;

  atomic_long_sub(adjustment)
                                        atomic_long_sub(adjustment)

*whoops* double negative decrement of HANDOFF (aka double increment).
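
A minimal user-space sketch of that hypothetical interleaving, replayed
sequentially with C11 atomics instead of the kernel's atomic_long_*()
helpers. The bit values and the function name are illustrative
assumptions, not taken from the patch; the point is only to show where
the HANDOFF bit ends up:

```c
#include <stdatomic.h>

/* Illustrative bit layout; these values are assumptions, not the patch's. */
#define WRITER_LOCKED	(1L << 0)
#define FLAG_HANDOFF	(1L << 2)
#define READER_BIAS	(1L << 8)

/*
 * Replay two unserialized wakers step by step in one thread and
 * return the final count. Hypothetical helper, for illustration only.
 */
static long replay_double_handoff(void)
{
	atomic_long count;
	long adj0 = READER_BIAS, adj1 = READER_BIAS;
	long old0, old1;

	atomic_init(&count, WRITER_LOCKED);	/* writer owns the lock, HANDOFF clear */

	old0 = atomic_fetch_add(&count, adj0);	/* CPU0 */
	old1 = atomic_fetch_add(&count, adj1);	/* CPU1 */

	if (!(old0 & FLAG_HANDOFF))		/* CPU0: HANDOFF not seen */
		adj0 -= FLAG_HANDOFF;
	if (!(old1 & FLAG_HANDOFF))		/* CPU1: HANDOFF not seen either */
		adj1 -= FLAG_HANDOFF;

	atomic_fetch_sub(&count, adj0);		/* CPU0: adds HANDOFF once */
	atomic_fetch_sub(&count, adj1);		/* CPU1: adds HANDOFF again */

	return atomic_load(&count);
}
```

With these values the final count is WRITER_LOCKED + 2 * FLAG_HANDOFF:
the HANDOFF bit itself is clear again and the carry has spilled into
the next bit up.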
However there is another site that fiddles with the HANDOFF bit, namely
__rwsem_down_write_failed_common(), and that does:
+ atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
_OUTSIDE_ of ->wait_lock, which would yield:

  CPU0                                  CPU1

  oldcount = atomic_long_fetch_add(adjustment, &sem->count)

                                        atomic_long_or(HANDOFF)

  if (!(oldcount & HANDOFF))
    adjustment -= HANDOFF;

  atomic_long_sub(adjustment)

*whoops*, incremented HANDOFF on HANDOFF.
And there's not a comment in sight that would elucidate if this is
possible or not.
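
A minimal user-space sketch of the atomic_long_or() vs
__rwsem_mark_wake() interleaving, replayed sequentially with C11
atomics. The bit values and the function name are illustrative
assumptions, not taken from the patch, and whether the kernel code can
actually reach this ordering is exactly the open question:

```c
#include <stdatomic.h>

/* Illustrative bit layout; these values are assumptions, not the patch's. */
#define WRITER_LOCKED	(1L << 0)
#define FLAG_HANDOFF	(1L << 2)
#define READER_BIAS	(1L << 8)

/*
 * Replay a waker (CPU0) racing with a writer that sets HANDOFF
 * outside ->wait_lock (CPU1). Hypothetical helper, for illustration.
 */
static long replay_or_vs_sub(void)
{
	atomic_long count;
	long adj = READER_BIAS;
	long old;

	atomic_init(&count, WRITER_LOCKED);	/* writer owns the lock, HANDOFF clear */

	old = atomic_fetch_add(&count, adj);	/* CPU0: samples count, HANDOFF clear */
	atomic_fetch_or(&count, FLAG_HANDOFF);	/* CPU1: sets HANDOFF */

	if (!(old & FLAG_HANDOFF))		/* CPU0: acts on the stale sample */
		adj -= FLAG_HANDOFF;

	atomic_fetch_sub(&count, adj);		/* CPU0: adds HANDOFF on top of HANDOFF */

	return atomic_load(&count);
}
```

The stale oldcount makes CPU0 add HANDOFF to a count that already has
it set, so the bit clears and carries into its neighbour.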
Also:
+		atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
+		first++;
+
+		/*
+		 * Make sure the handoff bit is seen by
+		 * others before proceeding.
+		 */
+		smp_mb__after_atomic();
That comment is utter nonsense. smp_mb() doesn't (and cannot) 'make
visible'. There needs to be order between two memops on both sides.
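
For reference, a minimal user-space sketch of what a paired barrier
looks like, using C11 fences as a stand-in for smp_mb(). The variable
and function names are hypothetical and this is not the rwsem code; it
only shows each fence sitting between two memory operations, with a
matching fence on the other side:

```c
#include <stdatomic.h>

static int payload;		/* plain data, published via 'ready' */
static atomic_int ready;	/* flag the reader checks */

/* Writer side: store A, fence, store B. */
static void publish(int value)
{
	payload = value;					/* memop 1 */
	atomic_thread_fence(memory_order_seq_cst);		/* orders 1 before 2 */
	atomic_store_explicit(&ready, 1, memory_order_relaxed);	/* memop 2 */
}

/* Reader side: load B, fence, load A. Returns 1 if data was consumed. */
static int consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_relaxed))	/* memop 1 */
		return 0;
	atomic_thread_fence(memory_order_seq_cst);		/* pairs with publish() */
	*out = payload;						/* memop 2 */
	return 1;
}
```

Neither fence makes anything visible sooner; together they only
guarantee that a reader which observes ready == 1 also observes the
payload written before it.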