[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.0901081140550.3283@localhost.localdomain>
Date: Thu, 8 Jan 2009 11:54:23 -0800 (PST)
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Chris Mason <chris.mason@...cle.com>
cc: Steven Rostedt <rostedt@...dmis.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...e.hu>, paulmck@...ux.vnet.ibm.com,
Gregory Haskins <ghaskins@...ell.com>,
Matthew Wilcox <matthew@....cx>,
Andi Kleen <andi@...stfloor.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
linux-btrfs <linux-btrfs@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Nick Piggin <npiggin@...e.de>,
Peter Morreale <pmorreale@...ell.com>,
Sven Dietrich <SDietrich@...ell.com>
Subject: Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
On Thu, 8 Jan 2009, Chris Mason wrote:
>
> The patch below isn't quite what Linus suggested, but it is working here
> at least. In every test I've tried so far, this is faster than the ugly
> btrfs spin.
Sadly, I don't think it's really working.
I was pretty sure that adding the unlocked loop should provably not change
the mutex lock semantics. Why? Because it's just basically equivalent to
just doing the mutex_trylock() without really changing anything really
fundamental in the mutex logic.
And that argument is sadly totally bogus.
The thing is, we used to have this guarantee that any contention would
always go into the slowpath, and then in the slow-path we serialize using
the spinlock.
So I think the bug is still there, we just hid it better by breaking out
of the loop with that "if (need_resched())" always eventually triggering.
And it would be ok if it really is guaranteed to _eventually_ trigger, and
I guess with timeslices it eventually always will, but I suspect we could
have some serious latency spikes.
The problem? Setting "lock->count" to 0. That will mean that the next
"mutex_unlock()" will not necessarily enter the slowpath at all, and won't
necessarily wake things up like it should.
Normally we set lock->count to 0 after getting the lock, and only _inside_
the spinlock, and then we check the waiters after that. The comment says
it all:
/* set it to 0 if there are no waiters left: */
if (likely(list_empty(&lock->wait_list)))
atomic_set(&lock->count, 0);
and the spinning case violates that rule.
Now, the spinning case only sets it to 0 if we saw it set to 1, so I think
the argument can go something like:
- if it was 1, and we _have_ seen contention, then we know that at
least _one_ person that set it to 1 must have gone through the unlock
slowpath (ie it wasn't just a normal "locked increment".
- So even if _we_ (in the spinning part of stealing that lock) didn't
wake the waiter up, the slowpath wakeup case (that did _not_ wake
us up, since we were spinning and hadn't added ourselves to the wait
list) must have done so.
So maybe it's all really really safe, and we're still guaranteed to have
as many wakeups as we had go-to-sleeps. But I have to admit that my brain
hurts a bit from worrying about this.
Sleeping mutexes are not ever simple.
Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists