linux-kernel - Re: [PATCH/RFC] mutex: Fix optimistic spinning vs. BKL


Message-ID: <1273601980.1810.59.camel@laptop>
Date:	Tue, 11 May 2010 20:19:40 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Arnd Bergmann <arnd@...db.de>, Ingo Molnar <mingo@...e.hu>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Tony Breeds <tonyb@....ibm.com>
Subject: Re: [PATCH/RFC] mutex: Fix optimistic spinning vs. BKL

On Tue, 2010-05-11 at 11:06 -0700, Linus Torvalds wrote:
> 
> On Mon, 10 May 2010, Peter Zijlstra wrote:
> > 
> > As to the 2 jiffy spin timeout, I guess we should add a lockdep warning
> > for that, because anybody holding a mutex for longer than 2 jiffies and
> > not sleeping does need fixing anyway.
> 
> I really hate the jiffies thing, but looking at the optimistic spinning, I 
> do wonder about two things..
> 
> First - we check "need_resched()" only if owner is NULL. That sounds 
> wrong. If we need to reschedule, we need to stop spinning _regardless_ of 
> whether the owner may have been preempted before setting the owner field.

There is a second need_resched() in the inner spin loop in
kernel/sched.c:mutex_spin_on_owner().
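
As a rough illustration of what that inner loop checks, here is a simplified
userspace model. It is a sketch, not the code in kernel/sched.c: the
struct tick trace and the name spin_on_owner_model() are hypothetical and
exist only to make the loop testable outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One observation per loop iteration (hypothetical model, not kernel state). */
struct tick {
    int  lock_owner;    /* id of the current mutex owner, 0 = none */
    bool owner_on_cpu;  /* is that owner still running on a CPU? */
    bool need_resched;  /* has the spinner been asked to reschedule? */
};

/*
 * Spin while `owner` still holds the lock.
 * Returns true if the owner changed or released the lock (the caller may
 * retry the acquisition), false if the spinner should stop and go to sleep.
 */
bool spin_on_owner_model(const struct tick *t, size_t n, int owner)
{
    assert(t != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (t[i].lock_owner != owner)
            return true;   /* owner went away: try to take the lock */
        if (!t[i].owner_on_cpu || t[i].need_resched)
            return false;  /* spinning is pointless or we must yield */
        /* a real loop would call cpu_relax() here */
    }
    return false;          /* trace exhausted: conservatively stop */
}
```

The point of the model is that need_resched is tested on every iteration,
independent of whether an owner is currently recorded.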

> Second: we allow "owner" to change, and we'll continue spinning. This is 
> how you can end up spinning for a long time - not because anybody holds 
> the mutex for longer than 2 jiffies, but because a lot of other threads 
> _together_ hold the mutex for longer than 2 jiffies.

Granted.

> Now, I think we do want some limited "continue spinning even if somebody 
> else ended up getting it instead", but I think we should at least limit 
> it. Otherwise we end up being potentially rather unfair, since we don't 
> have any fair queueing logic for the optimistic spinning phase.
> 
> Now, we could just count the number of times "owner" has changed, and I 
> suspect that would be sufficient. Now, that trivial counting scheme would 
> fail if "owner" stays the same (ie the same process re-takes the lock over 
> and over again, possibly due to hot cacheline things being very unfair 
> to the person who already owns it), but quite frankly, I don't think we 
> can get into that kind of situation. 
> 
> Why? Mutexes may end up being very heavily contended, but they can't be 
> contended by just _one_ thread. So if we're really in a starvation issue, 
> the thread that is waiting _will_ see multiple different owners.
> 
> So once you have seen X number of other owners, you just say "screw it, 
> this spinning thing isn't working for me, I'll go to the sleeping case".

Right, so basically count the number of mutex_spin_on_owner() calls and
bail when >N.
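
A minimal userspace sketch of that counting idea follows. The limit
MAX_SPIN_ROUNDS, the owner trace and spin_bounded() are all hypothetical
names chosen for illustration; no value of N has been settled on here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value of N: spin rounds tolerated before giving up. */
#define MAX_SPIN_ROUNDS 4

enum spin_result { SPIN_ACQUIRED, SPIN_GIVE_UP };

/*
 * owners[i] models the owner observed at the start of spin round i;
 * 0 means the lock was seen free and the spinner takes it.
 * *rounds_out receives the number of spin rounds that were started.
 */
enum spin_result spin_bounded(const int *owners, size_t n, int *rounds_out)
{
    int rounds = 0;

    assert(rounds_out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (owners[i] == 0) {
            *rounds_out = rounds;
            return SPIN_ACQUIRED;
        }
        /* one mutex_spin_on_owner()-style round against this owner */
        if (++rounds > MAX_SPIN_ROUNDS) {
            *rounds_out = rounds;
            return SPIN_GIVE_UP;  /* seen too many owners: go sleep */
        }
    }
    *rounds_out = rounds;
    return SPIN_GIVE_UP;          /* trace exhausted without a free lock */
}
```

With the limit at 4, a spinner that sees owners 1, 2 and then a free lock
acquires it after two rounds, while one that sees five successive owners
gives up on the fifth round and falls back to the sleeping path.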

> Of course, it's quite possible that as long as "need_resched()" isn't set, 
> spinning really _is_ the right thing to do. Maybe it causes horrible CPU 
> load on some odd "everybody synchronize" loads, but maybe that really is 
> the best we can do.

Ben's argument was that spinning for a long time wrecks power usage.

That said, I'd still like a counter/event/warning to see if someone
actually manages to hold onto a mutex for long (2 jiffies) without
scheduling at all. If we ever run into something like that, that needs
to get fixed regardless.
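
Such a counter could look roughly like the userspace sketch below. It is
only a model under stated assumptions: struct tracked_mutex and the
tracked_*() helpers are hypothetical, time is passed in as a plain tick
count, and this is not a lockdep interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Threshold from the discussion: holding for longer than 2 ticks. */
#define HOLD_WARN_TICKS 2UL

struct tracked_mutex {
    unsigned long acquired_at;       /* tick count when the lock was taken */
    bool          slept_while_held;  /* did the holder schedule in between? */
    unsigned long long_hold_events;  /* number of offending hold periods */
};

void tracked_lock(struct tracked_mutex *m, unsigned long now)
{
    assert(m != NULL);
    m->acquired_at = now;
    m->slept_while_held = false;
}

/* Called whenever the holder sleeps while owning the mutex. */
void tracked_note_sleep(struct tracked_mutex *m)
{
    assert(m != NULL);
    m->slept_while_held = true;
}

/*
 * Returns true and bumps the counter if the mutex was held for longer
 * than HOLD_WARN_TICKS without the holder ever sleeping.
 */
bool tracked_unlock(struct tracked_mutex *m, unsigned long now)
{
    assert(m != NULL);
    /* unsigned subtraction stays correct across tick counter wraparound */
    unsigned long held = now - m->acquired_at;

    if (held > HOLD_WARN_TICKS && !m->slept_while_held) {
        m->long_hold_events++;
        return true;
    }
    return false;
}
```

The counter only fires for a single holder that keeps the lock too long
without sleeping, which is the case that would need fixing regardless of
how the spinning side is limited.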


