Date:	Mon, 12 Oct 2009 16:53:27 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Török Edwin <edwin@...mav.net>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	aCaB <acab@...mav.net>, David Howells <dhowells@...hat.com>,
	Nick Piggin <npiggin@...e.de>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: Mutex vs semaphores scheduler bug

On Sat, 2009-10-10 at 17:57 +0300, Török Edwin wrote:
> If a semaphore (such as mmap_sem) is heavily contended, then using a
> userspace mutex makes the program faster.
> 
> For example, using a mutex around *anonymous* mmaps speeds it up
> significantly (~80% on this microbenchmark, ~15% on real
> applications). Such workarounds shouldn't be necessary for userspace
> applications; the kernel should by default use the most efficient
> implementation for locks.

Should, yes, does, no.
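
For the record, the workaround you describe amounts to something like
the sketch below (the wrapper names are illustrative, not from your
benchmark): serialize anonymous mmap()/munmap() behind a process-wide
pthread mutex, so the contention moves from mmap_sem to the
futex-based mutex.

#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

static pthread_mutex_t mmap_lock = PTHREAD_MUTEX_INITIALIZER;

void *locked_anon_mmap(size_t len)
{
	void *p;

	pthread_mutex_lock(&mmap_lock);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	pthread_mutex_unlock(&mmap_lock);

	return p;	/* MAP_FAILED on error, same as plain mmap() */
}

int locked_munmap(void *p, size_t len)
{
	int ret;

	pthread_mutex_lock(&mmap_lock);
	ret = munmap(p, len);
	pthread_mutex_unlock(&mmap_lock);

	return ret;
}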

> However, when using a mutex, the number of context switches is
> SMALLER by 40-60%.

That matches the problem, see below.

> I think it's a bug in the scheduler; it schedules the mutex case much
> better. 

It's not; the scheduler doesn't know about mutexes/futexes/rwsems.

> Maybe because userspace also spins a bit before actually calling
> futex().

Nope. If we ever spun, it would be in the kernel, after calling
FUTEX_LOCK (which currently doesn't exist). glibc shouldn't do any
spinning on its own (if it does, I have yet another reason to try to
supplant the glibc futex code).
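
To make that concrete: a glibc-style futex lock makes a single atomic
attempt in userspace and then goes straight to sleep in the kernel;
there is no spin loop anywhere. A minimal sketch (simplified from the
real lowlevellock code; state 0/1/2 = unlocked/locked/contended):

#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <stdatomic.h>

static long futex(atomic_int *uaddr, int op, int val)
{
	return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

void futex_lock(atomic_int *f)
{
	int expected = 0;

	/* Single fast-path attempt: 0 -> 1, no spinning. */
	if (atomic_compare_exchange_strong(f, &expected, 1))
		return;

	/* Slow path: mark contended and sleep in the kernel. */
	while (atomic_exchange(f, 2) != 0)
		futex(f, FUTEX_WAIT, 2);
}

void futex_unlock(atomic_int *f)
{
	/* Wake one waiter only if someone actually slept. */
	if (atomic_exchange(f, 0) == 2)
		futex(f, FUTEX_WAKE, 1);
}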

> I think it's important to optimize the mmap_sem semaphore

It is.

The problem appears to be that rwsem doesn't allow lock stealing and
very strictly maintains FIFO order on contention. This results in the
extra schedules and reduced performance you noticed.

What happens is that when we release a contended rwsem we assign it to
the next waiter; if, before that waiter gets run, another (running)
task comes along and tries to acquire the lock, it gets put to sleep,
even though it could have acquired the lock (in which case the woken
waiter would detect the failure and go back to sleep).

So what I think we need to do is have a look at all this lib/rwsem.c
slowpath code and hack in lock stealing.
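
To illustrate what I mean by stealing (a userspace sketch only, not
the actual lib/rwsem.c code), the difference for the writer side boils
down to whether a trylock backs off as soon as the wait list is
non-empty:

#include <stdatomic.h>
#include <stdbool.h>

static atomic_int count;	/* 0 = free, 1 = write-locked */
static atomic_int waiters;	/* length of the FIFO wait list */

/* Strict FIFO (rwsem today): never jump the queue. */
bool write_trylock_fifo(void)
{
	int free = 0;

	if (atomic_load(&waiters) > 0)
		return false;	/* queue non-empty: sleep behind it */

	return atomic_compare_exchange_strong(&count, &free, 1);
}

/*
 * Stealing (futex-like): race for the lock regardless of the queue.
 * A woken waiter just retries this and goes back to sleep if it lost.
 */
bool write_trylock_steal(void)
{
	int free = 0;

	return atomic_compare_exchange_strong(&count, &free, 1);
}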


