Date:	Thu, 15 Mar 2007 20:13:16 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	Nick Piggin <nickpiggin@...oo.com.au>,
	Ulrich Drepper <drepper@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...e.hu>
Cc:	Andi Kleen <ak@...e.de>,
	Ravikiran G Thirumalai <kiran@...lex86.org>,
	"Shai Fultheim (Shai@...lex86.org)" <shai@...lex86.org>,
	pravin b shelar <pravin.shelar@...softinc.com>,
	linux-kernel@...r.kernel.org
Subject: [PATCH 1/3] FUTEX : introduce PROCESS_PRIVATE semantic

[PATCH 1/3] FUTEX : introduce PROCESS_PRIVATE semantic

This first patch introduces XXX_PRIVATE futex operations.

When a process uses an XXX_PRIVATE futex primitive, the kernel can avoid
taking a read lock on mmap_sem to find the vma that contains the futex and
to learn whether it is associated with an inode (shared) or with the mm
(private to the process).

We also avoid taking a reference on the found inode or the mm.

- Even though mmap_sem is a rw_semaphore, down_read()/up_read() do atomic
  ops on mmap_sem, dirtying its cache line:
  lots of cache line ping-pong on SMP configurations.

  mmap_sem is also used extensively by mm code (page faults, mmap()/munmap()).
  Highly threaded processes might suffer from mmap_sem contention.

  mmap_sem is also used by oprofile code. Enabling oprofile hurts threaded
  programs because of contention on the mmap_sem cache line.

- Using atomic_inc()/atomic_dec() on the inode ref counter or the mm ref
  counter is also cache line ping-pong on SMP. It also increases mmap_sem
  hold time because of cache misses.
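
To make the fast path above concrete: the decision to skip mmap_sem can be
taken from the opcode alone, before touching any mm state. The sketch below
is only an illustration, not code from the attached patch. It uses the
constant names found in today's <linux/futex.h>; demo_decode_op() and
struct demo_decoded_op are hypothetical names.

```c
#include <stdbool.h>
#include <linux/futex.h>

/* Hypothetical helper type: the base command plus the private/shared bit. */
struct demo_decoded_op {
    int  cmd;         /* e.g. FUTEX_WAIT or FUTEX_WAKE, flag stripped */
    bool is_private;  /* true if the caller used an XXX_PRIVATE opcode */
};

/* Split an opcode into its base command and the PROCESS_PRIVATE bit. */
static struct demo_decoded_op demo_decode_op(int op)
{
    struct demo_decoded_op d;

    d.is_private = (op & FUTEX_PRIVATE_FLAG) != 0;
    d.cmd = op & ~FUTEX_PRIVATE_FLAG;
    return d;
}
```

With this split, demo_decode_op(FUTEX_WAIT_PRIVATE) yields cmd == FUTEX_WAIT
with is_private set, while the old FUTEX_WAIT opcode keeps is_private clear
and would go through the existing shared path.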

This first patch is possible because, for one process using
PTHREAD_PROCESS_PRIVATE futexes, we only need to distinguish futexes by their
virtual address, no matter what the underlying mm storage is. The case of
multiple virtual addresses mapped to the same physical address is just
insane : "Don't do it on PROCESS_PRIVATE futexes, please ?"
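
A rough userspace sketch of what "distinguish futexes by their virtual
address" could look like. It is not the kernel's key structure: struct
demo_private_key, its helpers and the fixed 4 KiB page size are all
assumptions made for illustration. In the kernel the mm would be part of
the key as well; it is left out here because a single process is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a private futex key (single process assumed). */
struct demo_private_key {
    uintptr_t page;    /* page-aligned virtual address */
    unsigned  offset;  /* byte offset of the futex word within that page */
};

#define DEMO_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Build the key from the user virtual address alone: no vma lookup. */
static int demo_private_key_init(struct demo_private_key *key,
                                 const void *uaddr)
{
    uintptr_t address = (uintptr_t)uaddr;

    key->offset = (unsigned)(address % DEMO_PAGE_SIZE);
    if (key->offset % sizeof(uint32_t) != 0)
        return -1;  /* futex words must be 4-byte aligned */
    key->page = address - key->offset;
    return 0;
}

/* Two private futexes are the same iff their virtual addresses match. */
static bool demo_private_key_equal(const struct demo_private_key *a,
                                   const struct demo_private_key *b)
{
    return a->page == b->page && a->offset == b->offset;
}
```

Two mappings of the same physical page at different virtual addresses would
produce different keys here, which is exactly the case ruled out above.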

If glibc wants to exploit this new infrastructure, it should use the new
_PRIVATE futex subcommands for PTHREAD_PROCESS_PRIVATE futexes, and
be prepared to fall back on the old subcommands for old kernels. Using one
global variable holding either FUTEX_PRIVATE_FLAG or 0 should be OK, so that
only one syscall might fail.
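
A minimal sketch of that fallback, seen from user space. This is not glibc
code: demo_private_flag and demo_futex_wake_private() are hypothetical
names, and the sketch assumes that an old kernel rejects the unknown
subcommand with ENOSYS. It is also not thread-safe as written; a real libc
would need to think about concurrent first calls.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical global: starts optimistic, cleared once on an old kernel. */
static int demo_private_flag = FUTEX_PRIVATE_FLAG;

/* Wake up to nr waiters on a process-private futex word. */
static long demo_futex_wake_private(uint32_t *uaddr, int nr)
{
    long ret = syscall(SYS_futex, uaddr, FUTEX_WAKE | demo_private_flag,
                       nr, NULL, NULL, 0);

    if (ret == -1 && errno == ENOSYS && demo_private_flag != 0) {
        /* Assumed old-kernel behaviour: unknown subcommand gives ENOSYS.
         * Remember that, so every later call uses the old subcommand. */
        demo_private_flag = 0;
        ret = syscall(SYS_futex, uaddr, FUTEX_WAKE, nr, NULL, NULL, 0);
    }
    return ret;
}
```

Because the flag is cleared the first time the kernel refuses it, at most
one syscall per process pays for the failed attempt; every other call goes
straight to whichever variant the running kernel supports.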

Compatibility with old applications is preserved: they still hit the
scalability problems, but new applications can fly :)

Note : SHARED futexes can be used by old binaries *and* new binaries,
because both binaries will use the old subcommands.

Note : The vast majority of futexes should be using PROCESS_PRIVATE semantics,
as this is the default. Almost all applications should benefit
from these changes (new kernel and updated libc).

Signed-off-by: Eric Dumazet <dada1@...mosbay.com>
---
 include/linux/futex.h |   12 +
 kernel/futex.c        |  273 +++++++++++++++++++++++++---------------
 2 files changed, 188 insertions(+), 97 deletions(-)

View attachment "futex_p1.patch" of type "text/plain" (20079 bytes)
