linux-kernel - Re: [PATCH v5 2/4] futex: Larger hash table

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140111073724.GA10038@linux.vnet.ibm.com>
Date:	Fri, 10 Jan 2014 23:37:24 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Davidlohr Bueso <davidlohr@...com>
Cc:	linux-kernel@...r.kernel.org, mingo@...nel.org,
	dvhart@...ux.intel.com, peterz@...radead.org, tglx@...utronix.de,
	efault@....de, jeffm@...e.com, torvalds@...ux-foundation.org,
	jason.low2@...com, Waiman.Long@...com, tom.vaden@...com,
	scott.norton@...com, aswin@...com
Subject: Re: [PATCH v5 2/4] futex: Larger hash table

On Thu, Jan 02, 2014 at 07:05:18AM -0800, Davidlohr Bueso wrote:
> From: Davidlohr Bueso <davidlohr@...com>
> 
> Currently, the futex global hash table suffers from it's fixed, smallish
> (for today's standards) size of 256 entries, as well as its lack of NUMA
> awareness. Large systems, using many futexes, can be prone to high amounts
> of collisions; where these futexes hash to the same bucket and lead to
> extra contention on the same hb->lock. Furthermore, cacheline bouncing is a
> reality when we have multiple hb->locks residing on the same cacheline and
> different futexes hash to adjacent buckets.
> 
> This patch keeps the current static size of 16 entries for small systems,
> or otherwise, 256 * ncpus (or larger as we need to round the number to a
> power of 2). Note that this number of CPUs accounts for all CPUs that can
> ever be available in the system, taking into consideration things like
> hotpluging. While we do impose extra overhead at bootup by making the hash
> table larger, this is a one time thing, and does not shadow the benefits
> of this patch.
> 
> Furthermore, as suggested by tglx, by cache aligning the hash buckets we can
> avoid access across cacheline boundaries and also avoid massive cache line
> bouncing if multiple cpus are hammering away at different hash buckets which
> happen to reside in the same cache line.
> 
> Also, similar to other core kernel components (pid, dcache, tcp), by using
> alloc_large_system_hash() we benefit from its NUMA awareness and thus the
> table is distributed among the nodes instead of in a single one.
> 
> For a custom microbenchmark that pounds on the uaddr hashing -- making the wait
> path fail at futex_wait_setup() returning -EWOULDBLOCK for large amounts of
> futexes, we can see the following benefits on a 80-core, 8-socket 1Tb server:
> 
> +---------+--------------------+------------------------+-----------------------+-------------------------------+
> | threads | baseline (ops/sec) | aligned-only (ops/sec) | large table (ops/sec) | large table+aligned (ops/sec) |
> +---------+--------------------+------------------------+-----------------------+-------------------------------+
> |     512 |		 32426 | 50531  (+55.8%)	| 255274  (+687.2%)	| 292553  (+802.2%)		|
> |     256 |		 65360 | 99588  (+52.3%)	| 443563  (+578.6%)	| 508088  (+677.3%)		|
> |     128 |		125635 | 200075 (+59.2%)	| 742613  (+491.1%)	| 835452  (+564.9%)		|
> |      80 |		193559 | 323425 (+67.1%)	| 1028147 (+431.1%)	| 1130304 (+483.9%)		|
> |      64 |		247667 | 443740 (+79.1%)	| 997300  (+302.6%)	| 1145494 (+362.5%)		|
> |      32 |		628412 | 721401 (+14.7%)	| 965996  (+53.7%)	| 1122115 (+78.5%)		|
> +---------+--------------------+------------------------+-----------------------+-------------------------------+
> 
> Cc: Ingo Molnar <mingo@...nel.org>
> Reviewed-by: Darren Hart <dvhart@...ux.intel.com>
> Acked-by: Peter Zijlstra <peterz@...radead.org>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> Cc: Mike Galbraith <efault@....de>
> Cc: Jeff Mahoney <jeffm@...e.com>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Scott Norton <scott.norton@...com>
> Cc: Tom Vaden <tom.vaden@...com>
> Cc: Aswin Chandramouleeswaran <aswin@...com>
> Reviewed-by: Waiman Long <Waiman.Long@...com>
> Reviewed-and-tested-by: Jason Low <jason.low2@...com>
> Signed-off-by: Davidlohr Bueso <davidlohr@...com>

Reviewed-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>

> ---
>  kernel/futex.c | 26 +++++++++++++++++++-------
>  1 file changed, 19 insertions(+), 7 deletions(-)
> 
> diff --git a/kernel/futex.c b/kernel/futex.c
> index 085f5fa..577481d 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -63,6 +63,7 @@
>  #include <linux/sched/rt.h>
>  #include <linux/hugetlb.h>
>  #include <linux/freezer.h>
> +#include <linux/bootmem.h>
> 
>  #include <asm/futex.h>
> 
> @@ -70,8 +71,6 @@
> 
>  int __read_mostly futex_cmpxchg_enabled;
> 
> -#define FUTEX_HASHBITS (CONFIG_BASE_SMALL ? 4 : 8)
> -
>  /*
>   * Futex flags used to encode options to functions and preserve them across
>   * restarts.
> @@ -149,9 +148,11 @@ static const struct futex_q futex_q_init = {
>  struct futex_hash_bucket {
>  	spinlock_t lock;
>  	struct plist_head chain;
> -};
> +} ____cacheline_aligned_in_smp;
> 
> -static struct futex_hash_bucket futex_queues[1<<FUTEX_HASHBITS];
> +static unsigned long __read_mostly futex_hashsize;
> +
> +static struct futex_hash_bucket *futex_queues;
> 
>  /*
>   * We hash on the keys returned from get_futex_key (see below).
> @@ -161,7 +162,7 @@ static struct futex_hash_bucket *hash_futex(union futex_key *key)
>  	u32 hash = jhash2((u32*)&key->both.word,
>  			  (sizeof(key->both.word)+sizeof(key->both.ptr))/4,
>  			  key->both.offset);
> -	return &futex_queues[hash & ((1 << FUTEX_HASHBITS)-1)];
> +	return &futex_queues[hash & (futex_hashsize - 1)];
>  }
> 
>  /*
> @@ -2719,7 +2720,18 @@ SYSCALL_DEFINE6(futex, u32 __user *, uaddr, int, op, u32, val,
>  static int __init futex_init(void)
>  {
>  	u32 curval;
> -	int i;
> +	unsigned long i;
> +
> +#if CONFIG_BASE_SMALL
> +	futex_hashsize = 16;
> +#else
> +	futex_hashsize = roundup_pow_of_two(256 * num_possible_cpus());
> +#endif
> +
> +	futex_queues = alloc_large_system_hash("futex", sizeof(*futex_queues),
> +					       futex_hashsize, 0,
> +					       futex_hashsize < 256 ? HASH_SMALL : 0,
> +					       NULL, NULL, futex_hashsize, futex_hashsize);
> 
>  	/*
>  	 * This will fail and we want it. Some arch implementations do
> @@ -2734,7 +2746,7 @@ static int __init futex_init(void)
>  	if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)
>  		futex_cmpxchg_enabled = 1;
> 
> -	for (i = 0; i < ARRAY_SIZE(futex_queues); i++) {
> +	for (i = 0; i < futex_hashsize; i++) {
>  		plist_head_init(&futex_queues[i].chain);
>  		spin_lock_init(&futex_queues[i].lock);
>  	}
> -- 
> 1.8.1.4
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/