Message-ID: <1387380741.2797.27.camel@buesod1.americas.hpqcorp.net>
Date:	Wed, 18 Dec 2013 07:32:21 -0800
From:	Davidlohr Bueso <davidlohr@...com>
To:	linux-kernel@...r.kernel.org
Cc:	mingo@...nel.org, dvhart@...ux.intel.com, peterz@...radead.org,
	tglx@...utronix.de, paulmck@...ux.vnet.ibm.com, efault@....de,
	jeffm@...e.com, torvalds@...ux-foundation.org, scott.norton@...com,
	tom.vaden@...com, aswin@...com, Waiman.Long@...com,
	jason.low2@...com
Subject: Re: [PATCH v2 0/4] futex: Wakeup optimizations

ping? 

If no one has any objections, could this patchset be picked up?

Thanks,
Davidlohr

On Tue, 2013-12-03 at 01:45 -0800, Davidlohr Bueso wrote:
> Changes from v1 [https://lkml.org/lkml/2013/11/22/525]:
>  - Removed patch "futex: Check for pi futex_q only once".
> 
>  - Cleaned up ifdefs for larger hash table.
> 
>  - Added a doc patch from tglx that describes the futex 
>    ordering guarantees.
> 
>  - Improved the lockless plist check for the wake calls.
>    Based on the community feedback, the necessary abstractions
>    and barriers are added to maintain ordering guarantees.
>    Code documentation is also updated.
> 
>  - Removed patch "sched,futex: Provide delayed wakeup list".
>    Based on feedback from PeterZ, I will look into this as
>    a separate issue once the other patches are settled.
>  
> 
> We have been dealing with a customer database workload on a large
> 12TB, 240-core, 16-socket NUMA system that exhibits high amounts
> of contention on some of the locks that serialize internal futex
> data structures. This workload especially suffers in the wakeup
> paths, where waiting on the corresponding hb->lock can account for
> up to ~60% of the time. The result of such calls can mostly be
> classified as (i) nothing to wake up and (ii) a large number of
> tasks to wake up.
> 
> Before these patches are applied, we can see this pathological behavior:
> 
>  37.12%  826174  xxx  [kernel.kallsyms] [k] _raw_spin_lock
>             --- _raw_spin_lock
>              |
>              |--97.14%-- futex_wake
>              |          do_futex
>              |          sys_futex
>              |          system_call_fastpath
>              |          |
>              |          |--99.70%-- 0x7f383fbdea1f
>              |          |           yyy
> 
>  43.71%  762296  xxx  [kernel.kallsyms] [k] _raw_spin_lock
>             --- _raw_spin_lock
>              |
>              |--53.74%-- futex_wake
>              |          do_futex
>              |          sys_futex
>              |          system_call_fastpath
>              |          |
>              |          |--99.40%-- 0x7fe7d44a4c05
>              |          |           zzz
>              |--45.90%-- futex_wait_setup
>              |          futex_wait
>              |          do_futex
>              |          sys_futex
>              |          system_call_fastpath
>              |          0x7fe7ba315789
>              |          syscall
> 
> 
> With these patches, contention is practically nonexistent:
> 
>  0.10%     49   xxx  [kernel.kallsyms]   [k] _raw_spin_lock
>                --- _raw_spin_lock
>                 |
>                 |--76.06%-- futex_wait_setup
>                 |          futex_wait
>                 |          do_futex
>                 |          sys_futex
>                 |          system_call_fastpath
>                 |          |
>                 |          |--99.90%-- 0x7f3165e63789
>                 |          |           syscall
>                            ...
>                 |--6.27%-- futex_wake
>                 |          do_futex
>                 |          sys_futex
>                 |          system_call_fastpath
>                 |          |
>                 |          |--54.56%-- 0x7f317fff2c05
>                 ...
> 
> Patch 1 is a cleanup.
> 
> Patch 2 addresses the well-known issue of the global hash table.
> By creating a larger and NUMA-aware table, we can reduce false
> sharing and collisions, thus reducing the chance of different futexes
> sharing the same hb->lock.
> 
> Patch 3 documents the futex ordering guarantees.
> 
> Patch 4 reduces contention on the corresponding hb->lock by not trying to
> acquire it if there are no blocked tasks in the waitqueue.
> This particularly deals with point (i) above, where we see that it is not
> uncommon for up to 90% of wakeup calls to end up returning 0, indicating that no
> tasks were woken.
> 
> This patchset has also been tested on smaller systems for a variety of
> benchmarks, including Java workloads, kernel builds and custom bang-the-hell-out-of
> hb locks programs. So far, no functional or performance regressions have been seen.
> Furthermore, no issues were found when running the different tests in the futextest 
> suite: http://git.kernel.org/cgit/linux/kernel/git/dvhart/futextest.git/
> 
> This patchset applies on top of Linus' tree as of v3.13-rc2 (2e7babfa).
> 
> Special thanks to Scott Norton, Tom Vaden, Mark Ray and Aswin Chandramouleeswaran
> for help presenting, debugging and analyzing the data.
> 
>   futex: Misc cleanups
>   futex: Larger hash table
>   futex: Document ordering guarantees
>   futex: Avoid taking hb lock if nothing to wakeup
> 
>  kernel/futex.c | 230 ++++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 194 insertions(+), 36 deletions(-)
> 
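
For readers skimming the thread, here is a minimal userspace sketch of the
idea behind patch 4 in the quoted cover letter: skip the bucket lock when
there is nothing to wake. It is not the kernel code. The struct and function
names are hypothetical, a pthread mutex stands in for hb->lock, and an
explicit waiter counter is used in place of the lockless plist check that the
changelog describes. The sequentially consistent atomics stand in for the
barriers needed to keep the ordering guarantees.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a futex hash bucket; not the kernel's struct. */
struct bucket {
	atomic_int waiters;	/* tasks queued or about to queue */
	pthread_mutex_t lock;	/* plays the role of hb->lock */
	int queued;		/* tasks on the wait queue, protected by lock */
};

void bucket_init(struct bucket *b)
{
	atomic_init(&b->waiters, 0);
	pthread_mutex_init(&b->lock, NULL);
	b->queued = 0;
}

/*
 * Waiter side: announce the waiter *before* taking the lock, so that a
 * concurrent waker either sees the counter or the waiter sees the new
 * futex value once it holds the lock.
 */
void bucket_enqueue(struct bucket *b)
{
	atomic_fetch_add(&b->waiters, 1);	/* seq_cst, acts as a full barrier */
	pthread_mutex_lock(&b->lock);
	b->queued++;
	pthread_mutex_unlock(&b->lock);
}

/* Waker side: returns the number of tasks "woken" (here: dequeued). */
int bucket_wake(struct bucket *b, int nr_wake)
{
	int woken;

	if (nr_wake <= 0)
		return 0;

	/* Fast path: nobody is waiting, so do not touch the lock at all. */
	if (atomic_load(&b->waiters) == 0)
		return 0;

	pthread_mutex_lock(&b->lock);
	woken = b->queued < nr_wake ? b->queued : nr_wake;
	b->queued -= woken;
	atomic_fetch_sub(&b->waiters, woken);
	pthread_mutex_unlock(&b->lock);
	return woken;
}
```

With this shape, the wake calls that would return 0 anyway (case (i) in the
cover letter) cost one atomic load instead of a contended lock acquisition.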

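The larger hash table from patch 2 can be illustrated in a similar hedged
way. The sketch below assumes a table whose bucket count scales with the
number of CPUs and is rounded up to a power of two, with each bucket padded
to its own cache line to limit false sharing. The scaling factor, the
64-byte line size and all names are illustrative assumptions, not values
taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_CACHELINE 64	/* assumption; real line size is per-arch */

/* Hypothetical bucket, aligned so two buckets never share a cache line. */
struct padded_bucket {
	_Alignas(ASSUMED_CACHELINE) int lock_word;
	int nwaiters;
};

/* Smallest power of two that is >= n (returns 1 for n == 0). */
size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Bucket count grows with the CPU count; per_cpu is a tunable factor. */
size_t table_size(size_t ncpus, size_t per_cpu)
{
	return roundup_pow2(ncpus * per_cpu);
}

/* With a power-of-two size, the index is a cheap mask of the key hash. */
size_t bucket_index(uint32_t hash, size_t size)
{
	return hash & (size - 1);
}
```

For example, with 240 CPUs and an illustrative factor of 256,
table_size(240, 256) rounds 61440 up to 65536 buckets, so unrelated futexes
are much less likely to hash to the same bucket and contend on one lock.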
