linux-kernel - Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.20.1609222228230.5640@nanos>
Date:   Thu, 22 Sep 2016 22:38:41 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Waiman Long <waiman.long@....com>
cc:     Davidlohr Bueso <dave@...olabs.net>,
        Peter Zijlstra <peterz@...radead.org>,
        Mike Galbraith <umgwanakikbuti@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org, Jason Low <jason.low2@....com>,
        Scott J Norton <scott.norton@....com>,
        Douglas Hatch <doug.hatch@....com>
Subject: Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes

On Thu, 22 Sep 2016, Waiman Long wrote:
> BTW, my initial attempt for the new futex was to use the same workflow as the
> PI futexes, but use mutex which has optimistic spinning instead of rt_mutex.
> That version can double the throughput compared with PI futexes but still far
> short of what can be achieved with wait-wake futex. Looking at the performance
> figures from the patch:
> 
>                 wait-wake futex     PI futex        TO futex
>                 ---------------     --------        --------
> max time            3.49s            50.91s          2.65s
> min time            3.24s            50.84s          0.07s
> average time        3.41s            50.90s          1.84s
> sys time          7m22.4s            55.73s        2m32.9s

That's really interesting. Do you have any explanation for this massive
system time differences?

> lock count       3,090,294          9,999,813       698,318
> unlock count     3,268,896          9,999,814           134
> 
> The problem with a PI futexes like version is that almost all the lock/unlock
> operations were done in the kernel which added overhead and latency. Now
> looking at the numbers for the TO futexes, less than 1/10 of the lock
> operations were done in the kernel, the number of unlock was insignificant.
> Locking was done mostly by lock stealing. This is where most of the
> performance benefit comes from, not optimistic spinning.

How does the lock latency distribution of all this look like and how fair
is the whole thing?

> This is also the reason that a lock handoff mechanism is implemented to
> prevent lock starvation which is likely to happen without one.

Where is that lock handoff mechanism?

Thanks,

	tglx