Open Source and information security mailing list archives
 
Date:   Fri, 23 Sep 2016 15:02:00 +0200 (CEST)
From:   Thomas Gleixner <tglx@...utronix.de>
To:     Waiman Long <waiman.long@....com>
cc:     Davidlohr Bueso <dave@...olabs.net>,
        Peter Zijlstra <peterz@...radead.org>,
        Mike Galbraith <umgwanakikbuti@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        Jonathan Corbet <corbet@....net>, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org, Jason Low <jason.low2@....com>,
        Scott J Norton <scott.norton@....com>,
        Douglas Hatch <doug.hatch@....com>
Subject: Re: [RFC PATCH v2 3/5] futex: Throughput-optimized (TO) futexes

On Thu, 22 Sep 2016, Waiman Long wrote:
> On 09/22/2016 04:38 PM, Thomas Gleixner wrote:
> > >                  wait-wake futex     PI futex        TO futex
> > >                  ---------------     --------        --------
> > > max time            3.49s            50.91s          2.65s
> > > min time            3.24s            50.84s          0.07s
> > > average time        3.41s            50.90s          1.84s
> > > sys time          7m22.4s            55.73s        2m32.9s
> > That's really interesting. Do you have any explanation for this massive
> > system time differences?
> 
> For the wait-wake futexes, the waiters are kicked back with EAGAIN without
> sleeping whenever the futex value changes before they can acquire the hash
> bucket spinlock. These waiters then attempt to acquire the lock again in
> user space. If that fails, they call FUTEX_WAIT again.
> 
> In the above test case, the critical section is pretty short and about 4/5 of
> the FUTEX_WAIT calls resulted in an EAGAIN error. So there are a lot more
> user-space/kernel transitions than with the TO futexes. I think the difference
> in the number of FUTEX_WAIT calls vs. FUTEX_LOCK_TO calls accounts for the
> difference in their sys times.

Ok.
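(For reference, the wait-wake retry loop described above is roughly the
following -- a minimal sketch assuming a simple 0 = unlocked / 1 = locked
protocol, not glibc's actual three-state mutex protocol.)

```c
/*
 * Minimal sketch of the wait-wake locking loop described above,
 * assuming a 0 = unlocked / 1 = locked protocol.
 */
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static long futex_wait(atomic_int *uaddr, int val)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAIT, val, NULL, NULL, 0);
}

static void lock(atomic_int *f)
{
	int expected = 0;

	while (!atomic_compare_exchange_strong(f, &expected, 1)) {
		/*
		 * Lock is held: ask the kernel to sleep until *f changes.
		 * If the value already changed before the hash bucket
		 * spinlock is taken, FUTEX_WAIT returns EAGAIN without
		 * sleeping and we loop back to retry in user space --
		 * the extra transitions that inflate sys time in the
		 * numbers above.
		 */
		futex_wait(f, 1);
		expected = 0;
	}
}
```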

> As for the PI futex, it is a strictly sleeping lock and so won't use that
> much sys time.

That's clear.

> > > Locking was done mostly by lock stealing. This is where most of the
> > > performance benefit comes from, not optimistic spinning.
> > How does the lock latency distribution of all this look like and how fair
> > is the whole thing?
> 
> The TO futexes are unfair as can be seen from the min/max thread times listed
> above. It took the fastest thread 0.07s to complete all the locking
> operations, whereas the slowest one needed 2.65s. However, the situation
> reverses when I changed the critical section to a 1us sleep. In this case,

1us sleep is going to add another syscall and therefore scheduling, so what?

Or did you just extend the critical section busy time?

> there will be no optimistic spinning. The performance results for 100k locking
> operations are listed below.
> 
>                 wait-wake futex     PI futex        TO futex
>                 ---------------     --------        --------
> max time            0.06s             9.32s          4.76s

      		      ^^^^ ????			

> min time            5.59s             9.36s          5.62s
> average time        3.25s             9.35s          5.41s
> 
> In this case, the TO futexes are fairer but perform worse than the wait-wake
> futexes. That is because the lock handoff mechanism limits the amount of lock
> stealing in the TO futexes, while the wait-wake futexes have no such
> restriction. When I disabled lock handoff, the TO futexes performed
> similarly to the wait-wake futexes.
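(The handoff restriction described above works roughly like this -- a
hypothetical sketch, not the patch's actual futex word layout; the flag
names are assumptions for illustration.)

```c
/*
 * Hypothetical sketch of why lock handoff limits stealing.  Once the
 * owner marks the lock as handed off to the waiter at the head of the
 * queue, would-be stealers must back off instead of grabbing it.
 */
#include <stdatomic.h>
#include <stdbool.h>

#define TO_LOCKED	0x1U
#define TO_HANDOFF	0x2U	/* assumed flag, for illustration */

static bool try_steal(atomic_uint *f)
{
	unsigned int v = atomic_load(f);

	/*
	 * A stealer may grab a free lock, unless it has been handed
	 * off to a specific queued waiter.
	 */
	if (v & (TO_LOCKED | TO_HANDOFF))
		return false;
	return atomic_compare_exchange_strong(f, &v, v | TO_LOCKED);
}
```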

So the benefit of these new fangled futexes is only there for extreme short
critical sections and a gazillion of threads fighting for the same futex,
right?

I really wonder how the average programmer should pick the right flavour,
not to mention how something like glibc could make any useful decision
about which one to use.

Thanks,

	tglx
