Date:   Fri, 8 Jan 2021 10:44:04 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Peter Zijlstra' <peterz@...radead.org>
CC:     'Linus Torvalds' <torvalds@...ux-foundation.org>,
        Al Viro <viro@...iv.linux.org.uk>,
        kernel test robot <oliver.sang@...el.com>,
        "Thomas Gleixner" <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>,
        "Borislav Petkov" <bp@...en8.de>,
        LKML <linux-kernel@...r.kernel.org>,
        "lkp@...ts.01.org" <lkp@...ts.01.org>,
        kernel test robot <lkp@...el.com>,
        "Huang, Ying" <ying.huang@...el.com>,
        Feng Tang <feng.tang@...el.com>,
        "zhengjun.xing@...el.com" <zhengjun.xing@...el.com>
Subject: RE: [x86] d55564cfc2: will-it-scale.per_thread_ops -5.8% regression

From: Peter Zijlstra
> Sent: 08 January 2021 09:52
> 
> On Fri, Jan 08, 2021 at 09:37:45AM +0000, David Laight wrote:
> > The lack of spinlocks in userspace really kills you.
> 
> Glibc has them, but please don't complain about lock holder preemption
> issues if you do actually use them ;-)

Nothing glibc can do helps here.
It would need to disable interrupts while the lock is held, which isn't
allowed in userspace.

The problem isn't that the process holding the lock gets preempted,
but that the lock hold time goes from a few instructions to ~1ms.
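
A rough way to see this from userspace is to time a trivial critical
section under a glibc spinlock and keep the worst case. The sketch below
is only an illustration: the helper names now_ns() and worst_hold_ns()
are made up for this example, and the two clock reads sit inside the
lock, so the floor is higher than "a few instructions". Any interrupt or
preemption that lands between the two reads shows up as an outlier.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: monotonic time in nanoseconds. */
static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper: take and release a pthread spinlock
 * 'iterations' times around a trivial critical section.
 * Returns the longest observed hold time in nanoseconds,
 * or -1 if the lock cannot be initialised.
 */
static int64_t worst_hold_ns(int iterations)
{
    pthread_spinlock_t lock;
    volatile unsigned long counter = 0;
    int64_t worst = 0;

    if (pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        pthread_spin_lock(&lock);
        int64_t start = now_ns();
        counter++;                      /* the "few instructions" */
        int64_t held = now_ns() - start;
        pthread_spin_unlock(&lock);

        if (held > worst)
            worst = held;
    }

    pthread_spin_destroy(&lock);
    return worst;
}
```

Calling worst_hold_ns() with a large iteration count on a busy machine
and comparing the result with the typical hold time gives a feel for how
far the tail stretches; the numbers depend entirely on the system.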

It is also entirely noticeable (and a problem) that the futex call
that implements cv_broadcast() gets each process to wake up the next one.
There are two issues:
1) It takes time for each CPU to come out of its sleep state.
   These wakeups happen in sequence rather than all together.
2) If the processor affinities mean that one of the threads can't
   be run immediately, then none of the later threads runs either.
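
The effect of the two points above can be observed, roughly, by timing
how long each waiter takes to return from a broadcast. This is a sketch
under assumptions: it uses pthread_cond_broadcast() as the broadcast
primitive, and waiter(), worst_wake_latency_ns() and MAX_WAITERS are
names invented for the example. The timestamp is taken after
pthread_cond_wait() has re-acquired the mutex, so it cannot separate the
wakeup delay from the (short) mutex serialisation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <time.h>

#define MAX_WAITERS 64                  /* arbitrary limit for the sketch */

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int go;                          /* predicate, guarded by mtx */
static int waiting;                     /* parked waiters, guarded by mtx */
static int64_t woke_ns[MAX_WAITERS];    /* per-waiter wake timestamps */

static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

static void *waiter(void *arg)
{
    int idx = (int)(intptr_t)arg;

    pthread_mutex_lock(&mtx);
    waiting++;
    while (!go)
        pthread_cond_wait(&cv, &mtx);
    woke_ns[idx] = now_ns();            /* taken with the mutex held */
    pthread_mutex_unlock(&mtx);
    return NULL;
}

/*
 * Park 'nthreads' waiters, broadcast once, and return the largest
 * broadcast-to-wake latency in nanoseconds (-1 on error).
 */
static int64_t worst_wake_latency_ns(int nthreads)
{
    pthread_t tid[MAX_WAITERS];
    int64_t sent, worst = 0;
    int started = 0, parked;

    if (nthreads < 1 || nthreads > MAX_WAITERS)
        return -1;

    go = 0;
    waiting = 0;
    while (started < nthreads &&
           pthread_create(&tid[started], NULL, waiter,
                          (void *)(intptr_t)started) == 0)
        started++;

    /* Wait until every started thread is blocked in pthread_cond_wait(). */
    do {
        sched_yield();
        pthread_mutex_lock(&mtx);
        parked = waiting;
        pthread_mutex_unlock(&mtx);
    } while (parked != started);

    pthread_mutex_lock(&mtx);
    go = 1;
    sent = now_ns();
    pthread_cond_broadcast(&cv);
    pthread_mutex_unlock(&mtx);

    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        if (woke_ns[i] - sent > worst)
            worst = woke_ns[i] - sent;
    }
    return started == nthreads ? worst : -1;
}
```

Comparing the individual woke_ns[] entries, rather than only the worst
one, would show whether the waiters come back one after another or
together; pinning threads to particular CPUs is left out of the sketch.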

I realise this is (probably) done to avoid the 'thundering herd'
on the related mutex - but this code gets nowhere near acquiring
the mutex before the delays, and the mutex is released pretty
soon after 'return to user'.

The delays are far longer than a normal system call or even a
process switch.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
