lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Fri, 18 Jan 2019 16:30:25 -0500
From:   Waiman Long <longman@...hat.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     James Morse <james.morse@....com>,
        Zhenzhong Duan <zhenzhong.duan@...cle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        SRINIVAS <srinivas.eeda@...cle.com>,
        Borislav Petkov <bp@...en8.de>,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: Question about qspinlock nest

On 01/18/2019 03:06 PM, Peter Zijlstra wrote:
> On Fri, Jan 18, 2019 at 09:50:12AM -0500, Waiman Long wrote:
>> On 01/18/2019 05:02 AM, Peter Zijlstra wrote:
>>>> e.g. We can't take an SError during the SError handler.
>>>>
>>>> But we can take this SError/NMI on another CPU while the first one is still
>>>> running the handler.
>>>>
>>>> These multiple NMIlike notifications mean having multiple locks/fixmap-slots,
>>>> one per notification. This is where the qspinlock node limit comes in, as we
>>>> could have more than 4 contexts.
>>> Right; so Waiman was going to do a patch that reverts to test-and-set or
>>> something along those lines once we hit the queue limit, which seems
>>> like a good way out. Actually hitting that nesting level should be
>>> exceedingly rare.
>> Yes, I am working on a patch to support arbitrary levels of nesting. It
>> is easy for PV qspinlock as lock stealing is supported.
>>
>> For native qspinlock, we cannot do lock stealing without incurring a
>> certain amount of overhead in the regular slowpath code. It was up to
>> 10% in my own testing. So I am exploring an alternative that can do the
>> job without incurring any noticeable performance degradation in the
>> slowpath. I ran into a race condition which I am still trying to find
>> out where that comes from. Hopefully, I will have something to post next
>> week.
> Where does the overhead come from? Surely that's not just checking that
> bound?

It is not about checking bound, it is about how to acquire the lock
without using an MCS node. The overhead comes from using atomic
instruction to acquire the lock instead of non-atomic one in order to
allow lock stealing.

Cheers,
Longman

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ