Message-ID: <9f8e040a-0684-41c9-0c2e-65d2a1e14c22@redhat.com>
Date:   Thu, 23 Jan 2020 14:08:26 -0500
From:   Waiman Long <longman@...hat.com>
To:     Will Deacon <will@...nel.org>
Cc:     Lihao Liang <lihaoliang@...gle.com>,
        Alex Kogan <alex.kogan@...cle.com>, linux@...linux.org.uk,
        peterz@...radead.org, mingo@...hat.com, will.deacon@....com,
        arnd@...db.de, linux-arch@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        tglx@...utronix.de, bp@...en8.de, hpa@...or.com, x86@...nel.org,
        guohanjun@...wei.com, jglauber@...vell.com, dave.dice@...cle.com,
        steven.sistare@...cle.com, daniel.m.jordan@...cle.com
Subject: Re: [PATCH v9 0/5] Add NUMA-awareness to qspinlock

On 1/23/20 10:25 AM, Waiman Long wrote:
> On 1/23/20 6:35 AM, Will Deacon wrote:
>> Hi folks,
>>
>> (I think Lihao is travelling at the moment, so he may be delayed in his
>> replies)
>>
>> On Wed, Jan 22, 2020 at 12:24:58PM -0500, Waiman Long wrote:
>>> On 1/22/20 6:45 AM, Lihao Liang wrote:
>>>> On Wed, Jan 22, 2020 at 10:28 AM Alex Kogan <alex.kogan@...cle.com> wrote:
>>>>> Summary
>>>>> -------
>>>>>
>>>>> Lock throughput can be increased by handing a lock to a waiter on the
>>>>> same NUMA node as the lock holder, provided care is taken to avoid
>>>>> starvation of waiters on other NUMA nodes. This patch introduces CNA
>>>>> (compact NUMA-aware lock) as the slow path for qspinlock. It is
>>>>> enabled through a configuration option (NUMA_AWARE_SPINLOCKS).
>>>>>
>>>> Thanks for your patches. The experimental results look promising!
>>>>
>>>> I understand that the new CNA qspinlock uses randomization to achieve
>>>> long-term fairness, and provides the numa_spinlock_threshold parameter
>>>> for users to tune. As Linux runs extremely diverse workloads, it is not
>>>> clear how the randomization affects lock fairness in practice, or how users
>>>> with different requirements are supposed to tune this parameter.
>>>>
>>>> To this end, Will and I consider it beneficial to be able to answer the
>>>> following question:
>>>>
>>>> With different values of numa_spinlock_threshold and
>>>> SHUFFLE_REDUCTION_PROB_ARG, how long do threads running on different
>>>> sockets have to wait to acquire the lock? This is particularly relevant
>>>> in high contention situations when new threads keep arriving on the same
>>>> socket as the lock holder.
>>>>
>>>> In this email, I try to provide some formal analysis to address this
>>>> question. Let's assume the probability for the lock to stay on the
>>>> same socket is *at least* p, which corresponds to the probability for
>>>> the function probably(unsigned int num_bits) in the patch to return *false*,
>>>> where SHUFFLE_REDUCTION_PROB_ARG is passed as the value of num_bits to the
>>>> function.
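
For illustration, here is a minimal userspace sketch of a probably()-style
helper consistent with the description above: it returns false with
probability roughly 1/2^num_bits (for example, about 1 in 128 calls when
num_bits == 7). This is not the patch's code; the in-kernel helper keeps a
per-CPU seed and uses the kernel's own pseudo-random routines, so the static
seed and the xorshift step below are only stand-ins.

#include <stdbool.h>
#include <stdint.h>

static uint32_t seed = 0x9f8e040a;	/* stand-in for the per-CPU seed */

static bool probably(unsigned int num_bits)
{
	uint32_t s = seed;

	/* one xorshift32 step as a cheap pseudo-random update */
	s ^= s << 13;
	s ^= s >> 17;
	s ^= s << 5;
	seed = s;

	/* false only when the low num_bits bits are all zero: P(false) = 1/2^num_bits */
	return s & ((1u << num_bits) - 1);
}
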
>>> That is not strictly true from my understanding of the code. The
>>> probably() function does not come into play if a secondary queue is
>>> present. Also calling cna_scan_main_queue() doesn't guarantee that a
>>> waiter on the same node can be found. So the simple mathematical
>>> analysis isn't really applicable in this case. One would have to run an
>>> actual simulation to find out what the behavior will be.
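
As a crude starting point for such a simulation, here is a toy userspace
sketch of the simplified model from the analysis above, not of the real CNA
code: each hand-off independently keeps the lock on the current socket with
some probability p, and we count how many hand-offs a waiter on another
socket sits through before the lock leaves the socket. The value of p is
purely illustrative, and a faithful simulation would also have to model the
secondary queue and the main-queue scan.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	const double p = 0.99;	/* illustrative per-hand-off "stay on this socket" probability */
	const long trials = 1000000;
	long total = 0;

	srand(1);
	for (long t = 0; t < trials; t++) {
		long handoffs = 0;

		/* keep handing off within the socket while the biased coin says so */
		while ((double)rand() / RAND_MAX < p)
			handoffs++;
		total += handoffs;
	}

	/* for this geometric model the expectation is p/(1-p), i.e. 99 for p = 0.99 */
	printf("average intra-socket hand-offs before a remote hand-off: %.1f\n",
	       (double)total / trials);
	return 0;
}
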
>> It's certainly true that the analysis is based on the worst-case scenario,
>> but I think it's still worth considering. For example, the secondary queue
>> does not exist initially so it seems a bit odd that we only instantiate it
>> with < 1% probability.
>>
>> That said, my real concern with any of this is that it makes formal
>> modelling and analysis of the qspinlock considerably more challenging. I
>> would /really/ like to see an update to the TLA+ model we have of the
>> current implementation [1] and preferably also the userspace version I
>> hacked together [2] so that we can continue to test and validate changes
>> to the code outside of the usual kernel stress-testing.
> I do agree that the current CNA code is hard to model. The CNA lock
> behaves like a regular qspinlock in many cases. If the lock becomes
> fairly contended with waiters from different nodes, it will
> opportunistically switch to CNA mode where preference is given to
> waiters in the same node.

BTW, I added the attached draft lock_event patch on top of the v9 CNA
patch series to observe the behavior of the CNA lock. Using a 2-socket
96-thread x86-64 server, the lock event output after boot-up was:

cna_intra_max=1942
cna_mainscan_hit=134
cna_merge_queue=73
cna_prescan_hit=16662
cna_prescan_miss=268
cna_splice_new=352
cna_splice_old=2415
lock_pending=130090
lock_slowpath=191868
lock_use_node2=135

After resetting the counts and running a 96-thread lock stress test for
10s, I got:

cna_intra_max=65536
cna_mainscan_hit=46
cna_merge_queue=661
cna_prescan_hit=42486841
cna_prescan_miss=68
cna_splice_new=676
cna_splice_old=402
lock_pending=11012
lock_slowpath=44332335
lock_use_node2=57203

So the cna_intra_max does go to the maximum of 64k.

Cheers,
Longman


View attachment "0006-locking-qspinlock-Enable-lock-events-tracking-for-CN.patch" of type "text/x-patch" (7754 bytes)
