linux-kernel - Re: [PATCH v2 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock

Message-Id: <54241445-458C-4AE2-840B-6DFCCD410399@oracle.com>
Date:   Wed, 12 Jun 2019 00:38:29 -0400
From:   Alex Kogan <alex.kogan@...cle.com>
To:     "liwei (GF)" <liwei391@...wei.com>
Cc:     linux@...linux.org.uk, Peter Zijlstra <peterz@...radead.org>,
        mingo@...hat.com, will.deacon@....com, arnd@...db.de,
        Waiman Long <longman@...hat.com>, linux-arch@...r.kernel.org,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Thomas Gleixner <tglx@...utronix.de>, bp@...en8.de,
        hpa@...or.com, x86@...nel.org, dave.dice@...cle.com,
        Rahul Yadav <rahul.x.yadav@...cle.com>,
        Steven Sistare <steven.sistare@...cle.com>,
        Daniel Jordan <daniel.m.jordan@...cle.com>
Subject: Re: [PATCH v2 3/5] locking/qspinlock: Introduce CNA into the slow
 path of qspinlock

Hi, Wei.

> On Jun 11, 2019, at 12:22 AM, liwei (GF) <liwei391@...wei.com> wrote:
> 
> Hi Alex,
> 
> On 2019/3/29 23:20, Alex Kogan wrote:
>> In CNA, spinning threads are organized in two queues, a main queue for
>> threads running on the same node as the current lock holder, and a
>> secondary queue for threads running on other nodes. At the unlock time,
>> the lock holder scans the main queue looking for a thread running on
>> the same node. If found (call it thread T), all threads in the main queue
>> between the current lock holder and T are moved to the end of the
>> secondary queue, and the lock is passed to T. If such T is not found, the
>> lock is passed to the first node in the secondary queue. Finally, if the
>> secondary queue is empty, the lock is passed to the next thread in the
>> main queue. For more details, see https://arxiv.org/abs/1810.05600.
>> 
>> Note that this variant of CNA may introduce starvation by continuously
>> passing the lock to threads running on the same node. This issue
>> will be addressed later in the series.
>> 
>> Enabling CNA is controlled via a new configuration option
>> (NUMA_AWARE_SPINLOCKS), which is enabled by default if NUMA is enabled.
>> 
>> Signed-off-by: Alex Kogan <alex.kogan@...cle.com>
>> Reviewed-by: Steve Sistare <steven.sistare@...cle.com>
>> ---
>> arch/x86/Kconfig                      |  14 +++
>> include/asm-generic/qspinlock_types.h |  13 +++
>> kernel/locking/mcs_spinlock.h         |  10 ++
>> kernel/locking/qspinlock.c            |  29 +++++-
>> kernel/locking/qspinlock_cna.h        | 173 ++++++++++++++++++++++++++++++++++
>> 5 files changed, 236 insertions(+), 3 deletions(-)
>> create mode 100644 kernel/locking/qspinlock_cna.h
>> 
> (SNIP)
>> +
>> +static __always_inline int get_node_index(struct mcs_spinlock *node)
>> +{
>> +	return decode_count(node->node_and_count++);
> When the nesting level is > 4, this won't return an index >= 4, and the NUMA node
> number is changed by mistake. The code then takes the wrong path instead of the following branch.
> 
> 
> 	/*
> 	 * 4 nodes are allocated based on the assumption that there will
> 	 * not be nested NMIs taking spinlocks. That may not be true in
> 	 * some architectures even though the chance of needing more than
> 	 * 4 nodes will still be extremely unlikely. When that happens,
> 	 * we fall back to spinning on the lock directly without using
> 	 * any MCS node. This is not the most elegant solution, but is
> 	 * simple enough.
> 	 */
> 	if (unlikely(idx >= MAX_NODES)) {
> 		while (!queued_spin_trylock(lock))
> 			cpu_relax();
> 		goto release;
> 	}
Good point.
This patch does not handle count overflows gracefully.
It can be easily fixed by allocating more bits for the count; we don’t really need 30 bits for the NUMA node number.
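
To make the overflow concrete, here is a minimal user-space sketch. It assumes the count lives in the low 2 bits of node_and_count and the NUMA node number in the remaining 30 bits, which matches the "30 bits" mentioned above but is not copied from the patch. The names COUNT_BITS, COUNT_MASK, encode() and decode_numa_node() are hypothetical.

```c
/* Assumed layout: low 2 bits = nesting count, upper 30 bits = NUMA node. */
#define COUNT_BITS 2u
#define COUNT_MASK ((1u << COUNT_BITS) - 1u)

/* Hypothetical helper: pack a NUMA node number and a count into one word. */
static unsigned int encode(unsigned int numa_node, unsigned int count)
{
	return (numa_node << COUNT_BITS) | (count & COUNT_MASK);
}

static unsigned int decode_count(unsigned int v)
{
	return v & COUNT_MASK;
}

/* Hypothetical helper: extract the NUMA node number from the packed word. */
static unsigned int decode_numa_node(unsigned int v)
{
	return v >> COUNT_BITS;
}

/* Same shape as the quoted get_node_index(): decode, then post-increment. */
static unsigned int get_node_index(unsigned int *node_and_count)
{
	return decode_count((*node_and_count)++);
}
```

Starting from encode(5, 0), the first four calls to get_node_index() return 0 through 3. The fourth increment carries out of the count field, so the stored NUMA node becomes 6 and the fifth call returns 0 again rather than 4. The idx >= MAX_NODES check therefore never fires.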

However, I am working on a new revision of the patch, in which the CNA node encapsulates the MCS node (following Peter’s suggestion, and similar to pv_node).
With that approach, this issue is gone.
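
A rough sketch of why encapsulation avoids the problem follows. It is an illustration only and not the code of the next revision: the struct mcs_spinlock here is a simplified user-space stand-in, and struct cna_node, its numa_node field and needs_fallback() are hypothetical names. The point is that once the count has its own field, incrementing it cannot spill into the NUMA node number.

```c
#define MAX_NODES 4

/* Simplified stand-in for the kernel's struct mcs_spinlock. */
struct mcs_spinlock {
	struct mcs_spinlock *next;
	int locked;
	int count;	/* nesting count, kept in its own field */
};

/* Hypothetical layout: the MCS node is embedded as the first member. */
struct cna_node {
	struct mcs_spinlock mcs;
	int numa_node;	/* separate from the count, so no shared bits */
};

/* Return the current nesting index and bump the count. */
static int get_node_index(struct cna_node *node)
{
	return node->mcs.count++;
}

/* Mirrors the idx >= MAX_NODES check in the quoted slow-path comment. */
static int needs_fallback(int idx)
{
	return idx >= MAX_NODES;
}
```

With this layout, a fifth nested acquisition yields index 4, the fallback check triggers as intended, and numa_node is left untouched.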

Best regards,
— Alex