Message-ID: <CAOpXEOwSg_PC+D-dYsHspAnc1jw9b+gK2HSU3EE8AWmQSQ_qbA@mail.gmail.com>
Date: Mon, 15 Jul 2024 18:01:28 +0800
From: Shi-Wu, Lo(Gmail) <shiwulo@...il.com>
To: Waiman Long <longman@...hat.com>
Cc: Bagas Sanjaya <bagasdotme@...il.com>, 
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...hat.com>, 
	Will Deacon <will@...nel.org>, Boqun Feng <boqun.feng@...il.com>, 
	Peter Zijlstra <peterz@...radead.org>, Mel Gorman <mgorman@...e.de>, 
	Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, 
	"Theodore Ts'o" <tytso@....edu>, 
	"ccu_cs_oslab@...glegroups.com" <ccu_cs_oslab@...glegroups.com>
Subject: Re: A new spinlock for multicore (>16) platform

Here are my explanations for the two issues you mentioned:

1) Each lock requires a routing table of size num_possible_cpus().
There can be thousands of spinlocks in the kernel, and it may be
problematic to manage this extra memory.

We have implemented this method in the Linux kernel
(https://github.com/shiwulo/ron-osdi2023/blob/main/qspinlock.patch).
All locks share a single routing table and a single waiting array. The
patched kernel has been running for over a year without any issues. So
far we have run it on the AMD 2990WX (32 cores) and the AMD 3995WX
(64 cores).

This works because context switches cannot occur while the Linux
kernel holds or waits for a spinlock, so all cores can share a single
waiting array. The space complexity of RON in the Linux kernel is
therefore O(#cores), the same as that of qspinlock. And since the
kernel runs on every core of a processor, all spinlocks can share one
routing table.
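
For a concrete picture, here is a minimal sketch of the shared
structures; it is not the actual patch (see the qspinlock.patch link
above), and all names in it are illustrative:

/*
 * Sketch only. Because preemption is disabled while a CPU holds or
 * waits for a spinlock, each CPU waits for at most one lock at a
 * time, so ONE waiting slot per CPU serves every spinlock in the
 * kernel: O(#cores) space in total, like qspinlock's per-CPU MCS
 * nodes.
 */
#define MAX_CORES 64                    /* assumed upper bound */

struct ron_slot {
    void *waiting_for;                  /* lock this core spins on, or NULL */
    int granted;                        /* set by the previous lock holder  */
} __attribute__((aligned(64)));         /* one cache line per core          */

/* One waiting array shared by ALL locks in the system. */
static struct ron_slot wait_array[MAX_CORES];

/*
 * One routing table shared by ALL locks: the order in which cores are
 * visited at unlock time, e.g. arranged so that consecutive entries
 * are physically close on the chip.
 */
static unsigned char route[MAX_CORES];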

2) Besides lock contention performance, the Linux kernel also has to
provide good uncontended spinlock performance. The unlock code
currently scans the routing table to find the next CPU to hand the
lock to, which may slow down uncontended spinlock performance.

We use a bit array to speed up the unlock path. The nth bit indicates
that the nth core wants to enter the critical section, so the next
core to enter can be determined quickly with __ffs or a ctz-style
bit-scan instruction (page 29 of
https://www.cs.ccu.edu.tw/~shiwulo/nxt+sh-RON.pdf). The new method
(nxtRON) therefore has O(1) unlock time complexity. Under low
contention, nxtRON performs almost identically to GNU's
pthread_spin_lock (TTAS); likewise, under low contention qspinlock's
performance is nearly the same as GNU's pthread_spin_lock.
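
As a rough illustration of that unlock path (again with made-up
names, and using GCC's __builtin_ctzll in place of the kernel's
__ffs):

#include <stdint.h>

/* Bit n set: core n wants to enter the critical section. */
static volatile uint64_t want_bits;

/*
 * Pick the next core in O(1). A real implementation would start the
 * scan from the releasing core's position in the routing table to
 * preserve the visiting order (and bounded waiting); this sketch
 * simply takes the lowest set bit.
 */
static int pick_next_core(void)
{
    uint64_t snap = want_bits;

    if (!snap)
        return -1;                  /* uncontended: nobody is waiting */
    return __builtin_ctzll(snap);   /* index of the first set bit     */
}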

As shown in the same PDF (slides 56 to 59,
https://www.cs.ccu.edu.tw/~shiwulo/nxt+sh-RON.pdf), nxtRON can yield a
3.8% improvement for the LevelDB application.



Furthermore, the issues with multi-core (#cores > 16) processors and
NUMA systems are different. NUMA-aware spinlocks assume that data
transfer between processors is slow while data transfer within the
same processor is fast. Based on this assumption, they let tasks on
the same processor enter the critical section before tasks on other
processors.

This approach (NUMA-aware spinlocks) has two main drawbacks: fairness,
and the fact that NUMA-aware spinlocks do not optimize data transfer
within a single processor. The RON algorithm, by contrast, preserves
bounded waiting and optimizes specifically for multi-core processors.
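
To make the contrast concrete, here is a rough sketch of the handoff
policy a NUMA-aware lock uses (names and masks are hypothetical, in
the spirit of CNA-style locks); it also shows where the fairness
problem comes from:

#include <stdint.h>

/* Hypothetical: which CPUs belong to which NUMA node (two nodes). */
static const uint64_t node_mask[2] = {
    0x00000000ffffffffULL,          /* node 0: cores  0..31 */
    0xffffffff00000000ULL,          /* node 1: cores 32..63 */
};

static int numa_pick_next(uint64_t waiters, int my_node)
{
    uint64_t local = waiters & node_mask[my_node];

    /* Prefer a waiter on the same node: intra-socket transfer is fast. */
    if (local)
        return __builtin_ctzll(local);
    /*
     * Fall back to a remote waiter only when no local one exists.
     * This repeated preference is what breaks fairness: a remote core
     * can wait unboundedly while local cores keep passing the lock
     * among themselves.
     */
    return waiters ? __builtin_ctzll(waiters) : -1;
}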
