[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240715095411.GY27299@noisy.programming.kicks-ass.net>
Date: Mon, 15 Jul 2024 11:54:11 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Bagas Sanjaya <bagasdotme@...il.com>
Cc: Shi-Wu, Lo(Gmail) <shiwulo@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>,
Mel Gorman <mgorman@...e.de>, Thomas Gleixner <tglx@...utronix.de>,
Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Theodore Ts'o <tytso@....edu>
Subject: Re: A new spinlock for multicore (>16) platform
On Mon, Jul 15, 2024 at 06:45:16AM +0700, Bagas Sanjaya wrote:
> On Mon, Jul 15, 2024 at 01:07:40AM +0800, Shi-Wu, Lo(Gmail) wrote:
> > Dear Linux Contributors,
> > A detailed introduction to this method can be found in the following paper:
> > https://www.usenix.org/conference/osdi23/presentation/lo
So if I understand this right, the algorithm basically boils down to a
circular path and unlock will iterate this path looking for the next CPU
waiting.
The worst case wait-time is one full circle of the rotor, which is
bounded, so that's good.
The immediate problem however is that O(n) iteration on unlock, given
Linux runs on many big machines with 1e3 order of CPUs, which would be
quite terrible on low to medium contention.
You mention in Future work, to alleviate this unlock issue by switching
to a linked-list, whereby unlock would then become O(1). The problem
then becomes lock() needs to an insertion-sort, which needs to know the
rotor position (eg. current lock owner).
I don't think that is a much easier proposition -- notably uncontended
fast path doesn't (want) to track the rotor position, and in the worst
case you're still iterating 1e3 order CPUs.
Hierachical rotors come to mind, but complexity -- is it warranted?
> > Our laboratory is currently developing a system that can apply the
> > same optimization strategy to all multi-core processors. Below is our
> > plan.
> >
> > The New Method and Its Compatibility with qspinlock:
> > 1. The default algorithm in the Linux kernel remains qspinlock.
> > 2. A new file is created in /proc/routing_path, where a shortest path
> > can be input, for example:
> > sudo echo 1,2,3,4,16,17,18,19,5,6,7,8,11,12,13,14 > /proc/routing_path
> > 3. After inputting the shortest path, the kernel switches to using the
> > RON algorithm.
This is quite horrendous. If you cannot compute the OWCR from the
provided topology (CPUID on x86) 'nobody' is going to be using this.
Additionally, what locks are so contended that this gives a significant
performance gain, and can't we better rework the locking rather than the
lock implementation to alleviate this problem.
Eg. the futex hash lock is an oft cited example, but something like:
https://lkml.kernel.org/r/20230721105744.434742902@infradead.org
can significantly reduce this.
That is, in general is helps more to reduce lock contention rather than
to optimize the contention behaviour.
So which locks do you see sufficient contention on to make this
worthwhile and how much do you gain by doing this?
Additionally, per the paper there are Linux patches (albeit for rather
old kernel versions), why aren't those included?
Powered by blists - more mailing lists