Date:	Tue, 5 Jan 2016 21:42:27 +0000
From:	One Thousand Gnomes <gnomes@...rguk.ukuu.org.uk>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	ling.ma.program@...il.com, waiman.long@....com, mingo@...hat.com,
	linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
	ling.ml@...baba-inc.com
Subject: Re: [RFC PATCH] alispinlock: acceleration from lock integration on
 multi-core platform

> It suffers the typical problems all those constructs do; namely it
> wrecks accountability.

That's "government thinking" ;-) - for most real users throughput is
more important than accountability. With the right API it ought to also
be compile time switchable.
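
Roughly, and purely as a sketch - CONFIG_ALI_SPINLOCK, ali_spin_run()
and the typedef below are names I'm inventing, not necessarily the
RFC's API - callers would use one helper and the configuration decides
what backs it:

/*
 * Sketch only: with the option off, the "lock" is a plain spinlock and
 * the work runs on the calling CPU, so accountability is unchanged for
 * those builds.  Needs <linux/spinlock.h>.
 */
#ifdef CONFIG_ALI_SPINLOCK
typedef struct ali_spinlock lock_and_run_t;
#define lock_and_run(l, fn, arg)	ali_spin_run((l), (fn), (arg))
#else
typedef spinlock_t lock_and_run_t;
#define lock_and_run(l, fn, arg)	\
do {					\
	spin_lock(l);			\
	(fn)(arg);			\
	spin_unlock(l);			\
} while (0)
#endif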

> But here that is compounded by the fact that you inject other people's
> work into 'your' lock region, thereby bloating lock hold times. Worse,
> afaict (from a quick reading) there really isn't a bound on the amount
> of work you inject.

That should be relatively easy to fix, but with this kind of lock you
normally get the big wins from stuff that is only a short stretch of
executing code. The fairness you trade away, in the cases where it is
useful, should be tiny except under extreme load, where the
"accountability first" behaviour would be to fall over in a heap.

If your "lock" involves a lot of work then it probably should be a work
queue or not using this kind of locking.
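
A bound could be as crude as letting the current holder drain only a
handful of queued requests per acquisition. A rough sketch, with
invented names (not the RFC's code), using llist from <linux/llist.h>
as the queue:

#define MAX_COMBINE	8

struct merge_req {
	struct llist_node node;
	void (*fn)(void *arg);
	void *arg;
};

/*
 * Sketch only: the holder runs at most MAX_COMBINE queued requests per
 * hold, so the hold time stays bounded.  Whatever is left waits for the
 * next holder, and submitters have to keep trying for the lock until
 * their own request has been run.
 */
static void drain_bounded(struct llist_head *pending)
{
	int budget = MAX_COMBINE;
	struct llist_node *n;

	/* only the lock holder consumes, so llist_del_first() is safe */
	while (budget-- && (n = llist_del_first(pending))) {
		struct merge_req *req = llist_entry(n, struct merge_req, node);

		req->fn(req->arg);
	}
}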

> And while its a cute collapse of an MCS lock and lockless list style
> work queue (MCS after all is a lockless list), saving a few cycles from
> the naive spinlock+llist implementation of the same thing, I really
> do not see enough justification for any of this.

I've only personally dealt with such locks in the embedded space but
there it was a lot more than a few cycles because you go from


	take lock
						spins
	pull things into cache
	do stuff
	cache lines go write/exclusive
	unlock

						take lock
						move all the cache
						do stuff
						etc

to

	take lock
						queue work
	pull things into cache
	do work 1
	cache lines go write/exclusive
	do work 2
	
	unlock
						done

and for the kind of stuff you apply those locks to, you got big
improvements. Even on crappy little embedded processors cache bouncing
hurts. Better still, work-merging locks like this tend to improve
throughput more as contention rises, unlike most other lock types.
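
For reference, the naive spinlock+llist shape of the thing - sketched
from memory with invented names, not the RFC patch - is roughly the
second picture above in code form: a contending CPU writes one request
node and reads one flag, while the holder does everyone's work with the
protected data hot in its own cache.

struct merge_lock {
	spinlock_t		lock;
	struct llist_head	pending;
};

struct merge_req {
	struct llist_node	node;
	void			(*fn)(void *arg);
	void			*arg;
	int			done;
};

/*
 * Sketch only.  Queue the request, then either become the holder and
 * run the whole batch (ours included), or spin until some holder has
 * run our request and set req.done.
 */
static void merge_lock_run(struct merge_lock *ml, void (*fn)(void *), void *arg)
{
	struct merge_req req = { .fn = fn, .arg = arg };
	struct merge_req *r, *tmp;
	struct llist_node *batch;

	llist_add(&req.node, &ml->pending);

	while (!smp_load_acquire(&req.done)) {
		if (!spin_trylock(&ml->lock)) {
			cpu_relax();
			continue;
		}

		batch = llist_del_all(&ml->pending);
		/* run everyone's work while the data is hot here instead
		 * of letting the cache lines bounce between CPUs */
		llist_for_each_entry_safe(r, tmp, batch, node) {
			r->fn(r->arg);
			smp_store_release(&r->done, 1);
		}

		spin_unlock(&ml->lock);
	}
}

Bolting a MAX_COMBINE-style bound onto that batch loop is then only a
few lines, which is the "relatively easy to fix" part above.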

The original post claims a 3x performance improvement, but doesn't say
performance doing what, which kernel locks were switched, or what
patches were used. I don't find the numbers hard to believe for a big,
big box, but I'd like to see the actual use case patches so they can be
benched with other workloads, and for latency and the like.

Alan
