Message-ID: <CADroS=4KKb4SKWNToqsqYm5qPq10OPy00yOTBkzaZO65kib1-w@mail.gmail.com>
Date: Fri, 22 May 2015 15:06:47 -0700
From: Andrew Hunter <ahh@...gle.com>
To: Andy Lutomirski <luto@...capital.net>
Cc: Michael Kerrisk <mtk.manpages@...il.com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Paul Turner <pjt@...gle.com>, Ben Maurer <bmaurer@...com>,
Linux Kernel <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Steven Rostedt <rostedt@...dmis.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Josh Triplett <josh@...htriplett.org>,
Lai Jiangshan <laijs@...fujitsu.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linux API <linux-api@...r.kernel.org>
Subject: Re: [RFC PATCH] percpu system call: fast userspace percpu critical sections

On Fri, May 22, 2015 at 1:53 PM, Andy Lutomirski <luto@...capital.net> wrote:
> Create an array of user-managed locks, one per cpu. Call them lock[i]
> for 0 <= i < ncpus.
>
> To acquire, look up your CPU number. Then, atomically, check that
> lock[cpu] isn't held and, if so, mark it held and record both your tid
> and your lock acquisition count. If you learn that the lock *was*
> held after all, signal the holder (with kill or your favorite other
> mechanism), telling it which lock acquisition count is being aborted.
> Then atomically steal the lock, but only if the lock acquisition count
> hasn't changed.
>
We had to deploy the userspace percpu API (percpu sharded locks,
{double,}compare-and-swap, atomic-increment, etc.) universally to the
fleet without waiting for 100% kernel penetration, and we also wanted
to be able to disable the kernel acceleration in case of kernel bugs.
(Since this is mostly used in core infrastructure--malloc, various
statistics platforms, etc--in userspace, checking for availability
isn't feasible. The primitives have to work 100% of the time or it
would be too complex for our developers to bother using them.)

So we did basically this (without the lock stealing...): we have one
spin lock per cpu, manipulated with atomics, which we take very
briefly to implement (e.g.) compare-and-swap. The performance is
hugely worse; typical overheads are in the 10x range _without_ any
on-cpu contention. Uncontended atomics are much cheaper than they were
on pre-Nehalem chips, but they still can't hold a candle to
unsynchronized instructions.
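For concreteness, that fallback looks roughly like the sketch below. This is a minimal illustration under my own assumptions (a fixed MAX_CPUS, an atomic_flag spinlock, made-up names), not our actual code:

```c
/* Minimal sketch of the fallback: one spin lock per cpu, held only
 * for the few instructions of the percpu operation. */
#define _GNU_SOURCE
#include <sched.h>      /* sched_getcpu() */
#include <stdatomic.h>
#include <stdint.h>

#define MAX_CPUS 64     /* illustrative; real code sizes this at runtime */

struct shard {
    atomic_flag lock;   /* the per-cpu spin lock */
    uint64_t value;     /* the per-cpu datum */
};

static struct shard shards[MAX_CPUS];

/* Compare-and-swap on one shard, done under its spin lock. */
static int shard_cas(struct shard *s, uint64_t old_val, uint64_t new_val)
{
    while (atomic_flag_test_and_set_explicit(&s->lock,
                                             memory_order_acquire))
        ;               /* spin: the critical section is a few instructions */
    int ok = (s->value == old_val);
    if (ok)
        s->value = new_val;
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
    return ok;
}

/* Percpu CAS: pick the shard for the cpu we are (probably) on.  If we
 * migrate right after sched_getcpu(), the result is still correct,
 * because every access to a shard goes through that shard's lock; we
 * just pay cross-cpu contention that a kernel-assisted path avoids. */
int percpu_cas(uint64_t old_val, uint64_t new_val)
{
    return shard_cas(&shards[sched_getcpu() % MAX_CPUS], old_val, new_val);
}
```

Note the design point: migrating between sched_getcpu() and the locked section costs performance, never correctness, which is exactly why the slow path can be deployed everywhere while being 10x worse than unsynchronized instructions.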

As a fallback path for userspace, this is fine--if 5% of binaries on
busted kernels aren't quite as fast, we can work with that in exchange
for being able to write a percpu op without worrying about what to do
on -ENOSYS. But it's just not fast enough to compete as the intended
way to do things.
AHH
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/