linux-kernel - Re: [RFC PATCH] percpu system call: fast userspace percpu critical sections

Message-ID: <CALCETrX4y2_cPdPHsr=iqcqS7k3GvBz7yBGb_d4A0wDUzbTWCg@mail.gmail.com>
Date:	Tue, 26 May 2015 14:44:06 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:	Andi Kleen <andi@...stfloor.org>, Borislav Petkov <bp@...en8.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Ben Maurer <bmaurer@...com>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...hat.com>,
	Josh Triplett <josh@...htriplett.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Michael Kerrisk <mtk.manpages@...il.com>,
	Linux API <linux-api@...r.kernel.org>,
	Linux Kernel <linux-kernel@...r.kernel.org>,
	Paul Turner <pjt@...gle.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Andrew Hunter <ahh@...gle.com>
Subject: Re: [RFC PATCH] percpu system call: fast userspace percpu critical sections

On Tue, May 26, 2015 at 2:18 PM, Andy Lutomirski <luto@...capital.net> wrote:
> On Tue, May 26, 2015 at 2:04 PM, Mathieu Desnoyers
>>
>>>
>>> It's too bad that not all architectures have a single-instruction
>>> unlocked compare-and-exchange.
>>
>> Based on my benchmarks, it's not clear that single-instruction
>> unlocked CAS is actually faster than doing the same with many
>> instructions.
>
> True, but with a single instruction the user can't get preempted in the middle.
>
> Looking at your code, it looks like percpu_user_sched_in has some
> potentially nasty issues with page faults.  Avoiding touching user
> memory from the scheduler would be quite nice from an implementation
> POV, and the x86-specific gs hack wins in that regard.

ARM has "TLB lockdown entries" which could, I think, be used to
implement per-cpu or per-thread mappings.  I'm actually rather
surprised that Linux doesn't already use a TLB lockdown entry for TLS.
(Hmm.  Maybe it's because the interface to write the entries requires
actually touching the page.  Maybe not -- the ARM docs, in general,
seem to be much less clear than the Intel and AMD docs.)

ARM doesn't seem to have any single-instruction compare-exchange or
similar instruction, though, so this might not be all that useful.  On the
other hand, ARM can probably do reasonably efficient per-cpu memory
allocation and such with a single ldrex/strex pair.
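
To make the ldrex/strex idea concrete, here is a rough sketch (not taken
from the patch under discussion) of a per-cpu bump allocator written with
portable C11 atomics.  On ARMv7, compilers typically lower the weak
compare-exchange loop to a single ldrex/strex pair.  The names
percpu_pool, percpu_alloc, NR_CPUS_ASSUMED and POOL_BYTES are hypothetical,
and the caller is assumed to pass in the current CPU number.  If the thread
migrates after reading the CPU number, the update is still atomic; it just
lands in another CPU's pool.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_ASSUMED 4      /* hypothetical fixed CPU count */
#define POOL_BYTES      4096   /* hypothetical per-cpu pool size */

struct percpu_pool {
	_Atomic size_t next;              /* offset of the first free byte */
	unsigned char mem[POOL_BYTES];
};

static struct percpu_pool pools[NR_CPUS_ASSUMED];

/*
 * Bump-allocate len bytes from the pool belonging to cpu.
 * Returns NULL when the pool cannot satisfy the request.
 */
static void *percpu_alloc(unsigned int cpu, size_t len)
{
	struct percpu_pool *p = &pools[cpu];
	size_t old = atomic_load_explicit(&p->next, memory_order_relaxed);
	size_t new;

	do {
		/* old never exceeds POOL_BYTES, so this cannot underflow. */
		if (len > POOL_BYTES - old)
			return NULL;
		new = old + len;
		/*
		 * A failed or spuriously failing exchange reloads old
		 * and retries, just as a failed strex would.
		 */
	} while (!atomic_compare_exchange_weak_explicit(&p->next, &old, new,
							memory_order_relaxed,
							memory_order_relaxed));

	return &p->mem[old];
}
```

A caller would do something like percpu_alloc(cpu, 16), with cpu obtained
however userspace learns its current CPU.  Because the sketch stays correct
without disabling preemption, the cost of a migration is only a remote
cache-line access, not a lost update.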

--Andy