linux-kernel - Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1003922584.10662.1426532015839.JavaMail.zimbra@efficios.com>
Date:	Mon, 16 Mar 2015 18:53:35 +0000 (UTC)
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	linux-kernel@...r.kernel.org,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Nicholas Miell <nmiell@...cast.net>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Ingo Molnar <mingo@...hat.com>,
	Alan Cox <gnomes@...rguk.ukuu.org.uk>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Stephen Hemminger <stephen@...workplumber.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Josh Triplett <josh@...htriplett.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	David Howells <dhowells@...hat.com>,
	Nick Piggin <npiggin@...nel.dk>
Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory
 barrier (x86) (v12)

----- Original Message -----
> From: "Peter Zijlstra" <peterz@...radead.org>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@...icios.com>
> Cc: linux-kernel@...r.kernel.org, "KOSAKI Motohiro" <kosaki.motohiro@...fujitsu.com>, "Steven Rostedt"
> <rostedt@...dmis.org>, "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>, "Nicholas Miell" <nmiell@...cast.net>,
> "Linus Torvalds" <torvalds@...ux-foundation.org>, "Ingo Molnar" <mingo@...hat.com>, "Alan Cox"
> <gnomes@...rguk.ukuu.org.uk>, "Lai Jiangshan" <laijs@...fujitsu.com>, "Stephen Hemminger"
> <stephen@...workplumber.org>, "Andrew Morton" <akpm@...ux-foundation.org>, "Josh Triplett" <josh@...htriplett.org>,
> "Thomas Gleixner" <tglx@...utronix.de>, "David Howells" <dhowells@...hat.com>, "Nick Piggin" <npiggin@...nel.dk>
> Sent: Monday, March 16, 2015 1:21:04 PM
> Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)
> 
> On Mon, Mar 16, 2015 at 03:43:56PM +0000, Mathieu Desnoyers wrote:
> > > On which; I absolutely hate that rq->lock thing in there. What is
> > > 'wrong' with doing a lockless compare there? Other than not actually
> > > being able to deref rq->curr of course, but we need to fix that anyhow.
> > 
> > If we can make sure rq->curr deref could be done without holding the rq
> > lock, then I think all we would need is to ensure that updates to rq->curr
> > are surrounded by memory barriers. Therefore, we would have the following:
> > 
> > * When a thread is scheduled out, a memory barrier would be issued before
> >   rq->curr is updated to the next thread task_struct.
> > 
> > * Before a thread is scheduled in, a memory barrier needs to be issued
> >   after rq->curr is updated to the incoming thread.
> 
> I'm not entirely awake atm but I'm not seeing why it would need to be
> that strict; I think the current single MB on task switch is sufficient
> because if we're in the middle of schedule, userspace isn't actually
> running.
> 
> So from the point of userspace the task switch is atomic. Therefore even
> if we do not get a barrier before setting ->curr, the expedited thing
> missing us doesn't matter as userspace cannot observe the difference.

AFAIU, atomicity is not what matters here. It's more about memory ordering.
What is guaranteeing that upon entry in kernel-space, all prior memory
accesses (loads and stores) are ordered prior to following loads/stores ?

The same applies when returning to user-space: what is guaranteeing that all
prior loads/stores are ordered before the user-space loads/stores performed
after returning to user-space ?

> 
> > In order to be able to dereference rq->curr->mm without holding the
> > rq->lock, do you envision we should protect task reclaim with RCU-sched ?
> 
> A recent discussion had Linus suggest SLAB_DESTROY_BY_RCU, although I
> think Oleg did mention it would still be 'interesting'. I've not yet had
> time to really think about that.

This might be an "interesting" modification. :) This could perhaps come
as an optimization later on ?

By the way, I now remember why we start from the mm_cpumask, and then
double-check the mm: using the mm_cpumask serves as an approximation
of the CPUs we need to double-check. Therefore, rather than grabbing
the rq lock for all CPUs, we only need to grab it for CPUs that are
in the mm_cpumask.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/