Message-ID: <1003922584.10662.1426532015839.JavaMail.zimbra@efficios.com>
Date: Mon, 16 Mar 2015 18:53:35 +0000 (UTC)
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Steven Rostedt <rostedt@...dmis.org>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Nicholas Miell <nmiell@...cast.net>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Ingo Molnar <mingo@...hat.com>,
Alan Cox <gnomes@...rguk.ukuu.org.uk>,
Lai Jiangshan <laijs@...fujitsu.com>,
Stephen Hemminger <stephen@...workplumber.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Josh Triplett <josh@...htriplett.org>,
Thomas Gleixner <tglx@...utronix.de>,
David Howells <dhowells@...hat.com>,
Nick Piggin <npiggin@...nel.dk>
Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory
barrier (x86) (v12)
----- Original Message -----
> From: "Peter Zijlstra" <peterz@...radead.org>
> To: "Mathieu Desnoyers" <mathieu.desnoyers@...icios.com>
> Sent: Monday, March 16, 2015 1:21:04 PM
> Subject: Re: [RFC PATCH] sys_membarrier(): system/process-wide memory barrier (x86) (v12)
>
> On Mon, Mar 16, 2015 at 03:43:56PM +0000, Mathieu Desnoyers wrote:
> > > On which; I absolutely hate that rq->lock thing in there. What is
> > > 'wrong' with doing a lockless compare there? Other than not actually
> > > being able to deref rq->curr of course, but we need to fix that anyhow.
> >
> > If we can make sure rq->curr deref could be done without holding the rq
> > lock, then I think all we would need is to ensure that updates to rq->curr
> > are surrounded by memory barriers. Therefore, we would have the following:
> >
> > * When a thread is scheduled out, a memory barrier would be issued before
> > rq->curr is updated to the next thread task_struct.
> >
> > * Before a thread is scheduled in, a memory barrier needs to be issued
> > after rq->curr is updated to the incoming thread.
>
> I'm not entirely awake atm but I'm not seeing why it would need to be
> that strict; I think the current single MB on task switch is sufficient
> because if we're in the middle of schedule, userspace isn't actually
> running.
>
> So from the point of userspace the task switch is atomic. Therefore even
> if we do not get a barrier before setting ->curr, the expedited thing
> missing us doesn't matter as userspace cannot observe the difference.

AFAIU, atomicity is not what matters here; memory ordering is.
What guarantees that, upon entry into kernel space, all prior memory
accesses (loads and stores) are ordered before the loads and stores
that follow? The same applies when returning to user space: what
guarantees that all prior loads and stores are ordered before the
user-space loads and stores performed after the return?
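
To make the ordering question concrete, here is a minimal user-space
sketch of the barrier placement proposed earlier in this thread, written
with C11 atomics. It is a model under stated assumptions, not kernel
code: model_mm, model_task, model_rq, model_switch_to() and
model_cpu_runs_mm() are hypothetical names, seq_cst fences stand in for
smp_mb(), and task lifetime is simply assumed to be handled elsewhere.

```c
/* User-space model only: every name here is hypothetical, not kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct model_mm { int id; };
struct model_task { struct model_mm *mm; };

struct model_rq {
    _Atomic(struct model_task *) curr;  /* stands in for rq->curr */
};

/*
 * Task switch with a full barrier on each side of the curr update:
 *  - the first fence orders the outgoing task's accesses before the update;
 *  - the second fence orders the update before the incoming task's accesses.
 */
void model_switch_to(struct model_rq *rq, struct model_task *next)
{
    assert(next != NULL);   /* this model has no idle task */

    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&rq->curr, next, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Lockless compare done by the caller of the expedited barrier.
 * Dereferencing the task without a lock assumes its lifetime is
 * guaranteed by some other means, which is the open question in
 * this thread.
 */
bool model_cpu_runs_mm(struct model_rq *rq, const struct model_mm *mm)
{
    struct model_task *t;

    atomic_thread_fence(memory_order_seq_cst);
    t = atomic_load_explicit(&rq->curr, memory_order_relaxed);
    return t != NULL && t->mm == mm;
}
```

With fences on both sides of the update, a caller that misses the
incoming task in model_cpu_runs_mm() can rely on the switch itself
having issued a full barrier. With a single fence, that argument has to
come from somewhere else, such as the kernel entry and exit paths.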
>
> > In order to be able to dereference rq->curr->mm without holding the
> > rq->lock, do you envision we should protect task reclaim with RCU-sched ?
>
> A recent discussion had Linus suggest SLAB_DESTROY_BY_RCU, although I
> think Oleg did mention it would still be 'interesting'. I've not yet had
> time to really think about that.

This might be an "interesting" modification. :) Perhaps it could come
as an optimization later on?

By the way, I now remember why we start from the mm_cpumask and then
double-check the mm: the mm_cpumask serves as an approximation of the
set of CPUs we need to double-check. Therefore, rather than grabbing
the rq lock on every CPU, we only need to grab it on the CPUs that are
set in the mm_cpumask.
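
The mask-then-double-check idea can be sketched in user space as
follows. This is only an illustration under assumptions: the model_*
names and MODEL_NR_CPUS are hypothetical, a plain 64-bit word stands in
for the cpumask, and a counter stands in for taking the rq lock so that
the number of acquisitions is visible.

```c
/* User-space model only: all names are hypothetical, not kernel code. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

struct model_mm {
    uint64_t cpumask;   /* bit n set: CPU n may be running this mm */
};

struct model_task { struct model_mm *mm; };

struct model_rq {
    struct model_task *curr;
    unsigned int lock_acquisitions;   /* how often the "lock" was taken */
};

/*
 * Return a bitmask of the CPUs whose current task uses @mm.
 * Only CPUs set in mm->cpumask are visited; the mask is treated as an
 * approximation, so each candidate is double-checked under the "lock".
 */
uint64_t model_cpus_to_ipi(struct model_rq rqs[MODEL_NR_CPUS],
                           const struct model_mm *mm)
{
    uint64_t targets = 0;

    assert(mm != NULL);

    for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
        const struct model_task *curr;

        if (!(mm->cpumask & (UINT64_C(1) << cpu)))
            continue;                       /* not in the mask: no lock taken */

        rqs[cpu].lock_acquisitions++;       /* model: take this CPU's rq lock */
        curr = rqs[cpu].curr;
        if (curr != NULL && curr->mm == mm)
            targets |= UINT64_C(1) << cpu;  /* double-check passed */
        /* model: release this CPU's rq lock */
    }
    return targets;
}
```

In this model, an mm whose mask has two bits set costs two lock
acquisitions regardless of MODEL_NR_CPUS, and a stale bit in the mask
is filtered out by the comparison against the current task's mm.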
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com