linux-kernel - Re: [RFC][PATCH 12/13] stop

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150625142011.GU19282@twins.programming.kicks-ass.net>
Date:	Thu, 25 Jun 2015 16:20:11 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:	Oleg Nesterov <oleg@...hat.com>, tj@...nel.org, mingo@...hat.com,
	linux-kernel@...r.kernel.org, der.herr@...r.at, dave@...olabs.net,
	riel@...hat.com, viro@...IV.linux.org.uk,
	torvalds@...ux-foundation.org
Subject: Re: [RFC][PATCH 12/13] stop_machine: Remove lglock

On Thu, Jun 25, 2015 at 06:47:55AM -0700, Paul E. McKenney wrote:
> On Thu, Jun 25, 2015 at 01:07:34PM +0200, Peter Zijlstra wrote:
> > I'm still somewhat confused by the whole strict order sequence vs this
> > non ordered 'polling' of global state.
> > 
> > This funnel thing basically waits random times depending on the
> > contention of these mutexes and tries again. Ultimately serializing on
> > the root funnel thing.
> 
> Not random at all!

No, they are random per, definition it depends on the amount of
contention and since that's random, the rest it too.

> The whole funnel is controlled by the root ->exp_funnel_mutex holder,
> who is going to hold the lock for a single expedited grace period, then
> release it.  This means that any time a task acquires a lock, there is
> very likely to have been a recent state change.  Hence the checks after
> each lock acquisition.
> 
> So in the heavy-use case, what tends to happen is that there are one
> or two expedited grace periods, and then the entire queue of waiters
> acquiring ->exp_funnel_mutex simply evaporates -- they can make use of
> the expedited grace period whose completion resulted in their acquisition
> completing and thus them being awakened.  No fuss, no muss, no unnecessary
> contention or cache thrashing.

Plenty of cache trashing, since your 'tree' is not at all cache aligned
or even remotely coherent with the actual machine topology -- I'll keep
reminding you :-)

But I must admit that the workings of the sequence thing elided me this
morning. Yes that's much better than the strict ticket order of before.

> > You also do not take the actual RCU state machine into account -- this
> > is a parallel state.
> > 
> > Can't we integrate the force quiescent state machinery with the
> > expedited machinery -- that is instead of building a parallel state, use
> > the expedited thing to push the regular machine forward?
> > 
> > We can use the stop_machine calls to force the local RCU state forward,
> > after all, we _know_ we just made a context switch into the stopper
> > thread. All we need to do is disable interrupts to hold off the tick
> > (which normally drives the state machine) and just unconditionally
> > advance our state.
> > 
> > If we use the regular GP machinery, you also don't have to strongly
> > order the callers, just stick them on whatever GP was active when they
> > came in and let them roll, this allows much better (and more natural)
> > concurrent processing.
> 
> That gets quite complex, actually.  Lots of races with the normal grace
> periods doing one thing or another.

How so? I'm probably missing several years of RCU trickery and detail
again, but since we can advance from the tick, we should be able to
advance from the stop work with IRQs disabled with equal ease.

And since the stop work and the tick are fully serialized, there cannot
be any races there.

And the stop work against other CPUs is the exact same races you already
had with tick vs tick.

So please humour me and explain how all this is far more complicated ;-)

> However, it should be quite easy to go the other way and make the normal
> grace-period processing take advantage of expedited grace periods that
> happened to occur at the right time.  I will look into this, thank you
> for the nudge!

That should already be happening, right? Since we force context
switches, the tick driven RCU state machine will observe those and make
progress -- assuming it was trying to make progress at all of course.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/