Date:	Wed, 24 Nov 2010 14:48:46 +0100
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	paulmck@...ux.vnet.ibm.com
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH 1/2] rcu: Don't chase unnecessary quiescent states after
 extended grace periods

2010/11/24 Paul E. McKenney <paulmck@...ux.vnet.ibm.com>:
> On Wed, Nov 24, 2010 at 03:33:21AM +0100, Frederic Weisbecker wrote:
>> 2010/11/24 Frederic Weisbecker <fweisbec@...il.com>:
>> > On Tue, Nov 23, 2010 at 04:58:20PM -0800, Paul E. McKenney wrote:
>> >> On Wed, Nov 24, 2010 at 01:31:12AM +0100, Frederic Weisbecker wrote:
>> >> > When a CPU is in an extended quiescent state, which includes idle
>> >> > nohz or CPU offline, other CPUs will take care of the grace periods
>> >> > on its behalf.
>> >> >
>> >> > When this CPU exits its extended quiescent state, it will catch up
>> >> > with the last started grace period and start chasing its own
>> >> > quiescent states to end the current grace period.
>> >> >
>> >> > However, in this case we always start tracking quiescent states if the
>> >> > grace period number has changed since we entered our extended
>> >> > quiescent state, because we always assume that the last grace period
>> >> > is not finished and needs us to complete it, which is sometimes wrong.
>> >> >
>> >> > This patch verifies whether the last grace period has been completed.
>> >> > If it has not, we start hunting local quiescent states as we always
>> >> > did. Otherwise we don't do anything; this saves us some work and
>> >> > an unnecessary softirq.
>> >>
>> >> Interesting approach!  I can see how this helps in the case where the
>> >> CPU just came online, but I don't see it in the nohz case, because the
>> >> nohz case does not update the rdp->completed variable.  In contrast,
>> >> the online path calls rcu_init_percpu_data() which sets up this variable.
>> >>
>> >> So, what am I missing here?
>> >>
>> >>                                                       Thanx, Paul
>> >>
>> >> PS.  It might well be worthwhile for the online case alone, but
>> >>      the commit message does need to be accurate.
>> >
>> >
>> > So, let's take this scenario (inspired by a freshly dumped trace, to
>> > clarify my ideas):
>> >
>> > CPU 1 was idle; it has missed several grace periods, but CPU 0 took care
>> > of that.
>> >
>> > Hence, CPU 0's rdp->gpnum = rdp->completed = 4294967000


(Actually I was talking about CPU 1 here. CPU 1 was idle and
has rdp->gpnum and rdp->completed at 4294967000, while the
global state and even the node are at 4294967002.)

> But CPU 1's rdp state is outdated, due to locking design.

Yeah.


>> > But the last grace period was 4294967002 and it's completed
>> > (rnp->gpnum = rnp->completed = rsp->gpnum = 4294967002).
>> >
>> > Now CPU 0 gets a tick for a random reason; it calls rcu_check_callbacks()
>> > and then rcu_pending(), which raises the softirq because of this:
>> >
>> >        /* Has another RCU grace period completed?  */
>> >        if (ACCESS_ONCE(rnp->completed) != rdp->completed) { /* outside lock */
>> >                rdp->n_rp_gp_completed++;
>> >                return 1;
>> >        }
>
> Yes, because CPU 0 has not yet updated its rdp state.


So again I made a mistake. Here it's CPU 1 that takes this path, the
outdated CPU that was idle.

So yeah, here it has not yet updated its rdp state; it's still 2
grace periods behind.
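
(With the numbers above, the quoted check evaluates on CPU 1 as:)

	/* ACCESS_ONCE(rnp->completed) != rdp->completed         */
	/* 4294967002                  != 4294967000  ->  true   */
	/* so rcu_pending() returns 1 and the softirq is raised. */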


>> > The softirq fires, and we call rcu_process_gp_end(), which will
>> > update rdp->completed to match the global state:
>> > (rsp->completed = rnp->gpnum = rnp->completed = rsp->gpnum = 4294967002).
>> >
>> > But rsp->gpnum is still 2 offsets backwards.
>
> This one is hard for me to believe -- rsp->gpnum drives the rest of
> the ->gpnum fields, right?


Oops, I meant rdp->gpnum is still 2 grace periods behind, for CPU 1.

Sorry.
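
(To recap the step under discussion: rcu_process_gp_end() boils down
to something like this -- a simplified sketch with the callback-list
advancement left out, not the exact kernel code:)

	static void __rcu_process_gp_end(struct rcu_state *rsp,
					 struct rcu_node *rnp,
					 struct rcu_data *rdp)
	{
		/* Pull the node's view of the last completed GP into the
		 * per-CPU data, so rdp->completed catches up with rnp. */
		if (rdp->completed != rnp->completed) {
			/* ... advance rdp->nxttail[] callback lists ... */
			rdp->completed = rnp->completed;
		}
	}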


>> > Now we call rcu_check_quiescent_state() -> check_for_new_grace_period()
>> > -> note_new_gpnum(), and then we end up requesting a quiescent state
>> > even though every grace period is already completed.
>>
>> Sorry, I should have described that in the changelogs, but my ideas
>> weren't as clear as they are now (at least I feel they are, which
>> doesn't mean they actually are ;)
>> Chasing these RCU bugs for too many hours has toasted my brain...
>
> Welcome to my world!!!  But keep in mind that an extra timer tick
> or two is much preferable to a potential hang!  And you only get
> the extra timer tick if there was some other reason that the
> CPU came out of nohz mode, correct?


Yeah, either because of a timer, hrtimer, or a reschedule.
But you still generate a spurious softirq in this scheme.

Two in fact: one because of the rnp->completed != rdp->completed condition
in rcu_pending(), another one because when we update the gpnum, we
always start chasing QS, regardless of the last GP being completed or not.
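
Roughly, the idea of the patch boils down to this check when we pick up
a new GP number -- a simplified sketch of what happens in note_new_gpnum(),
not the literal diff:

	rdp->gpnum = rnp->gpnum;
	if (rnp->gpnum != rnp->completed) {
		/* The GP is still in progress: we owe it a QS. */
		rdp->qs_pending = 1;
		rdp->passed_quiesc = 0;
	}
	/* Otherwise the GP already completed without us: nothing to
	 * chase, and no need to raise the softirq for it. */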


> Which is why checking the rnp fields makes more sense to me, actually.
> Acquiring rnp->lock is much less painful than pinning down the rsp state.

Right.

Another thing: we already have the (rnp->gpnum != rdp->gpnum) check in
rcu_pending(), so why also check (rnp->completed != rdp->completed)?
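
(For reference, the gpnum check I mean sits right next to the quoted
completed check in __rcu_pending() -- quoting from memory, so take the
exact lines with a grain of salt:)

	/* Has a new RCU grace period started? */
	if (ACCESS_ONCE(rnp->gpnum) != rdp->gpnum) { /* outside lock */
		rdp->n_rp_gp_started++;
		return 1;
	}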

Thanks.
