[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090225190118.GI6797@linux.vnet.ibm.com>
Date: Wed, 25 Feb 2009 11:01:18 -0800
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Vegard Nossum <vegard.nossum@...il.com>
Cc: linux-kernel@...r.kernel.org, mingo@...e.hu, stable@...nel.org,
akpm@...ux-foundation.org, npiggin@...e.de, penberg@...helsinki.fi
Subject: Re: [PATCH] v4 Teach RCU that idle task is not quiscent state at
boot
On Wed, Feb 25, 2009 at 07:38:23PM +0100, Vegard Nossum wrote:
> 2009/2/25 Paul E. McKenney <paulmck@...ux.vnet.ibm.com>:
> > This patch fixes a bug located by Vegard Nossum with the aid of
> > kmemcheck, updated based on review comments from Nick Piggin,
> > Ingo Molnar, and Andrew Morton.
> >
> > The boot CPU runs in the context of its idle thread during boot-up.
> > During this time, idle_cpu(0) will always return nonzero, which will
> > fool Classic and Hierarchical RCU into deciding that a large chunk of
> > the boot-up sequence is a big long quiescent state. This in turn causes
> > RCU to prematurely end grace periods during this time.
> >
> > This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
> > function to ignore the idle task as a quiescent state until the
> > system has started up the scheduler in rest_init(), introducing a
> > new non-API function rcu_idle_now_means_idle() to inform RCU of this
> > transition. RCU maintains an internal rcu_idle_cpu_truthful variable
> > to track this state, which is then used by rcu_check_callback() to
> > determine if it should believe idle_cpu().
> >
> > Because this patch has the effect of disallowing RCU grace periods
> > during long stretches of the boot-up sequence, this patch also introduces
> > Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
> > no-op if num_online_cpus() returns 1. This allows boot-time code that
> > calls synchronize_rcu() to proceed normally. Note, however, that RCU
> > callbacks registered by call_rcu() will likely queue up until later in
> > the boot sequence. Although rcuclassic and rcutree can also use this
> > same optimization after boot completes, rcupreempt must restrict its
> > use of this optimization to the portion of the boot sequence before the
> > scheduler starts up, given that an rcupreempt RCU read-side critical
> > section may be preeempted.
> >
> > In addition, this patch takes Nick Piggin's suggestion to make the
> > system_state global variable be __read_mostly.
> >
> > Changes since v3:
> >
> > o WARN_ON(nr_context_switches() > 0) to verify that RCU
> > switches out of boot-time mode before the first context
> > switch, as suggested by Nick Piggin.
> >
> > Changes since v2:
> >
> > o Created rcu_blocking_is_gp() internal-to-RCU API that
> > determines whether a call to synchronize_rcu() is itself
> > a grace period.
> >
> > o The definition of rcu_blocking_is_gp() for rcuclassic and
> > rcutree checks to see if but a single CPU is online.
> >
> > o The definition of rcu_blocking_is_gp() for rcupreempt
> > checks to see both if but a single CPU is online and if
> > the system is still in early boot.
> >
> > This allows rcupreempt to again work correctly if running
> > on a single CPU after booting is complete.
> >
> > o Added check to rcupreempt's synchronize_sched() for there
> > being but one online CPU.
> >
> > Tested all three variants both SMP and !SMP, booted fine, passed a short
> > rcutorture test on both x86 and Power.
> >
> > Located-by: Vegard Nossum <vegard.nossum@...il.com>
> > Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>
> Tested-by: Vegard Nossum <vegard.nossum@...il.com>
>
> I used such a patch and config (as attached) to verify. On unpatched
> kernel, I get this:
>
> rcu callbacks called: 2300 (should be 4000)
>
> And with the patch, it seems better:
>
> rcu callbacks called: 4000 (should be 4000)
>
> Actually, the first part of the text itself is wrong, because it used
> to count upwards. But it should really say "rcu callbacks NOT called".
Yes, your test patch does look like it validates my patch, thank you!!!
> Of course, this kind of check will always be inaccurate, so it is not
> fit for mainline. The inaccuracy stems from the fact that there could
> be other users of RCU that disturb our expected behaviour, and we
> don't know how quickly the callbacks will be run, only that if
> "enough" of them run inside the spinlock, something is probably wrong.
> But... what am I saying, you're the expert ;-)
On RCU, maybe, but the boot-time code has changed considerably since
I last looked at it. ;-)
> But it is useful enough to verify your patch. Some other configs I
> tried gave the correct result all the time, but I don't know what is
> special about this one.
Thank you again!!!
I will be submitting another patch that differs only in the names of
the variable and function that Ingo pointed out, so I believe it will
be legitimate to attach your Tested-by: to that upcoming patch.
Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists