Message-ID: <20080830143438.GF7107@linux.vnet.ibm.com>
Date: Sat, 30 Aug 2008 07:34:38 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Manfred Spraul <manfred@...orfullife.com>
Cc: Lai Jiangshan <laijs@...fujitsu.com>, linux-kernel@...r.kernel.org,
cl@...ux-foundation.org, mingo@...e.hu, akpm@...ux-foundation.org,
dipankar@...ibm.com, josht@...ux.vnet.ibm.com, schamp@....com,
niv@...ibm.com, dvhltc@...ibm.com, ego@...ibm.com,
rostedt@...dmis.org, peterz@...radead.org
Subject: Re: [PATCH, RFC, tip/core/rcu] v3 scalable classic RCU
implementation
On Sat, Aug 30, 2008 at 03:32:36PM +0200, Manfred Spraul wrote:
> Lai Jiangshan wrote:
>> I just had a quick review, so my comments are nothing but cleanup.
>>
>> Thanks, Lai.
>>
>> Paul E. McKenney wrote:
>>
>>> Hello!
>>>
>>
>>
>>> +rcu_start_gp(struct rcu_state *rsp, unsigned long iflg)
>>> + __releases(rsp->rda[smp_processor_id()]->lock)
>>> +{
>>> + unsigned long flags = iflg;
>>> + struct rcu_data *rdp = rsp->rda[smp_processor_id()];
>>> + struct rcu_node *rnp = rcu_get_root(rsp);
>>> + struct rcu_node *rnp_cur;
>>> + struct rcu_node *rnp_end;
>>> +
>>> + if (!cpu_needs_another_gp(rsp, rdp)) {
>>> /*
>>> - * Accessing nohz_cpu_mask before incrementing rcp->cur needs a
>>> - * Barrier Otherwise it can cause tickless idle CPUs to be
>>> - * included in rcp->cpumask, which will extend graceperiods
>>> - * unnecessarily.
>>> + * Either there is no need to detect any more grace periods
>>> + * at the moment, or we are already in the process of
>>> + * detecting one. Either way, we should not start a new
>>> + * RCU grace period, so drop the lock and return.
>>> */
>>> - smp_mb();
>>> - cpus_andnot(rcp->cpumask, cpu_online_map, nohz_cpu_mask);
>>> + spin_unlock_irqrestore(&rnp->lock, flags);
>>> + return;
>>> + }
>>> +
>>> + /* Advance to a new grace period and initialize state. */
>>> +
>>> + rsp->gpnum++;
>>> + rsp->signaled = RCU_SIGNAL_INIT;
>>> + rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
>>> + record_gp_stall_check_time();
>>> + dyntick_save_completed(rsp, rsp->completed - 1);
>>> + note_new_gpnum(rsp, rdp);
>>> +
>>> + /*
>>> + * Because we are first, we know that all our callbacks will
>>> + * be covered by this upcoming grace period, even the ones
>>> + * that were registered arbitrarily recently.
>>> + */
>>> +
>>> + rcu_next_callbacks_are_ready(rdp);
>>> + rdp->nxttail[RCU_WAIT_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
>>> - rcp->signaled = 0;
>>> + /* Special-case the common single-level case. */
>>> +
>>> + if (NUM_RCU_NODES == 1) {
>>> + rnp->qsmask = rnp->qsmaskinit;
>>>
>>
>> I tried a mask like qsmaskinit before. The system deadlocked
>> when I onlined/offlined cpus.
>> I never found out why, but I thought of these two problems:
>>
>> problem 1:
>> ----race condition 1:
>> <cpu_down>
>> synchronize_rcu <called from offline handler in other subsystem>
>> rcu_offline_cpu
>>
>>
>> -----race condition 2:
>> rcu_online_cpu
>> synchronize_rcu <called from online handler in other subsystem>
>> <cpu_up>
>>
>> In both of these conditions, synchronize_rcu() blocks forever, because
>> synchronize_rcu() has to wait for a cpu in rnp->qsmask, but this
>> cpu never runs.
>>
>>
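A minimal sketch (my illustration, not the actual kernel code) of why this
deadlocks: a grace period can only complete once every CPU it is waiting on
has reported a quiescent state, so a bit in rnp->qsmask belonging to a CPU
that never runs can never be cleared. The bitmasks here are plain unsigned
longs standing in for real cpumasks.

```c
#include <stdbool.h>

/* Sketch: a grace period waiting on the CPUs in qsmask can only ever
 * complete if every one of those CPUs is able to run.  A bit set in
 * qsmask for an offline CPU will never be cleared, so synchronize_rcu()
 * would block forever. */
static bool gp_can_ever_complete(unsigned long qsmask,
                                 unsigned long online_mask)
{
    /* Any bit in qsmask that is not in online_mask belongs to a CPU
     * that will never run and thus never report a quiescent state. */
    return (qsmask & ~online_mask) == 0;
}
```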
> Can we disallow synchronize_rcu() from the cpu notifiers? Are there any
> users that do a synchronize_rcu() from within the notifiers?
> I don't see any other solution.
I made force_quiescent_state() check for offline CPUs.  (Well, actually
it is rcu_implicit_offline_qs(), which is indirectly called from
force_quiescent_state().)
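For illustration, a rough sketch of that idea (the function names and the
plain unsigned long mask below are mine, not the kernel's actual signatures):

```c
#include <stdbool.h>

/* Sketch of the idea behind rcu_implicit_offline_qs(): if a CPU is
 * offline, a quiescent state may be reported on its behalf, so that
 * force_quiescent_state() stops waiting for it.  cpu_is_online_sketch()
 * stands in for the kernel's cpu_online(); the online mask is a plain
 * unsigned long here, not a real cpumask. */
static bool cpu_is_online_sketch(unsigned long online_mask, int cpu)
{
    return (online_mask >> cpu) & 1UL;
}

/* Returns true when a quiescent state may be reported for cpu because
 * it is offline; an online CPU must report for itself. */
static bool rcu_implicit_offline_qs_sketch(unsigned long online_mask, int cpu)
{
    return !cpu_is_online_sketch(online_mask, cpu);
}
```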
> Something like qsmaskinit is needed - always enumerating all cpus just
> doesn't scale.
Agreed!!!
> Perhaps it's possible to rely on CPU_DYING, but I haven't figured out yet
> how to handle read-side critical sections in CPU_DYING handlers.
> Interrupts after CPU_DYING could be handled by rcu_irq_enter(),
> rcu_irq_exit() [yes, they exist on x86: the arch code enables the local
> interrupts in order to process the currently queued interrupts]
My feeling is that CPU online/offline will be quite rare, so it should
be OK to clean up after the races in force_quiescent_state(), which in
this version is called every three ticks in a given grace period.
Yes, I did worry about the possibility of all CPUs being in dyntick-idle
mode, and the solution for that is (1) don't let a CPU that has RCU
callbacks pending go into dyntick-idle mode via rcu_needs_cpu() and
(2) don't let a grace period start unless there is at least one callback
that is not yet in the done state. But no, I am not certain that I have
gotten this completely correct yet.
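As a rough illustration of point (1), a sketch of the rcu_needs_cpu() idea;
the struct and field names below are illustrative only, not the kernel's
actual rcu_data layout:

```c
#include <stdbool.h>

/* Sketch: a CPU that still has RCU callbacks queued must not enter
 * dyntick-idle mode, otherwise those callbacks would never be invoked. */
struct rcu_data_sketch {
    int nxtlen;   /* callbacks waiting for a current or future GP */
    int donelen;  /* callbacks whose grace period has already ended */
};

static bool rcu_needs_cpu_sketch(const struct rcu_data_sketch *rdp)
{
    /* Any not-yet-invoked callback keeps the CPU's tick alive. */
    return rdp->nxtlen > 0 || rdp->donelen > 0;
}
```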
Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/