Message-ID: <4E66DCAB.8090801@am.sony.com>
Date: Tue, 6 Sep 2011 19:53:31 -0700
From: Frank Rowand <frank.rowand@...sony.com>
To: "paulmck@...ux.vnet.ibm.com" <paulmck@...ux.vnet.ibm.com>
CC: "Rowand, Frank" <Frank_Rowand@...yusa.com>,
Peter Zijlstra <peterz@...radead.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
linux-rt-users <linux-rt-users@...r.kernel.org>,
Mike Galbraith <efault@....de>, <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...e.hu>,
Venkatesh Pallipadi <venki@...gle.com>
Subject: Re: [ANNOUNCE] 3.0.1-rt11
On 08/26/11 16:55, Paul E. McKenney wrote:
> On Wed, Aug 24, 2011 at 04:58:49PM -0700, Frank Rowand wrote:
>> On 08/13/11 03:53, Peter Zijlstra wrote:
>>>
>>> Whee, I can skip release announcements too!
>>>
>>> So no the subject ain't no mistake its not, 3.0.1-rt11 is there for the
>>> grabs.
< snip >
>> I have a consistent (every boot) hang on boot. With a few
>> hacks to get console output, I get the
>>
>> rcu_preempt_state detected stalls on CPUs/tasks
< snip >
>> This is an ARM NaviEngine (out of tree, so I also have applied
>> a series of patches for platform support).
>>
>> CONFIG_PREEMPT_RT_FULL is set. Full config is attached.
I have also replicated the problem on the ARM RealView (in tree) and
without the RT patches.
>
> Hmmm... The last few that I have seen that looked like this were
> due to my messing up rcutorture so that the RCU-boost testing kthreads
> ran CPU-bound at real-time priority.
>
> Is it possible that something similar is happening on your system?
>
> Thanx, Paul
The problem turned out to be that the allowed cpus mask of the ksoftirqd
threads on the secondary processors was being set to all possible cpus, so
those threads were no longer bound to their own cpu. As a result, the RCU
softirq was never executing on cpu 2.
I'll test the following patch on 3.1 tomorrow.
-Frank Rowand
Symptom: rcu stall
The problem was that ksoftirqd was woken on the secondary processors before
the secondary processors were online. This led to allowed cpus being set
to all cpus.
   wake_up_process()
      try_to_wake_up()
         select_task_rq()
            if (... || !cpu_online(cpu))
               select_fallback_rq(task_cpu(p), p)
                  ...
                  /* No more Mr. Nice Guy. */
                  dest_cpu = cpuset_cpus_allowed_fallback(p)
                     do_set_cpus_allowed(p, cpu_possible_mask)

                  # Thus ksoftirqd can now run on any cpu...
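
The call chain above can be illustrated with a small userspace model. This is
only a sketch under stated assumptions, not kernel code: `struct task`,
`wake_up()`, `select_fallback()` and the bitmask representation are all
hypothetical stand-ins for the scheduler paths named in the trace. It shows how
waking a task that is bound to a not-yet-online cpu ends up widening its
affinity to every possible cpu.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy model: bit i set means cpu i is in the mask. */
typedef unsigned int cpumask_t;

struct task {
	int cpu;                /* cpu the task is currently assigned to */
	cpumask_t cpus_allowed; /* affinity mask */
};

static const cpumask_t possible_mask = (1u << NR_CPUS) - 1;
static cpumask_t online_mask = 1u << 0; /* only the boot cpu is online */

static bool cpu_online(int cpu)
{
	return (online_mask >> cpu) & 1u;
}

/* First cpu that is both allowed for p and online, or -1 if none. */
static int first_allowed_online(const struct task *p)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (((p->cpus_allowed >> cpu) & 1u) && cpu_online(cpu))
			return cpu;
	return -1;
}

/* Stand-in for the last-resort path of select_fallback_rq(). */
static int select_fallback(struct task *p)
{
	int cpu = first_allowed_online(p);

	if (cpu >= 0)
		return cpu;

	/* "No more Mr. Nice Guy": the binding is silently dropped. */
	p->cpus_allowed = possible_mask;
	return first_allowed_online(p);
}

/* Stand-in for the cpu selection done on wakeup. */
static int wake_up(struct task *p)
{
	int cpu = p->cpu;

	if (!((p->cpus_allowed >> cpu) & 1u) || !cpu_online(cpu))
		cpu = select_fallback(p);

	assert(cpu >= 0); /* the model always keeps the boot cpu online */
	p->cpu = cpu;
	return cpu;
}
```

In this model, a task bound to cpu 2 that is woken while only cpu 0 is online
is placed on cpu 0 and keeps a full affinity mask afterwards. The same task
woken after cpu 2 is online stays on cpu 2 with its mask intact.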
Signed-off-by: Frank Rowand <frank.rowand@...sony.com>
---
kernel/softirq.c | 19 14 + 5 - 0 !
1 file changed, 14 insertions(+), 5 deletions(-)
Index: b/kernel/softirq.c
===================================================================
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -55,6 +55,7 @@ EXPORT_SYMBOL(irq_stat);
static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(struct task_struct *, ksoftirqd_pending_online);
char *softirq_to_name[NR_SOFTIRQS] = {
"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
@@ -862,28 +863,36 @@ static int __cpuinit cpu_callback(struct
return notifier_from_errno(PTR_ERR(p));
}
kthread_bind(p, hotcpu);
- per_cpu(ksoftirqd, hotcpu) = p;
+ per_cpu(ksoftirqd_pending_online, hotcpu) = p;
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
+ per_cpu(ksoftirqd, hotcpu) =
+ per_cpu(ksoftirqd_pending_online, hotcpu);
+ per_cpu(ksoftirqd_pending_online, hotcpu) = NULL;
wake_up_process(per_cpu(ksoftirqd, hotcpu));
break;
#ifdef CONFIG_HOTPLUG_CPU
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
- if (!per_cpu(ksoftirqd, hotcpu))
+ p = per_cpu(ksoftirqd_pending_online, hotcpu);
+ if (!p)
+ p = per_cpu(ksoftirqd, hotcpu);
+ if (!p)
break;
/* Unbind so it can run. Fall thru. */
- kthread_bind(per_cpu(ksoftirqd, hotcpu),
- cpumask_any(cpu_online_mask));
+ kthread_bind(p, cpumask_any(cpu_online_mask));
case CPU_DEAD:
case CPU_DEAD_FROZEN: {
static const struct sched_param param = {
.sched_priority = MAX_RT_PRIO-1
};
- p = per_cpu(ksoftirqd, hotcpu);
+ p = per_cpu(ksoftirqd_pending_online, hotcpu);
+ if (!p)
+ p = per_cpu(ksoftirqd, hotcpu);
per_cpu(ksoftirqd, hotcpu) = NULL;
+ per_cpu(ksoftirqd_pending_online, hotcpu) = NULL;
sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
kthread_stop(p);
takeover_tasklets(hotcpu);
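
The idea behind the patch, staging the thread pointer until the cpu is online,
can be sketched in a userspace model. Everything here is a hypothetical
stand-in (`struct task`, `enum cpu_event`, `thread_pool`, and a
`wakeup_softirqd()` that takes a cpu argument); it assumes that early wakers
find the thread through the published per-cpu pointer, which is what makes
hiding the pointer effective.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4

struct task {
	int bound_cpu;
	int woken; /* number of times the thread was woken */
};

enum cpu_event { UP_PREPARE, ONLINE, UP_CANCELED };

/* Published pointer: what wakers look at. */
static struct task *ksoftirqd[NR_CPUS];
/* Staged pointer: created and bound, but cpu not yet online. */
static struct task *ksoftirqd_pending_online[NR_CPUS];

static struct task thread_pool[NR_CPUS];

/* Wakers only see the published pointer; NULL means nothing to wake. */
static void wakeup_softirqd(int cpu)
{
	struct task *p = ksoftirqd[cpu];

	if (p)
		p->woken++;
}

static void cpu_callback(enum cpu_event ev, int cpu)
{
	switch (ev) {
	case UP_PREPARE:
		thread_pool[cpu].bound_cpu = cpu;
		thread_pool[cpu].woken = 0;
		/* Stage the thread instead of publishing it. */
		ksoftirqd_pending_online[cpu] = &thread_pool[cpu];
		break;
	case ONLINE:
		assert(ksoftirqd_pending_online[cpu] != NULL);
		/* The cpu is online now, so publishing is safe. */
		ksoftirqd[cpu] = ksoftirqd_pending_online[cpu];
		ksoftirqd_pending_online[cpu] = NULL;
		wakeup_softirqd(cpu);
		break;
	case UP_CANCELED:
		ksoftirqd[cpu] = NULL;
		ksoftirqd_pending_online[cpu] = NULL;
		break;
	}
}
```

In this model, a `wakeup_softirqd(2)` issued between `UP_PREPARE` and `ONLINE`
is a no-op, so the thread is first woken only once its cpu is online.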