Message-ID: <4E66DCAB.8090801@am.sony.com>
Date:	Tue, 6 Sep 2011 19:53:31 -0700
From:	Frank Rowand <frank.rowand@...sony.com>
To:	"paulmck@...ux.vnet.ibm.com" <paulmck@...ux.vnet.ibm.com>
CC:	"Rowand, Frank" <Frank_Rowand@...yusa.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	linux-rt-users <linux-rt-users@...r.kernel.org>,
	Mike Galbraith <efault@....de>, <linux-kernel@...r.kernel.org>,
	Ingo Molnar <mingo@...e.hu>,
	Venkatesh Pallipadi <venki@...gle.com>
Subject: Re: [ANNOUNCE] 3.0.1-rt11

On 08/26/11 16:55, Paul E. McKenney wrote:
> On Wed, Aug 24, 2011 at 04:58:49PM -0700, Frank Rowand wrote:
>> On 08/13/11 03:53, Peter Zijlstra wrote:
>>>
>>> Whee, I can skip release announcements too!
>>>
>>> So no the subject ain't no mistake its not, 3.0.1-rt11 is there for the
>>> grabs.

< snip >

>> I have a consistent (every boot) hang on boot.  With a few
>> hacks to get console output, I get the
>>
>>   rcu_preempt_state detected stalls on CPUs/tasks

< snip >

>> This is an ARM NaviEngine (out of tree, so I also have applied
>> a series of patches for platform support).
>>
>> CONFIG_PREEMPT_RT_FULL is set.  Full config is attached.

I have also replicated the problem on the ARM RealView (in tree) and
without the RT patches.

> 
> Hmmm...  The last few that I have seen that looked like this were
> due to my messing up rcutorture so that the RCU-boost testing kthreads
> ran CPU-bound at real-time priority.
> 
> Is it possible that something similar is happening on your system?
> 
>                                                         Thanx, Paul

The problem turned out to be that the allowed cpus mask of the ksoftirqd
on each secondary processor was set to all possible cpus.  As a result,
the RCU softirq was never executing on cpu 2.

I'll test the following patch on 3.1 tomorrow.

-Frank Rowand


Symptom: rcu stall

The problem was that ksoftirqd was woken on the secondary processors before
the secondary processors were online.  This led to the allowed cpus of
ksoftirqd being set to all possible cpus, by way of this call path:

   wake_up_process()
      try_to_wake_up()
         select_task_rq()
            if (... || !cpu_online(cpu))
               select_fallback_rq(task_cpu(p), p)
                  ...
                  /* No more Mr. Nice Guy. */
                  dest_cpu = cpuset_cpus_allowed_fallback(p)
                     do_set_cpus_allowed(p, cpu_possible_mask)
                        #  Thus ksoftirqd can now run on any cpu...
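
The call chain above can be approximated by a small userspace model.  This
is only a sketch, not kernel code: the model_* names, the 4-cpu bitmask and
the two-step fallback are simplifications invented for illustration.  It
shows that waking a task bound to a cpu that is not yet online leaves the
task with an affinity of all possible cpus.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Hypothetical model types; bit N of a mask stands for cpu N. */
typedef uint32_t model_cpumask;

struct model_task {
	model_cpumask cpus_allowed;	/* cpus the task may run on */
	int cpu;			/* cpu the task is currently assigned to */
};

static const model_cpumask model_possible_mask = (1u << MODEL_NR_CPUS) - 1;

static bool model_cpu_online(model_cpumask online, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (online >> cpu) & 1u;
}

/*
 * Simplified stand-in for the wakeup-time cpu selection.  Returns the
 * cpu chosen for the task, or -1 if no cpu is online at all.
 */
static int model_select_task_rq(struct model_task *p, model_cpumask online)
{
	int cpu;

	/* Normal case: the task's own cpu is allowed and online. */
	if (((p->cpus_allowed >> p->cpu) & 1u) &&
	    model_cpu_online(online, p->cpu))
		return p->cpu;

	/* First fallback: any cpu that is both allowed and online. */
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (((p->cpus_allowed >> cpu) & 1u) &&
		    model_cpu_online(online, cpu))
			return p->cpu = cpu;

	/* "No more Mr. Nice Guy": widen the affinity to all possible cpus. */
	p->cpus_allowed = model_possible_mask;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_cpu_online(online, cpu))
			return p->cpu = cpu;

	return -1;
}
```

In this model, a task bound to cpu 2 that is woken while only cpu 0 is
online ends up on cpu 0 with all four bits set in cpus_allowed, and nothing
narrows the mask again when cpu 2 comes up later.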


Signed-off-by: Frank Rowand <frank.rowand@...sony.com>
---
 kernel/softirq.c |   19 	14 +	5 -	0 !
 1 file changed, 14 insertions(+), 5 deletions(-)

Index: b/kernel/softirq.c
===================================================================
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -55,6 +55,7 @@ EXPORT_SYMBOL(irq_stat);
 static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp;
 
 DEFINE_PER_CPU(struct task_struct *, ksoftirqd);
+DEFINE_PER_CPU(struct task_struct *, ksoftirqd_pending_online);
 
 char *softirq_to_name[NR_SOFTIRQS] = {
 	"HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "BLOCK_IOPOLL",
@@ -862,28 +863,36 @@ static int __cpuinit cpu_callback(struct
 			return notifier_from_errno(PTR_ERR(p));
 		}
 		kthread_bind(p, hotcpu);
-  		per_cpu(ksoftirqd, hotcpu) = p;
+		per_cpu(ksoftirqd_pending_online, hotcpu) = p;
  		break;
 	case CPU_ONLINE:
 	case CPU_ONLINE_FROZEN:
+		per_cpu(ksoftirqd, hotcpu) =
+			per_cpu(ksoftirqd_pending_online, hotcpu);
+		per_cpu(ksoftirqd_pending_online, hotcpu) = NULL;
 		wake_up_process(per_cpu(ksoftirqd, hotcpu));
 		break;
 #ifdef CONFIG_HOTPLUG_CPU
 	case CPU_UP_CANCELED:
 	case CPU_UP_CANCELED_FROZEN:
-		if (!per_cpu(ksoftirqd, hotcpu))
+		p = per_cpu(ksoftirqd_pending_online, hotcpu);
+		if (!p)
+			p = per_cpu(ksoftirqd, hotcpu);
+		if (!p)
 			break;
 		/* Unbind so it can run.  Fall thru. */
-		kthread_bind(per_cpu(ksoftirqd, hotcpu),
-			     cpumask_any(cpu_online_mask));
+		kthread_bind(p, cpumask_any(cpu_online_mask));
 	case CPU_DEAD:
 	case CPU_DEAD_FROZEN: {
 		static const struct sched_param param = {
 			.sched_priority = MAX_RT_PRIO-1
 		};
 
-		p = per_cpu(ksoftirqd, hotcpu);
+		p = per_cpu(ksoftirqd_pending_online, hotcpu);
+		if (!p)
+			p = per_cpu(ksoftirqd, hotcpu);
 		per_cpu(ksoftirqd, hotcpu) = NULL;
+		per_cpu(ksoftirqd_pending_online, hotcpu) = NULL;
 		sched_setscheduler_nocheck(p, SCHED_FIFO, &param);
 		kthread_stop(p);
 		takeover_tasklets(hotcpu);
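
The idea behind the patch can be sketched in userspace as follows.  This is
a hedged model, not the kernel implementation: every sim_* name is
hypothetical, and it assumes that the early wakeup happens through a lookup
of the per-cpu ksoftirqd pointer that skips NULL entries.  The thread is
parked in a staging slot at prepare time and only published (and woken)
once the cpu is online.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS 4

/* Hypothetical stand-in for a ksoftirqd thread; it only counts wakeups. */
struct sim_thread {
	int wakeups;			/* total wakeups seen */
	int woken_while_offline;	/* wakeups while its cpu was offline */
};

enum sim_event { SIM_UP_PREPARE, SIM_ONLINE, SIM_UP_CANCELED };

static struct sim_thread *sim_ksoftirqd[SIM_NR_CPUS];	   /* published */
static struct sim_thread *sim_pending_online[SIM_NR_CPUS]; /* staged */
static int sim_online[SIM_NR_CPUS];

static void sim_wake(struct sim_thread *t, int cpu)
{
	t->wakeups++;
	if (!sim_online[cpu])
		t->woken_while_offline++;
}

/* Softirq raised on 'cpu': only a published thread can be woken. */
static void sim_wakeup_softirqd(int cpu)
{
	struct sim_thread *t = sim_ksoftirqd[cpu];

	if (t)
		sim_wake(t, cpu);
}

static void sim_cpu_callback(enum sim_event ev, int cpu, struct sim_thread *t)
{
	assert(cpu >= 0 && cpu < SIM_NR_CPUS);

	switch (ev) {
	case SIM_UP_PREPARE:
		/* Stage the thread; it is not yet visible to wakeups. */
		sim_pending_online[cpu] = t;
		break;
	case SIM_ONLINE:
		/* Publish the staged thread, then wake it. */
		sim_ksoftirqd[cpu] = sim_pending_online[cpu];
		sim_pending_online[cpu] = NULL;
		if (sim_ksoftirqd[cpu])	/* guard specific to this model */
			sim_wake(sim_ksoftirqd[cpu], cpu);
		break;
	case SIM_UP_CANCELED:
		/* The thread may be in either slot; clear both. */
		sim_pending_online[cpu] = NULL;
		sim_ksoftirqd[cpu] = NULL;
		break;
	}
}
```

A softirq raised between SIM_UP_PREPARE and SIM_ONLINE finds a NULL
sim_ksoftirqd entry and wakes nothing, so in this model the thread is never
woken while its cpu is offline.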
