Message-ID: <20141025020324.GA28247@linux.vnet.ibm.com>
Date: Fri, 24 Oct 2014 19:03:24 -0700
From: "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To: Jay Vosburgh <jay.vosburgh@...onical.com>
Cc: Yanko Kaneti <yaneti@...lera.com>,
Josh Boyer <jwboyer@...oraproject.org>,
"Eric W. Biederman" <ebiederm@...ssion.com>,
Cong Wang <cwang@...pensource.com>,
Kevin Fenzi <kevin@...ye.com>, netdev <netdev@...r.kernel.org>,
"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>,
mroos@...ux.ee, tj@...nel.org
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?
On Fri, Oct 24, 2014 at 05:20:48PM -0700, Jay Vosburgh wrote:
> Paul E. McKenney <paulmck@...ux.vnet.ibm.com> wrote:
>
> >On Fri, Oct 24, 2014 at 03:59:31PM -0700, Paul E. McKenney wrote:
> [...]
> >> Hmmm... It sure looks like we have some callbacks stuck here. I clearly
> >> need to take a hard look at the sleep/wakeup code.
> >>
> >> Thank you for running this!!!
> >
> >Could you please try the following patch? If no joy, could you please
> >add rcu:rcu_nocb_wake to the list of ftrace events?
>
> I tried the patch, it did not change the behavior.
>
> I enabled the rcu:rcu_barrier and rcu:rcu_nocb_wake tracepoints
> and ran it again (with this patch and the first patch from earlier
> today); the trace output is a bit on the large side so I put it and the
> dmesg log at:
>
> http://people.canonical.com/~jvosburgh/nocb-wake-dmesg.txt
>
> http://people.canonical.com/~jvosburgh/nocb-wake-trace.txt
Thank you again!
Very strange part of the trace. The only signs of CPUs 2 and 3 are:
ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 109.896840: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot
ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
ovs-vswitchd-902 [000] .... 109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll
ovs-vswitchd-902 [000] .... 109.896843: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
The pair of WakeNotPoll trace entries says that, at that point, RCU believed
that the rcuo kthreads for CPUs 2 and 3 did not exist. :-/
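For anyone wanting to scan a larger trace for the same symptom, here is a minimal Python sketch. It is an illustration only, not an existing tool: the function names are hypothetical, and it assumes every rcu_nocb_wake line ends in the "rcu_nocb_wake: <flavor> <cpu> <reason>" layout seen in the excerpt above. It collects the wake reasons per (flavor, CPU) and reports the CPUs that logged WakeNotPoll, which per the reading above are the CPUs whose rcuo kthreads RCU believed did not exist.

```python
import re
from collections import defaultdict

# Assumed layout (taken from the excerpt above):
#   "... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll"
# i.e. flavor name, CPU number and wake reason are the last three fields.
NOCB_WAKE = re.compile(r"rcu_nocb_wake: (\S+) (\d+) (\S+)\s*$")


def nocb_wake_reasons(lines):
    """Map (rcu_flavor, cpu) -> list of rcu_nocb_wake reasons, in trace order.

    Lines that are not rcu_nocb_wake events (e.g. rcu_barrier) are skipped.
    """
    reasons = defaultdict(list)
    for line in lines:
        m = NOCB_WAKE.search(line)
        if m:
            flavor, cpu, reason = m.group(1), int(m.group(2)), m.group(3)
            reasons[(flavor, cpu)].append(reason)
    return dict(reasons)


def cpus_with_wakenotpoll(lines, flavor="rcu_sched"):
    """Sorted list of CPUs for which the given flavor logged WakeNotPoll."""
    return sorted(
        cpu
        for (fl, cpu), rs in nocb_wake_reasons(lines).items()
        if fl == flavor and "WakeNotPoll" in rs
    )


# Usage on a few lines from the excerpt above.
excerpt = """\
ovs-vswitchd-902 [000] .... 109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot
ovs-vswitchd-902 [000] d... 109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot
ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll
""".splitlines()

print(cpus_with_wakenotpoll(excerpt))  # [2, 3]
```

Grouping by (flavor, CPU) rather than by CPU alone keeps rcu_sched, rcu_bh and rcu_preempt events apart when several flavors are traced together.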
More diagnostics are in order...
Thanx, Paul