Date:	Fri, 24 Oct 2014 21:33:33 -0700
From:	Jay Vosburgh <jay.vosburgh@...onical.com>
To:	paulmck@...ux.vnet.ibm.com
cc:	Yanko Kaneti <yaneti@...lera.com>,
	Josh Boyer <jwboyer@...oraproject.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Cong Wang <cwang@...pensource.com>,
	Kevin Fenzi <kevin@...ye.com>, netdev <netdev@...r.kernel.org>,
	"Linux-Kernel@...r. Kernel. Org" <linux-kernel@...r.kernel.org>,
	mroos@...ux.ee, tj@...nel.org
Subject: Re: localed stuck in recent 3.18 git in copy_net_ns?

Paul E. McKenney <paulmck@...ux.vnet.ibm.com> wrote:

>On Fri, Oct 24, 2014 at 05:20:48PM -0700, Jay Vosburgh wrote:
>> Paul E. McKenney <paulmck@...ux.vnet.ibm.com> wrote:
>> 
>> >On Fri, Oct 24, 2014 at 03:59:31PM -0700, Paul E. McKenney wrote:
>> [...]
>> >> Hmmm...  It sure looks like we have some callbacks stuck here.  I clearly
>> >> need to take a hard look at the sleep/wakeup code.
>> >> 
>> >> Thank you for running this!!!
>> >
>> >Could you please try the following patch?  If no joy, could you please
>> >add rcu:rcu_nocb_wake to the list of ftrace events?
>> 
>> 	I tried the patch; it did not change the behavior.
>> 
>> 	I enabled the rcu:rcu_barrier and rcu:rcu_nocb_wake tracepoints
>> and ran it again (with this patch and the first patch from earlier
>> today); the trace output is a bit on the large side so I put it and the
>> dmesg log at:
>> 
>> http://people.canonical.com/~jvosburgh/nocb-wake-dmesg.txt
>> 
>> http://people.canonical.com/~jvosburgh/nocb-wake-trace.txt
>
>Thank you again!
>
>Very strange part of the trace.  The only signs of CPUs 2 and 3 are:
>
>    ovs-vswitchd-902   [000] ....   109.896840: rcu_barrier: rcu_sched Begin cpu -1 remaining 0 # 0
>    ovs-vswitchd-902   [000] ....   109.896840: rcu_barrier: rcu_sched Check cpu -1 remaining 0 # 0
>    ovs-vswitchd-902   [000] ....   109.896841: rcu_barrier: rcu_sched Inc1 cpu -1 remaining 0 # 1
>    ovs-vswitchd-902   [000] ....   109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
>    ovs-vswitchd-902   [000] d...   109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot
>    ovs-vswitchd-902   [000] ....   109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 1 remaining 2 # 1
>    ovs-vswitchd-902   [000] d...   109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot
>    ovs-vswitchd-902   [000] ....   109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 2 remaining 3 # 1
>    ovs-vswitchd-902   [000] d...   109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
>    ovs-vswitchd-902   [000] ....   109.896842: rcu_barrier: rcu_sched OnlineNoCB cpu 3 remaining 4 # 1
>    ovs-vswitchd-902   [000] d...   109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll
>    ovs-vswitchd-902   [000] ....   109.896843: rcu_barrier: rcu_sched Inc2 cpu -1 remaining 4 # 2
>
>The pair of WakeNotPoll trace entries says that, at that point, RCU believed
>that CPU 2's and CPU 3's rcuo kthreads did not exist.  :-/

	On the test system I'm using, CPUs 2 and 3 really do not exist;
it is a 2-CPU system (Intel Core 2 Duo E8400).  I mentioned this in an
earlier message, but perhaps you missed it in the flurry.
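
	For anyone wanting to pick the affected CPUs out of the full
trace file, something along these lines might help.  It is only a
rough sketch: the function name and the regular expression are my own
invention, and it assumes the rcu_nocb_wake lines have the same
"flavor cpu reason" layout as the excerpt quoted above.

```python
import re

# Matches the tail of lines such as:
#   ovs-vswitchd-902 [000] d... 109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
# Assumed layout: "<rcu flavor> <cpu number> <reason>".
NOCB_WAKE = re.compile(r"rcu_nocb_wake:\s+(\S+)\s+(\d+)\s+(\S+)")


def cpus_with_reason(trace_lines, reason="WakeNotPoll"):
    """Return the sorted CPU numbers whose rcu_nocb_wake reason equals `reason`."""
    cpus = set()
    for line in trace_lines:
        match = NOCB_WAKE.search(line)
        if match and match.group(3) == reason:
            cpus.add(int(match.group(2)))
    return sorted(cpus)


# A few lines copied from the trace excerpt in this thread.
SAMPLE = """\
ovs-vswitchd-902   [000] ....   109.896841: rcu_barrier: rcu_sched OnlineNoCB cpu 0 remaining 1 # 1
ovs-vswitchd-902   [000] d...   109.896841: rcu_nocb_wake: rcu_sched 0 WakeNot
ovs-vswitchd-902   [000] d...   109.896841: rcu_nocb_wake: rcu_sched 1 WakeNot
ovs-vswitchd-902   [000] d...   109.896842: rcu_nocb_wake: rcu_sched 2 WakeNotPoll
ovs-vswitchd-902   [000] d...   109.896842: rcu_nocb_wake: rcu_sched 3 WakeNotPoll
"""

if __name__ == "__main__":
    print(cpus_with_reason(SAMPLE.splitlines()))
```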

	Looking at the dmesg, the early boot messages seem to be
confused as to how many CPUs there are, e.g.,

[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU debugfs-based tracing is enabled.
[    0.000000]  RCU dyntick-idle grace-period acceleration is enabled.
[    0.000000]  RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.000000] NR_IRQS:16640 nr_irqs:456 0
[    0.000000]  Offload RCU callbacks from all CPUs
[    0.000000]  Offload RCU callbacks from CPUs: 0-3.

	but later shows 2:

[    0.233703] x86: Booting SMP configuration:
[    0.236003] .... node  #0, CPUs:      #1
[    0.255528] x86: Booted up 1 node, 2 CPUs

	In any event, the E8400 is a 2-core CPU with no hyperthreading.
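
	The mismatch between the two dmesg excerpts can also be checked
mechanically.  The following is a minimal sketch, not anything from the
kernel tree: cpu_counts() is a made-up helper, and it assumes the
"nr_cpu_ids=" and "Booted up ... CPUs" messages appear in the form
shown above.  As far as I understand, nr_cpu_ids counts possible CPU
ids rather than CPUs that actually came up, so a difference is not
necessarily a bug by itself.

```python
import re


def cpu_counts(dmesg_text):
    """Return (nr_cpu_ids, booted_cpus) parsed from dmesg text.

    Either element is None when the corresponding message is absent.
    """
    ids = re.search(r"nr_cpu_ids=(\d+)", dmesg_text)
    booted = re.search(r"Booted up \d+ nodes?, (\d+) CPUs", dmesg_text)
    return (
        int(ids.group(1)) if ids else None,
        int(booted.group(1)) if booted else None,
    )


# Lines copied from the dmesg excerpts in this message.
SAMPLE_DMESG = """\
[    0.000000]  RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[    0.000000]  Offload RCU callbacks from CPUs: 0-3.
[    0.255528] x86: Booted up 1 node, 2 CPUs
"""

if __name__ == "__main__":
    nr_ids, booted = cpu_counts(SAMPLE_DMESG)
    if nr_ids is not None and booted is not None and nr_ids != booted:
        print(f"nr_cpu_ids={nr_ids} but only {booted} CPUs booted")
```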

	-J

---
	-Jay Vosburgh, jay.vosburgh@...onical.com