linux-kernel - Re: RCU stall and the system boot hang

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20151127162809.GM26643@linux.vnet.ibm.com>
Date:	Fri, 27 Nov 2015 08:28:09 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	fupan li <lifupan@...il.com>
Cc:	linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: RCU stall and the system boot hang

On Fri, Nov 27, 2015 at 08:23:24PM +0800, fupan li wrote:
> Hi, Paul
> 
> On my Wildcat_Pass (Haswell) board, the system boot will hang as below.
> The kernel is preempt-rt kernel.
> But it was not reproduced every time, about 1 time per 5-10 boots.

CCing LMKL and linux-rt-users in case anyone else is seeing this.

OK, let's take a look at the stall warning...

[ . . . ]

> Btrfs loaded
> console [netcon0] enabled
> netconsole: network logging started
> rtc_cmos 00:00: setting system clock to 2015-11-11 18:18:03 UTC (1447265883)
> usb 1-9: new full-speed USB device number 3 using xhci_hcd
> IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> IPv6: ADDRCONF(NETDEV_UP): eth1: link is not ready
> usb 1-9: ep 0x81 - rounding interval to 64 microframes, ep desc says 80
> microframes
> input: American Megatrends Inc. Virtual Keyboard and Mouse as
> /devices/pci0000:00/0000:00:14.0/usb1/1-9/1-9:1.0/0003:046B:FF10.0002/input/input3
> hid-generic 0003:046B:FF10.0002: input: USB HID v1.10 Keyboard [American
> Megatrends Inc. Virtual Keyboard and Mouse] on usb-0000:00:14.0-9/input0
> input: American Megatrends Inc. Virtual Keyboard and Mouse as
> /devices/pci0000:00/0000:00:14.0/usb1/1-9/1-9:1.1/0003:046B:FF10.0003/input/input4
> hid-generic 0003:046B:FF10.0003: input: USB HID v1.10 Mouse [American
> Megatrends Inc. Virtual Keyboard and Mouse] on usb-0000:00:14.0-9/input1
> ixgbe 0000:02:00.0: registered PHC device on eth2
> IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
> ixgbe 0000:02:00.1: registered PHC device on eth3
> IPv6: ADDRCONF(NETDEV_UP): eth3: link is not ready
> igb 0000:04:00.0 eth0: igb: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow
> Control: RX/TX
> IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> cfg80211: Calling CRDA to update world regulatory domain
> Sending DHCP requests ., OK
> random: nonblocking pool is initialized
> cfg80211: Calling CRDA to update world regulatory domain
> cfg80211: Calling CRDA to update world regulatory domain
> cfg80211: Calling CRDA to update world regulatory domain
> cfg80211: Calling CRDA to update world regulatory domain
> cfg80211: Calling CRDA to update world regulatory domain
> cfg80211: Calling CRDA to update world regulatory domain
> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 12: (0 ticks this GP) idle=f48/0/0 softirq=0/0 fqs=0
> 17: (59 GPs behind) idle=55a/0/0 softirq=0/0 fqs=0

So CPUs 12 and 17 are stalling the grace period.

> (detected by 5, t=21002 jiffies, g=-237, c=-238, q=136)
> Task dump for CPU 12:
> swapper/12      R  running task        0     0      1 0x00200000
>  ffffffff8140f1f7 ffff880859037eb8 ffffffff810b7ff5 ffff880859037ea8
>  ffffffff810759ef ffffffff82140220 000000000000000c ffffffff82140220
>  ffff880859034000 ffff880859037ea8 ffffffff8140f1f7 ffff880859037ec8
> Call Trace:
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff810b7ff5>] ? rcu_eqs_enter_common+0x85/0x240
>  [<ffffffff810759ef>] ? __atomic_notifier_call_chain+0x4f/0x70
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff8100dd52>] ? mwait_idle+0x42/0x1d0
>  [<ffffffff8100ea8f>] ? arch_cpu_idle+0xf/0x20
>  [<ffffffff8109688e>] ? cpu_startup_entry+0xae/0x300
>  [<ffffffff8103aabc>] ? start_secondary+0x12c/0x140

But CPU 12 seems to be idle (as is in fact indicated by the "idle=55a/0/0"
above), so the grace-period kthread should have reported a quiescent
state on its behalf.

> Task dump for CPU 17:
> swapper/17      R  running task        0     0      1 0x00200000
>  ffffffff8140f1f7 ffff880859063eb8 ffffffff810b7ff5 ffff880859063ea8
>  ffffffff810759ef ffffffff82140220 0000000000000011 ffffffff82140220
>  ffff880859060000 ffff880859063ea8 ffffffff8140f1f7 ffff880859063ec8
> Call Trace:
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff810b7ff5>] ? rcu_eqs_enter_common+0x85/0x240
>  [<ffffffff810759ef>] ? __atomic_notifier_call_chain+0x4f/0x70
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff8100dd52>] ? mwait_idle+0x42/0x1d0
>  [<ffffffff8100ea8f>] ? arch_cpu_idle+0xf/0x20
>  [<ffffffff8109688e>] ? cpu_startup_entry+0xae/0x300
>  [<ffffffff8103aabc>] ? start_secondary+0x12c/0x140

And the same for CPU 17.

> rcu_preempt kthread starved for 21002 jiffies!

But the grace-period kthread hasn't had a chance to run for more
than 21 seconds, which explains why the grace period is not ending.

Are you starting up a heavy real-time workload at boot time?

> cfg80211: Calling CRDA to update world regulatory domain
> cfg80211: Calling CRDA to update world regulatory domain
> cfg80211: Exceeded CRDA call max attempts. Not calling CRDA
> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 12: (0 ticks this GP) idle=04c/0/0 softirq=0/0 fqs=1

At this point, CPU 17's quiescent state has been recorded.

> (detected by 24, t=84007 jiffies, g=-237, c=-238, q=147)
> Task dump for CPU 12:
> swapper/12      R  running task        0     0      1 0x00200000
>  ffffffff8140f1f7 ffff880859037eb8 ffffffff810b7ff5 ffff880859037ea8
>  ffffffff810759ef ffffffff82140220 000000000000000c ffffffff82140220
>  ffff880859034000 ffff880859037ea8 ffffffff8140f1f7 ffff880859037ec8
> Call Trace:
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff810b7ff5>] ? rcu_eqs_enter_common+0x85/0x240
>  [<ffffffff810759ef>] ? __atomic_notifier_call_chain+0x4f/0x70
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff8100dd52>] ? mwait_idle+0x42/0x1d0
>  [<ffffffff8100ea8f>] ? arch_cpu_idle+0xf/0x20
>  [<ffffffff8109688e>] ? cpu_startup_entry+0xae/0x300
>  [<ffffffff8103aabc>] ? start_secondary+0x12c/0x140

CPU 12 still looks idle.  Though it does appear to have transitioned
between idle and non-idle several times.

> rcu_preempt kthread starved for 63005 jiffies!

And the last time the grace-period kthread ran was about the time of
the first stall-warning message.  Strange...

> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 12: (0 ticks this GP) idle=0a8/0/0 softirq=0/0 fqs=0
> 17: (60 GPs behind) idle=6b4/0/0 softirq=0/0 fqs=0
> (detected by 11, t=21002 jiffies, g=-236, c=-237, q=151)

And the old grace period (g=-237, c=-238) finished and a new one has
been stalled for 21 seconds.  Again, CPUs 12 and 17 are idle and
stalling the grace period.

> Task dump for CPU 12:
> swapper/12      R  running task        0     0      1 0x00200000
>  ffffffff8140f1f7 ffff880859037eb8 ffffffff810b7ff5 ffff880859037ea8
>  ffffffff810759ef ffffffff82140220 000000000000000c ffffffff82140220
>  ffff880859034000 ffff880859037ea8 ffffffff8140f1f7 ffff880859037ec8
> Call Trace:
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff810b7ff5>] ? rcu_eqs_enter_common+0x85/0x240
>  [<ffffffff810759ef>] ? __atomic_notifier_call_chain+0x4f/0x70
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff8100dd52>] ? mwait_idle+0x42/0x1d0
>  [<ffffffff8100ea8f>] ? arch_cpu_idle+0xf/0x20
>  [<ffffffff8109688e>] ? cpu_startup_entry+0xae/0x300
>  [<ffffffff8103aabc>] ? start_secondary+0x12c/0x140
> Task dump for CPU 17:
> swapper/17      R  running task        0     0      1 0x00200000
>  ffffffff8140f1f7 ffff880859063eb8 ffffffff810b7ff5 ffff880859063ea8
>  ffffffff810759ef ffffffff82140220 0000000000000011 ffffffff82140220
>  ffff880859060000 ffff880859063ea8 ffffffff8140f1f7 ffff880859063ec8
> Call Trace:
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff810b7ff5>] ? rcu_eqs_enter_common+0x85/0x240
>  [<ffffffff810759ef>] ? __atomic_notifier_call_chain+0x4f/0x70
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff8100dd52>] ? mwait_idle+0x42/0x1d0
>  [<ffffffff8100ea8f>] ? arch_cpu_idle+0xf/0x20
>  [<ffffffff8109688e>] ? cpu_startup_entry+0xae/0x300
>  [<ffffffff8103aabc>] ? start_secondary+0x12c/0x140

The stack traces are consistent with idle CPUs.

> rcu_preempt kthread starved for 21002 jiffies!

And the grace-period kthread apparently hasn't gotten a chance to run
at all during the grace period.  Strange...  It should get awakened by
a timeout every three jiffies or so.

> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 12: (0 ticks this GP) idle=1aa/0/0 softirq=0/0 fqs=1
> (detected by 0, t=84007 jiffies, g=-236, c=-237, q=159)

The same grace period is still stalled after a total of 84 seconds,
and again CPU 17's quiescent state has been reported.

> Task dump for CPU 12:
> swapper/12      R  running task        0     0      1 0x00200000
>  ffffffff8140f1f7 ffff880859037eb8 ffffffff810b7ff5 ffff880859037ea8
>  ffffffff810759ef ffffffff82140220 000000000000000c ffffffff82140220
>  ffff880859034000 ffff880859037ea8 ffffffff8140f1f7 ffff880859037ec8
> Call Trace:
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff810b7ff5>] ? rcu_eqs_enter_common+0x85/0x240
>  [<ffffffff810759ef>] ? __atomic_notifier_call_chain+0x4f/0x70
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff8100dd52>] ? mwait_idle+0x42/0x1d0
>  [<ffffffff8100ea8f>] ? arch_cpu_idle+0xf/0x20
>  [<ffffffff8109688e>] ? cpu_startup_entry+0xae/0x300
>  [<ffffffff8103aabc>] ? start_secondary+0x12c/0x140

And again, the stack is consistent with the CPU being idle.

> rcu_preempt kthread starved for 63005 jiffies!

And again the grace-period kthread seems to have gotten a chance to
run during the first RCU CPU stall warning, but not since then.

> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 12: (0 ticks this GP) idle=20c/0/0 softirq=0/0 fqs=0
> 17: (61 GPs behind) idle=810/0/0 softirq=0/0 fqs=0
> (detected by 40, t=21002 jiffies, g=-235, c=-236, q=162)

And again, it looks like the stall warning gave the grace-period kthread
another chance to run, thus completing the earlier grace period.  We
now have another grace period stalled for 21 seconds, again by CPUs 12
and 17.

> Task dump for CPU 12:
> swapper/12      R  running task        0     0      1 0x00200000
>  ffffffff8140f1f7 ffff880859037eb8 ffffffff810b7ff5 ffff880859037ea8
>  ffffffff810759ef ffffffff82140220 000000000000000c ffffffff82140220
>  ffff880859034000 ffff880859037ea8 ffffffff8140f1f7 ffff880859037ec8
> Call Trace:
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff810b7ff5>] ? rcu_eqs_enter_common+0x85/0x240
>  [<ffffffff810759ef>] ? __atomic_notifier_call_chain+0x4f/0x70
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff8100dd52>] ? mwait_idle+0x42/0x1d0
>  [<ffffffff8100ea8f>] ? arch_cpu_idle+0xf/0x20
>  [<ffffffff8109688e>] ? cpu_startup_entry+0xae/0x300
>  [<ffffffff8103aabc>] ? start_secondary+0x12c/0x140
> Task dump for CPU 17:
> swapper/17      R  running task        0     0      1 0x00200000
>  ffffffff8140f1f7 ffff880859063eb8 ffffffff810b7ff5 ffff880859063ea8
>  ffffffff810759ef ffffffff82140220 0000000000000011 ffffffff82140220
>  ffff880859060000 ffff880859063ea8 ffffffff8140f1f7 ffff880859063ec8
> Call Trace:
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff810b7ff5>] ? rcu_eqs_enter_common+0x85/0x240
>  [<ffffffff810759ef>] ? __atomic_notifier_call_chain+0x4f/0x70
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff8100dd52>] ? mwait_idle+0x42/0x1d0
>  [<ffffffff8100ea8f>] ? arch_cpu_idle+0xf/0x20
>  [<ffffffff8109688e>] ? cpu_startup_entry+0xae/0x300
>  [<ffffffff8103aabc>] ? start_secondary+0x12c/0x140
> rcu_preempt kthread starved for 21002 jiffies!

And again idle stacks, and again the RCU grace-period kthread has not
been allowed to run at all since the grace period stared.

> INFO: rcu_preempt detected stalls on CPUs/tasks:
> 12: (0 ticks this GP) idle=30e/0/0 softirq=0/0 fqs=1
> (detected by 5, t=84007 jiffies, g=-235, c=-236, q=170)
> Task dump for CPU 12:
> swapper/12      R  running task        0     0      1 0x00200000
>  ffffffff8140f1f7 ffff880859037eb8 ffffffff810b7ff5 ffff880859037ea8
>  ffffffff810759ef ffffffff82140220 000000000000000c ffffffff82140220
>  ffff880859034000 ffff880859037ea8 ffffffff8140f1f7 ffff880859037ec8
> Call Trace:
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff810b7ff5>] ? rcu_eqs_enter_common+0x85/0x240
>  [<ffffffff810759ef>] ? __atomic_notifier_call_chain+0x4f/0x70
>  [<ffffffff8140f1f7>] ? debug_smp_processor_id+0x17/0x20
>  [<ffffffff8100dd52>] ? mwait_idle+0x42/0x1d0
>  [<ffffffff8100ea8f>] ? arch_cpu_idle+0xf/0x20
>  [<ffffffff8109688e>] ? cpu_startup_entry+0xae/0x300
>  [<ffffffff8103aabc>] ? start_secondary+0x12c/0x140
> rcu_preempt kthread starved for 63005 jiffies!

And again, CPU 17's grace period has been recorded, and the grace-period
kthread hasn't been allowed to run since the last RCU CPU stall warning.

When a stall warning is printed, there is also a resched_cpu() on the
holdout CPU.  Perhaps that is also giving the grace-period kthread a
chance to run?  Alternatively, perhaps the overhead of all the printing
is letting the grace-period kthread run?

I do sometimes see similar stalls, but under heavy rcutorture load and
when running qemu/kvm.  I am considering this to be a bug (somewhere!) and
am tracking it down, but I suspect that your boot-time stalls are from
something different.  Yours is the only report of this that I have seen
thus far.

So what are you starting up at boot time?

							Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/