linux-kernel - Re: RCU stall and the system boot hang with nfsroot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALSz7m0yiX382+g4BEoZj4GF3vXsHkrBD-Y5oODM2DJE--vpfA@mail.gmail.com>
Date:	Mon, 4 Jan 2016 18:01:37 +0800
From:	Aaron Ma <mapengyu@...il.com>
To:	Paul McKenney <paulmck@...ux.vnet.ibm.com>
Cc:	linux-rt-users@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: RCU stall and the system boot hang with nfsroot

On Fri, Jan 1, 2016 at 3:49 AM, Paul E. McKenney
<paulmck@...ux.vnet.ibm.com> wrote:
> On Wed, Dec 30, 2015 at 09:41:45AM -0800, Paul E. McKenney wrote:
>> On Wed, Dec 30, 2015 at 03:03:33PM +0800, Aaron Ma wrote:
>> > On Wed, Dec 30, 2015 at 7:42 AM, Paul E. McKenney
>> > <paulmck@...ux.vnet.ibm.com> wrote:
>
> [ . . . ]
>
>> > cfg80211: Calling CRDA to update world regulatory domain
>> > cfg80211: Calling CRDA to update world regulatory domain
>> > cfg80211: Exceeded CRDA call max attempts. Not calling CRDA
>> > INFO: rcu_preempt detected stalls on CPUs/tasks:
>> >     71: (0 ticks this GP) idle=1ac/0/0 softirq=0/0 fqs=0
>> >     (detected by 62, t=26002 jiffies, g=3735, c=3734, q=366014)
>> > Task dump for CPU 71:
>> > swapper/71      R  running task        0     0      1 0x00200000
>> >  ffffffff81492587 ffff8804633cbe58 ffffffff814f21d7 0000000000000004
>> >  0000000000000004 ffffe8fffb405310 ffffffff820dc5c0 ffff8804633cbea8
>> >  ffffffff8181db85 0000000000000000 0000000000000000 0000000000000046
>> > Call Trace:
>> >  [<ffffffff81492587>] ? debug_smp_processor_id+0x17/0x20
>> >  [<ffffffff814f21d7>] ? intel_idle+0x137/0x140
>> >  [<ffffffff8181db85>] ? cpuidle_enter_state+0x65/0x3e0
>> >  [<ffffffff8181df37>] ? cpuidle_enter+0x17/0x20
>> >  [<ffffffff810a849d>] ? cpu_startup_entry+0x33d/0x630
>> >  [<ffffffff8103ceae>] ? start_secondary+0x12e/0x140
>> > rcu_preempt kthread starved for 26002 jiffies!
>> > rcu_check_gp_kthread_starvation --->show task:
>> > rcu_preempt     S ffff880456413c68     0     8      2 0x00000000
>> >  ffff880456413c68 ffff8804564025d0 000000000000d7a0 ffff880456b18000
>> >  ffff8804564025d0 ffff880456413c38 ffffffff81492587 ffff880456413c58
>> >  ffff880456414000 ffff8804564025d0 ffff880456413cb8 ffff880869dce500
>> > Call Trace:
>> >  [<ffffffff81492587>] ? debug_smp_processor_id+0x17/0x20
>> >  [<ffffffff81b5ce9f>] schedule+0x3f/0xd0
>> >  [<ffffffff81b5ef19>] schedule_timeout+0x189/0x3f0
>> >  [<ffffffff810a7904>] ? swait_prepare+0x24/0x90
>> >  [<ffffffff810e8e60>] ? timer_cpu_notify+0x190/0x190
>> >  [<ffffffff810a793b>] ? swait_prepare+0x5b/0x90
>> >  [<ffffffff810de3f8>] rcu_gp_kthread+0x8a8/0x2190
>> >  [<ffffffff810b275d>] ? trace_hardirqs_on+0xd/0x10
>> >  [<ffffffff81b5c18f>] ? __schedule+0x4af/0x1180
>> >  [<ffffffff810ddb50>] ? call_rcu_sched+0x20/0x20
>> >  [<ffffffff8107f844>] kthread+0xe4/0x100
>> >  [<ffffffff810b275d>] ? trace_hardirqs_on+0xd/0x10
>> >  [<ffffffff8107f760>] ? kthread_create_on_node+0x240/0x240
>> >  [<ffffffff81b61562>] ret_from_fork+0x42/0x70
>> >  [<ffffffff8107f760>] ? kthread_create_on_node+0x240/0x240
>> > rcu_check_gp_kthread_starvation --->end
>> >
>> > It seems wait in rcu_gp_kthread. it should be no task blocked right?
>> > If so, why the swait_event_interruptible_timeout is not awaken? the
>> > timeout is CONFIG_HZ=1000.
>>
>> Given that this happens at boot, perhaps ftrace is a good next step.
>> The thought would be to enable ftrace via the kernel boot parameters
>> for the timers.
>>
>> And how often does this problem occur?
>
> And does the following diagnostic patch help?  Its expected behavior
> would be to turn a hard hang into something that recovered in a few
> minutes, while giving a few stall-warning splats.
>
>                                                         Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 7798a5efb2acabfa3ca788dd9b5b118eb1bff443
> Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> Date:   Thu Dec 31 08:48:36 2015 -0800
>
>     rcu: Awaken grace-period kthread when stalled
>
>     Recent kernels can fail to awaken the grace-period kthread for
>     quiescent-state forcing.  This commit is a crude hack that does
>     a wakeup any time a stall is detected.
>
>     Signed-off-by: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 4b3de6718f7c..51da7ef3561f 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1225,8 +1225,10 @@ static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
>                        rsp->gp_flags,
>                        gp_state_getname(rsp->gp_state), rsp->gp_state,
>                        rsp->gp_kthread ? rsp->gp_kthread->state : ~0);
> -               if (rsp->gp_kthread)
> +               if (rsp->gp_kthread) {
>                         sched_show_task(rsp->gp_kthread);
> +                       wake_up_process(rsp->gp_kthread);
> +               }
>         }
>  }

Sorry for late response because of holiday.

During my knowledge, now this issue is splitted to 2 issues.

First, there is a mistake in the kernel I used.
I used kernel is changed by someone else, and it ported a patch:
Author: Thomas Gleixner <tglx@...utronix.de>
Date:   Tue Apr 14 21:08:54 2015 +0000

    tick: sched: Force tick interrupt and get rid of softirq magic

    commit 0ff53d09642204c648424def0caa9117e7a3caaf upstream

 static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
                                         ktime_t now, int cpu)
 {
@@ -691,22 +705,18 @@ static ktime_t tick_nohz_stop_sched_tick(struct
tick_sched *ts,
                        if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
                                hrtimer_cancel(&ts->sched_timer);
                        goto out;
-               }
+                }

-               if (ts->nohz_mode == NOHZ_MODE_HIGHRES) {
-                       hrtimer_start(&ts->sched_timer, expires,
-                                     HRTIMER_MODE_ABS_PINNED);
-                       goto out;
-               } else if (!tick_program_event(expires, 0))
-                       goto out;
-               /*
-                * We are past the event already. So we crossed a
-                * jiffie boundary. Update jiffies and raise the
-                * softirq.
-                */
-               tick_do_update_jiffies64(ktime_get());
+                if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
+                        hrtimer_start(&ts->sched_timer, expires,
+                                      HRTIMER_MODE_ABS_PINNED);
+                else
+                        tick_program_event(expires, 1);
+       } else {
+               /* Tick is stopped, but required now. Enforce it */
+               tick_nohz_restart(ts, now);
        }
-       raise_softirq_irqoff(TIMER_SOFTIRQ);
+

It remove the "raise_softirq_irqoff" in tick_nohz_stop_sched_tick, so
the kernel will be not waked up in idle status.

Second:
The reason I thought linux-stable-rt has the same issue because of it
has a boot issue during systemd init:
mce: [Hardware Error]: PROCESSOR 0:406f0 TIME 1451543174 SOCKET 1 APIC
40 microcode 10
 0000000000000000 ffff880468a0c420 ffff880468a08f58
Call Trace:
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
mce: [Hardware Error]: CPU 22: Machine Check Exception: 0 Bank 20:
c800008000310e0f
mce: [Hardware Error]: TSC 0 MISC df87b008d9eff
mce: [Hardware Error]: PROCESSOR 0:406f0 TIME 1451543174 SOCKET 1 APIC
40 microcode 10
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
 <IRQ>  [<ffffffff81b56b53>] dump_stack+0x4a/0x61
Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast
exception handler
 [<ffffffff81058a6a>] warn_slowpath_common+0x8a/0xe0
 [<ffffffff81058b7a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff810dafee>] rcu_eqs_exit_common+0x2be/0x550
 [<ffffffff810e32fc>] rcu_irq_enter+0xfc/0x330
 [<ffffffff8105f25e>] irq_enter+0xe/0x60
 [<ffffffff81b63a87>] smp_apic_timer_interrupt+0x37/0x60
 [<ffffffff81b62030>] apic_timer_interrupt+0x70/0x80
 <EOI>  <#MC>  [<ffffffff81b53b2b>] ? wait_for_panic+0x1a/0x54
 [<ffffffff81b53b31>] ? wait_for_panic+0x20/0x54
 [<ffffffff81b53b2b>] ? wait_for_panic+0x1a/0x54
 [<ffffffff8102f62e>] mce_timed_out+0x7e/0x80
 [<ffffffff81030887>] do_machine_check+0x737/0xa50
 [<ffffffff814f2191>] ? intel_idle+0xd1/0x140
 [<ffffffff81b62e1b>] machine_check+0x2b/0x50
 [<ffffffff814f2191>] ? intel_idle+0xd1/0x140
 <<EOE>>  [<ffffffff8181dba5>] cpuidle_enter_state+0x65/0x3e0
 [<ffffffff8181df57>] cpuidle_enter+0x17/0x20
 [<ffffffff810a849d>] cpu_startup_entry+0x33d/0x630
 [<ffffffff8103ceae>] start_secondary+0x12e/0x140
---[ end trace 0000000000000002 ]---

I didn't know what's the problem here, so I thought it is caused by
the same issue.

Now with 1st issue resolved, I am digging the "mce hardware error".
Do you think this is a hardware issue?

Thank you very much for your help.

Pengyu

>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/