Message-ID: <20141217192753.GS5310@linux.vnet.ibm.com>
Date:	Wed, 17 Dec 2014 11:27:53 -0800
From:	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:	Arun KS <arunks.linux@...il.com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	josh@...htriplett.org, rostedt@...dmis.org,
	mathieu.desnoyers@...icios.com, laijs@...fujitsu.com
Subject: Re: [RCU] kernel hangs in wait_rcu_gp during suspend path

On Tue, Dec 16, 2014 at 11:59:07AM +0530, Arun KS wrote:
> Hello,
> 
> I dug a little deeper to understand the situation.
> All other CPUs are already in the idle thread.
> As I understand it, for the grace period to end, at least one of
> the following must happen on every online CPU:
> 
> 1. a context switch;
> 2. a switch to user space;
> 3. a switch to the idle thread.

This is the case for rcu_sched, and the other flavors vary a bit.

> In this situation, since all the other cores are already idle, none
> of the above happens on the online cores, so the grace period keeps
> getting extended and never finishes. Below is the state of the
> runqueues when the hang happens.
> --------------start------------------------------------
> crash> runq
> CPU 0 [OFFLINE]
> 
> CPU 1 [OFFLINE]
> 
> CPU 2 [OFFLINE]
> 
> CPU 3 [OFFLINE]
> 
> CPU 4 RUNQUEUE: c3192e40
>   CURRENT: PID: 0      TASK: f0874440  COMMAND: "swapper/4"
>   RT PRIO_ARRAY: c3192f20
>      [no tasks queued]
>   CFS RB_ROOT: c3192eb0
>      [no tasks queued]
> 
> CPU 5 RUNQUEUE: c31a0e40
>   CURRENT: PID: 0      TASK: f0874980  COMMAND: "swapper/5"
>   RT PRIO_ARRAY: c31a0f20
>      [no tasks queued]
>   CFS RB_ROOT: c31a0eb0
>      [no tasks queued]
> 
> CPU 6 RUNQUEUE: c31aee40
>   CURRENT: PID: 0      TASK: f0874ec0  COMMAND: "swapper/6"
>   RT PRIO_ARRAY: c31aef20
>      [no tasks queued]
>   CFS RB_ROOT: c31aeeb0
>      [no tasks queued]
> 
> CPU 7 RUNQUEUE: c31bce40
>   CURRENT: PID: 0      TASK: f0875400  COMMAND: "swapper/7"
>   RT PRIO_ARRAY: c31bcf20
>      [no tasks queued]
>   CFS RB_ROOT: c31bceb0
>      [no tasks queued]
> --------------end------------------------------------
> 
> If my understanding is correct, the patch below should help, because
> it expedites grace periods during suspend:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d1d74d14e98a6be740a6f12456c7d9ad47be9c9c

I believe that we already covered this, but I do suggest that you give
it a try.

> But I wonder why it was not taken into the stable trees. Can we take it?
> I appreciate your help.

I have no objection to your taking it, but have you tried it yet?

							Thanx, Paul

> Thanks,
> Arun
> 
> On Mon, Dec 15, 2014 at 10:34 PM, Arun KS <arunks.linux@...il.com> wrote:
> > Hi,
> >
> > Here is the backtrace of the process hanging in wait_rcu_gp:
> >
> > PID: 247    TASK: e16e7380  CPU: 4   COMMAND: "kworker/u16:5"
> >  #0 [<c09fead0>] (__schedule) from [<c09fcab0>]
> >  #1 [<c09fcab0>] (schedule_timeout) from [<c09fe050>]
> >  #2 [<c09fe050>] (wait_for_common) from [<c013b2b4>]
> >  #3 [<c013b2b4>] (wait_rcu_gp) from [<c0142f50>]
> >  #4 [<c0142f50>] (atomic_notifier_chain_unregister) from [<c06b2ab8>]
> >  #5 [<c06b2ab8>] (cpufreq_interactive_disable_sched_input) from [<c06b32a8>]
> >  #6 [<c06b32a8>] (cpufreq_governor_interactive) from [<c06abbf8>]
> >  #7 [<c06abbf8>] (__cpufreq_governor) from [<c06ae474>]
> >  #8 [<c06ae474>] (__cpufreq_remove_dev_finish) from [<c06ae8c0>]
> >  #9 [<c06ae8c0>] (cpufreq_cpu_callback) from [<c0a0185c>]
> > #10 [<c0a0185c>] (notifier_call_chain) from [<c0121888>]
> > #11 [<c0121888>] (__cpu_notify) from [<c0121a04>]
> > #12 [<c0121a04>] (cpu_notify_nofail) from [<c09ee7f0>]
> > #13 [<c09ee7f0>] (_cpu_down) from [<c0121b70>]
> > #14 [<c0121b70>] (disable_nonboot_cpus) from [<c016788c>]
> > #15 [<c016788c>] (suspend_devices_and_enter) from [<c0167bcc>]
> > #16 [<c0167bcc>] (pm_suspend) from [<c0167d94>]
> > #17 [<c0167d94>] (try_to_suspend) from [<c0138460>]
> > #18 [<c0138460>] (process_one_work) from [<c0138b18>]
> > #19 [<c0138b18>] (worker_thread) from [<c013dc58>]
> > #20 [<c013dc58>] (kthread) from [<c01061b8>]
> >
> > Will this patch help here?
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d1d74d14e98a6be740a6f12456c7d9ad47be9c9c
> >
> > I couldn't really understand why it got stuck in synchronize_rcu().
> > Could you give me some pointers for debugging this further?
> >
> > Below are the enabled RCU-related config options:
> >
> > CONFIG_TREE_PREEMPT_RCU=y
> > CONFIG_PREEMPT_RCU=y
> > CONFIG_RCU_STALL_COMMON=y
> > CONFIG_RCU_FANOUT=32
> > CONFIG_RCU_FANOUT_LEAF=16
> > CONFIG_RCU_FAST_NO_HZ=y
> > CONFIG_RCU_CPU_STALL_TIMEOUT=21
> > CONFIG_RCU_CPU_STALL_VERBOSE=y
> >
> > Kernel version is 3.10.28
> > Architecture is ARM
> >
> > Thanks,
> > Arun
> 

