Date:	Tue, 16 Dec 2014 11:59:07 +0530
From:	Arun KS <arunks.linux@...il.com>
To:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Cc:	Paul McKenney <paulmck@...ux.vnet.ibm.com>, josh@...htriplett.org,
	rostedt@...dmis.org, mathieu.desnoyers@...icios.com,
	laijs@...fujitsu.com
Subject: Re: [RCU] kernel hangs in wait_rcu_gp during suspend path

Hello,

I dug a little deeper to understand the situation.
All the other CPUs are already in the idle thread.
As per my understanding, for the grace period to end, at least one of
the following should happen on all online CPUs:

1. A context switch.
2. A switch to user space.
3. A switch to the idle thread.

In this situation, since all the other cores are already idle, none
of the above is met on all online cores.
So the grace period keeps getting extended and never finishes. Below is the
state of the runqueues when the hang happens.
--------------start------------------------------------
crash> runq
CPU 0 [OFFLINE]

CPU 1 [OFFLINE]

CPU 2 [OFFLINE]

CPU 3 [OFFLINE]

CPU 4 RUNQUEUE: c3192e40
  CURRENT: PID: 0      TASK: f0874440  COMMAND: "swapper/4"
  RT PRIO_ARRAY: c3192f20
     [no tasks queued]
  CFS RB_ROOT: c3192eb0
     [no tasks queued]

CPU 5 RUNQUEUE: c31a0e40
  CURRENT: PID: 0      TASK: f0874980  COMMAND: "swapper/5"
  RT PRIO_ARRAY: c31a0f20
     [no tasks queued]
  CFS RB_ROOT: c31a0eb0
     [no tasks queued]

CPU 6 RUNQUEUE: c31aee40
  CURRENT: PID: 0      TASK: f0874ec0  COMMAND: "swapper/6"
  RT PRIO_ARRAY: c31aef20
     [no tasks queued]
  CFS RB_ROOT: c31aeeb0
     [no tasks queued]

CPU 7 RUNQUEUE: c31bce40
  CURRENT: PID: 0      TASK: f0875400  COMMAND: "swapper/7"
  RT PRIO_ARRAY: c31bcf20
     [no tasks queued]
  CFS RB_ROOT: c31bceb0
     [no tasks queued]
--------------end------------------------------------

If my understanding is correct, the patch below should help, because it
will expedite grace periods during suspend:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d1d74d14e98a6be740a6f12456c7d9ad47be9c9c

But I wonder why it was not taken into the stable trees. Can we take it?
I would appreciate your help.

Thanks,
Arun

On Mon, Dec 15, 2014 at 10:34 PM, Arun KS <arunks.linux@...il.com> wrote:
> Hi,
>
> Here is the backtrace of the process hanging in wait_rcu_gp,
>
> PID: 247    TASK: e16e7380  CPU: 4   COMMAND: "kworker/u16:5"
>  #0 [<c09fead0>] (__schedule) from [<c09fcab0>]
>  #1 [<c09fcab0>] (schedule_timeout) from [<c09fe050>]
>  #2 [<c09fe050>] (wait_for_common) from [<c013b2b4>]
>  #3 [<c013b2b4>] (wait_rcu_gp) from [<c0142f50>]
>  #4 [<c0142f50>] (atomic_notifier_chain_unregister) from [<c06b2ab8>]
>  #5 [<c06b2ab8>] (cpufreq_interactive_disable_sched_input) from [<c06b32a8>]
>  #6 [<c06b32a8>] (cpufreq_governor_interactive) from [<c06abbf8>]
>  #7 [<c06abbf8>] (__cpufreq_governor) from [<c06ae474>]
>  #8 [<c06ae474>] (__cpufreq_remove_dev_finish) from [<c06ae8c0>]
>  #9 [<c06ae8c0>] (cpufreq_cpu_callback) from [<c0a0185c>]
> #10 [<c0a0185c>] (notifier_call_chain) from [<c0121888>]
> #11 [<c0121888>] (__cpu_notify) from [<c0121a04>]
> #12 [<c0121a04>] (cpu_notify_nofail) from [<c09ee7f0>]
> #13 [<c09ee7f0>] (_cpu_down) from [<c0121b70>]
> #14 [<c0121b70>] (disable_nonboot_cpus) from [<c016788c>]
> #15 [<c016788c>] (suspend_devices_and_enter) from [<c0167bcc>]
> #16 [<c0167bcc>] (pm_suspend) from [<c0167d94>]
> #17 [<c0167d94>] (try_to_suspend) from [<c0138460>]
> #18 [<c0138460>] (process_one_work) from [<c0138b18>]
> #19 [<c0138b18>] (worker_thread) from [<c013dc58>]
> #20 [<c013dc58>] (kthread) from [<c01061b8>]
>
> Will this patch help here?
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=d1d74d14e98a6be740a6f12456c7d9ad47be9c9c
>
> I couldn't really understand why it got stuck in synchronize_rcu().
> Please give some pointers to debug this further.
>
> Below are the enabled configs related to RCU.
>
> CONFIG_TREE_PREEMPT_RCU=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_FANOUT=32
> CONFIG_RCU_FANOUT_LEAF=16
> CONFIG_RCU_FAST_NO_HZ=y
> CONFIG_RCU_CPU_STALL_TIMEOUT=21
> CONFIG_RCU_CPU_STALL_VERBOSE=y
>
> Kernel version is 3.10.28
> Architecture is ARM
>
> Thanks,
> Arun