lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20131110101612.GC21916@localhost>
Date:	Sun, 10 Nov 2013 18:16:12 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
	Michael wang <wangyun@...ux.vnet.ibm.com>
Subject: [sched/get_online_cpus] INFO: task swapper/0:1 blocked for more than
 120 seconds.

Greetings,

I got the below dmesg and the first bad commit is

commit 6acce3ef84520537f8a09a12c9ddbe814a584dd2
Author: Peter Zijlstra <peterz@...radead.org>
Date:   Fri Oct 11 14:38:20 2013 +0200

    sched: Remove get_online_cpus() usage
    
    Remove get_online_cpus() usage from the scheduler; there's 4 sites that
    use it:
    
     - sched_init_smp(); where its completely superfluous since we're in
       'early' boot and there simply cannot be any hotplugging.
    
     - sched_getaffinity(); we already take a raw spinlock to protect the
       task cpus_allowed mask, this disables preemption and therefore
       also stabilizes cpu_online_mask as that's modified using
       stop_machine. However switch to active mask for symmetry with
       sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
       mask stability by inserting sync_rcu/sched() into _cpu_down.
    
     - sched_setaffinity(); we don't appear to need get_online_cpus()
       either, there's two sites where hotplug appears relevant:
        * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
          for the cpuset case we hold task_lock, which is a spinlock and
          thus for mainline disables preemption (might cause pain on RT).
        * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
          preemption properly disabled; also it already deals with hotplug
          races explicitly where it releases them.
    
     - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
       us with a little trickery. By adding a sync_sched/rcu() after the
       CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
       cpu_active_mask. Use these to validate that both our cpus are active
       when queueing the stop work before we queue the stop_machine works
       for take_cpu_down().
    
    Signed-off-by: Peter Zijlstra <peterz@...radead.org>
    Cc: "Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
    Cc: Paul McKenney <paulmck@...ux.vnet.ibm.com>
    Cc: Mel Gorman <mgorman@...e.de>
    Cc: Rik van Riel <riel@...hat.com>
    Cc: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
    Cc: Andrea Arcangeli <aarcange@...hat.com>
    Cc: Johannes Weiner <hannes@...xchg.org>
    Cc: Linus Torvalds <torvalds@...ux-foundation.org>
    Cc: Andrew Morton <akpm@...ux-foundation.org>
    Cc: Steven Rostedt <rostedt@...dmis.org>
    Cc: Oleg Nesterov <oleg@...hat.com>
    Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@...nel.org>

===================================================
PARENT COMMIT NOT CLEAN. LOOK OUT FOR WRONG BISECT!
===================================================

+------------------------------------------------------------------------------------+--------------+--------------+
|                                                                                    | 746023159c40 | 1918664d0b2b |
+------------------------------------------------------------------------------------+--------------+--------------+
| good_boots                                                                         | 594          | 7            |
| has_kernel_error_warning                                                           | 6            | 12           |
| BUG:kernel_early_hang_without_any_printk_output                                    | 6            |              |
| INFO:task_blocked_for_more_than_seconds                                            | 0            | 12           |
| INFO:NMI_handler(arch_trigger_all_cpu_backtrace_handler)took_too_long_to_run:msecs | 0            | 12           |
| Kernel_panic-not_syncing:hung_task:blocked_tasks                                   | 0            | 12           |
+------------------------------------------------------------------------------------+--------------+--------------+

[   10.410377] EDD information not available.
[   10.412866] Unregister pv shared memory for cpu 0
[   10.499606] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3
[  241.206735] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  241.209244]       Not tainted 3.12.0-02098-g1918664 #98
[  241.211290] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  241.214286] swapper/0       D ffff88000ec9e040     0     1      0 0x00000000
[  241.217142]  ffff88000eca1c10 0000000000000046 ffffffff826aca60 ffff88000ec9e040
[  241.220200]  ffff88000eca1fd8 0000000000012a00 0000000000012a00 ffff88000c9927c0
[  241.223196]  ffff88000ec9e040 ffff88000ec9e040 ffff88000eca1b60 0000000000000002
[  241.226835] Call Trace:
[  241.227813]  [<ffffffff8110e544>] ? __lock_acquire+0x1e74/0x1ef0
[  241.230123]  [<ffffffff8110e544>] ? __lock_acquire+0x1e74/0x1ef0
[  241.232400]  [<ffffffff81d39ea5>] schedule+0x55/0x60
[  241.234345]  [<ffffffff81d38efb>] schedule_timeout+0x2b/0x200
[  241.236533]  [<ffffffff8110a88e>] ? mark_held_locks+0x12e/0x150
[  241.238845]  [<ffffffff81d3e577>] ? _raw_spin_unlock_irq+0x27/0x60
[  241.241356]  [<ffffffff810f753d>] ? get_parent_ip+0xd/0x50
[  241.243790]  [<ffffffff810f7603>] ? preempt_count_sub+0x33/0x50
[  241.246273]  [<ffffffff81d3a9c7>] wait_for_common+0x167/0x1b0
[  241.248715]  [<ffffffff810f9470>] ? try_to_wake_up+0x330/0x330
[  241.251291]  [<ffffffff81124df0>] ? __call_rcu.constprop.42+0x3d0/0x3d0
[  241.254145]  [<ffffffff81d3ab48>] wait_for_completion+0x18/0x20
[  241.256756]  [<ffffffff8111ed16>] wait_rcu_gp+0x46/0x50
[  241.259000]  [<ffffffff8111eb30>] ? ftrace_raw_output_rcu_barrier+0x70/0x70
[  241.262012]  [<ffffffff811263dc>] synchronize_rcu+0x2c/0x30
[  241.264437]  [<ffffffff81d1bd82>] _cpu_down+0x2b2/0x340
[  241.266760]  [<ffffffff81d3afea>] ? mutex_lock_nested+0x3da/0x400
[  241.269311]  [<ffffffff8106c832>] ? cpu_hotplug_driver_lock+0x12/0x20
[  241.272113]  [<ffffffff81d1bec1>] cpu_down+0x31/0x50
[  241.274282]  [<ffffffff81d1aaa7>] _debug_hotplug_cpu+0x47/0xb0
[  241.276853]  [<ffffffff824251f3>] ? pci_iommu_alloc+0x6c/0x6c
[  241.279183]  [<ffffffff82425200>] debug_hotplug_cpu+0xd/0x11
[  241.281679]  [<ffffffff8241ce54>] do_one_initcall+0x7e/0x11b
[  241.284140]  [<ffffffff810e9a1d>] ? parse_args+0x1dd/0x300
[  241.286665]  [<ffffffff8241cfed>] kernel_init_freeable+0xfc/0x18d
[  241.289336]  [<ffffffff8241c7ec>] ? do_early_param+0x8a/0x8a
[  241.291894]  [<ffffffff81d1a8e0>] ? rest_init+0xc0/0xc0
[  241.294233]  [<ffffffff81d1a8e9>] kernel_init+0x9/0x180
[  241.296494]  [<ffffffff81d3f9cc>] ret_from_fork+0x7c/0xb0
[  241.298917]  [<ffffffff81d1a8e0>] ? rest_init+0xc0/0xc0
[  241.303393] 3 locks held by swapper/0/1:
[  241.305121]  #0:  (x86_cpu_hotplug_driver_mutex){+.+.+.}, at: [<ffffffff8106c832>] cpu_hotplug_driver_lock+0x12/0x20
[  241.309967]  #1:  (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff81d1bead>] cpu_down+0x1d/0x50
[  241.314045]  #2:  (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810cca86>] cpu_hotplug_begin+0x26/0x60
[  241.318328] sending NMI to all CPUs:
[  241.319960] NMI backtrace for cpu 1
[  241.321503] CPU: 1 PID: 29 Comm: khungtaskd Not tainted 3.12.0-02098-g1918664 #98
[  241.321618] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  241.321618] task: ffff88000cf8e780 ti: ffff88000cf90000 task.ti: ffff88000cf90000
[  241.321618] RIP: 0010:[<ffffffff8107377d>]  [<ffffffff8107377d>] flat_send_IPI_mask+0x8d/0xd0
[  241.321618] RSP: 0000:ffff88000cf91db8  EFLAGS: 00010046
[  241.321618] RAX: 0000000003000000 RBX: 0000000000000002 RCX: 0000000000000007
[  241.321618] RDX: 0000000000000c00 RSI: 0000000000000001 RDI: 0000000000000300
[  241.321618] RBP: ffff88000cf91dd8 R08: 0000000000000000 R09: 0000000000000000
[  241.321618] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000246
[  241.321618] R13: 0000000000000c00 R14: 0000000000000003 R15: 00000000000003ff
[  241.321618] FS:  0000000000000000(0000) GS:ffff88000f100000(0000) knlGS:0000000000000000
[  241.321618] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  241.321618] CR2: 00000000ffffffff CR3: 0000000002206000 CR4: 00000000000006a0
[  241.321618] Stack:
[  241.321618]  0000000000002710 ffff88000ec9e040 0000000000007fff ffff88000ec9e040
[  241.321618]  ffff88000cf91de8 ffffffff810737da ffff88000cf91e00 ffffffff81070a3f
[  241.321618]  0000000000000078 ffff88000cf91e68 ffffffff8113e1a8 ffffffff8113df4b
[  241.321618] Call Trace:
[  241.321618]  [<ffffffff810737da>] flat_send_IPI_all+0x1a/0x70
[  241.321618]  [<ffffffff81070a3f>] arch_trigger_all_cpu_backtrace+0x5f/0xb0
[  241.321618]  [<ffffffff8113e1a8>] watchdog+0x2d8/0x3d0
[  241.321618]  [<ffffffff8113df4b>] ? watchdog+0x7b/0x3d0
[  241.321618]  [<ffffffff8113ded0>] ? hung_task_panic+0x20/0x20
[  241.321618]  [<ffffffff810eb431>] kthread+0xd1/0xe0
[  241.321618]  [<ffffffff81d3e577>] ? _raw_spin_unlock_irq+0x27/0x60
[  241.321618]  [<ffffffff810eb360>] ? __kthread_unpark+0x50/0x50
[  241.321618]  [<ffffffff81d3f9cc>] ret_from_fork+0x7c/0xb0
[  241.321618]  [<ffffffff810eb360>] ? __kthread_unpark+0x50/0x50
[  241.321618] Code: e6 10 75 f2 44 89 f0 c1 e0 18 89 04 25 10 b3 57 ff 89 da 44 09 ea 41 81 cd 00 04 00 00 83 fb 02 41 0f 44 d5 89 14 25 00 b3 57 ff <41> f7 c4 00 02 00 00 75 1a 4c 89 e7 57 9d 0f 1f 44 00 00 e8 8b 

git bisect start 1918664d0b2b6b2ad6b180e7c04a42872beb6691 d8524ae9d6f492a9c6db9f4d89c5f9b8782fa2d5 --
git bisect good 4d9218daae2b00379b2bb81eaf4fe5143f372db1  # 11:54     27+      0  Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
git bisect good f05ef3f6f78e6b97f5664a8290f1ac0717b1a1b0  # 12:09     27+      2  kvm tools: Switch to using an enum for disk image types
git bisect good 3784592e0c0f224e399605cfa2d1570f79c0cc29  # 12:17     27+      6  config: fix make kvmconfig
git bisect  bad 9c028f0df729ba4958fe7b142d8521b66ee154d9  # 12:31      6-      4  Merge branch 'sched/core'
git bisect good 04bb2f9475054298f0c67a89ca92cade42d3fe5e  # 12:45     50+      0  sched/numa: Adjust scan rate in task_numa_placement
git bisect good ecf1f014325ba60f4df35edae1a357c67c5d4eb1  # 12:56     50+      3  Merge branch 'core/rcu' into core/locking, to prepare the kernel/locking/ file move
git bisect  bad 01768b42dc97a67b4fb33a2535c49fc1969880df  # 13:08     21-     18  locking: Move the mutex code to kernel/locking/
git bisect  bad c2d816443ef305aba8eaf0bf368f4d3d87494f06  # 13:22      4-      7  sched/wait: Introduce prepare_to_wait_event()
git bisect good ec0ad3d01f99d5e5b56a99a58f7003b99250dc65  # 14:11     50+      9  Merge branch 'core/urgent' into sched/core
git bisect good 7c3f2ab7b844f1a859afbc3d41925e8a0faba5fa  # 14:50     50+      4  sched/rt: Add missing rmb()
git bisect  bad 6acce3ef84520537f8a09a12c9ddbe814a584dd2  # 14:55     13-      3  sched: Remove get_online_cpus() usage
git bisect good 746023159c40c523b08a3bc3d213dac212385895  # 15:33    150+      6  sched: Fix race in migrate_swap_stop()
git bisect good 746023159c40c523b08a3bc3d213dac212385895  # 16:04    450+      6  sched: Fix race in migrate_swap_stop()
git bisect  bad 1918664d0b2b6b2ad6b180e7c04a42872beb6691  # 16:04      0-     12  Merge branch 'x86/intel-mid'
git bisect good 1ae1820bfe3f0964cf4bc3c5628055330e3cc538  # 16:47    450+      0  Revert "sched: Remove get_online_cpus() usage"
git bisect good 6c86ae2928f9e4cbf0d5844f5fcfd549e3450b8c  # 17:26    450+     63  Merge tag 'ftrace-urgent-3.12-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
git bisect  bad bad570d77024cba808126d68a2072f6f8ce64c27  # 17:37     32-     16  Add linux-next specific files for 20131108
git bisect  bad 1918664d0b2b6b2ad6b180e7c04a42872beb6691  # 17:37      0-     12  Merge branch 'x86/intel-mid'

Thanks,
Fengguang

View attachment "dmesg-yocto-jaketown-20:20131108223032:x86_64-randconfig-x4-1108:3.12.0-02098-g1918664:98" of type "text/plain" (57763 bytes)

Download attachment "bisect-1918664d0b2b6b2ad6b180e7c04a42872beb6691-x86_64-randconfig-x4-1108-Kernel-panic---not-syncing:-hung_task:-blocked-tasks-112033.log" of type "application/octet-stream" (39959 bytes)

View attachment "config-3.12.0-02098-g1918664" of type "text/plain" (74647 bytes)

_______________________________________________
LKP mailing list
LKP@...ux.intel.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ