[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 11 Nov 2013 09:26:16 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Michael wang <wangyun@...ux.vnet.ibm.com>,
Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [sched/get_online_cpus] INFO: task swapper/0:1 blocked for more
than 120 seconds.
Peter,
FYI. Here is another bisect result.
commit 6acce3ef84520537f8a09a12c9ddbe814a584dd2
Author: Peter Zijlstra <peterz@...radead.org>
Date: Fri Oct 11 14:38:20 2013 +0200
sched: Remove get_online_cpus() usage
Remove get_online_cpus() usage from the scheduler; there's 4 sites that
use it:
- sched_init_smp(); where its completely superfluous since we're in
'early' boot and there simply cannot be any hotplugging.
- sched_getaffinity(); we already take a raw spinlock to protect the
task cpus_allowed mask, this disables preemption and therefore
also stabilizes cpu_online_mask as that's modified using
stop_machine. However switch to active mask for symmetry with
sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
mask stability by inserting sync_rcu/sched() into _cpu_down.
- sched_setaffinity(); we don't appear to need get_online_cpus()
either, there's two sites where hotplug appears relevant:
* cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
for the cpuset case we hold task_lock, which is a spinlock and
thus for mainline disables preemption (might cause pain on RT).
* set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
preemption properly disabled; also it already deals with hotplug
races explicitly where it releases them.
- migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
us with a little trickery. By adding a sync_sched/rcu() after the
CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
cpu_active_mask. Use these to validate that both our cpus are active
when queueing the stop work before we queue the stop_machine works
for take_cpu_down().
Signed-off-by: Peter Zijlstra <peterz@...radead.org>
Cc: "Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
Cc: Paul McKenney <paulmck@...ux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Rik van Riel <riel@...hat.com>
Cc: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@...hat.com>
Cc: Johannes Weiner <hannes@...xchg.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Oleg Nesterov <oleg@...hat.com>
Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@...nel.org>
+------------------------------------------------------------------------------------+----+
| | |
+------------------------------------------------------------------------------------+----+
| good_boots | 3 |
| has_kernel_error_warning | 16 |
| INFO:task_blocked_for_more_than_seconds | 16 |
| INFO:NMI_handler(arch_trigger_all_cpu_backtrace_handler)took_too_long_to_run:msecs | 16 |
| Kernel_panic-not_syncing:hung_task:blocked_tasks | 16 |
+------------------------------------------------------------------------------------+----+
[ 1.255438] EDD information not available.
[ 1.255873] Unregister pv shared memory for cpu 0
[ 1.898155] tsc: Refined TSC clocksource calibration: 2892.980 MHz
[ 240.910063] INFO: task swapper/0:1 blocked for more than 120 seconds.
[ 240.910691] Not tainted 3.12.0-next-20131108-10601-gbad570d #3
[ 240.911284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.912017] swapper/0 D 0000ad00 5808 1 0 0x00000000
[ 240.912638] cec29dbc 00000092 c1c3b640 0000ad00 20024160 cec29dc8 c1a252c0 00000008
[ 240.913462] c1937d20 c1a252c0 cec17440 00000000 20024160 00000003 00000000 0000ad00
[ 240.914364] c1c6d500 00000000 cec17848 00000000 cec17440 c1077e17 cf082ab4 00000001
[ 240.915212] Call Trace:
[ 240.915447] [<c1077e17>] ? mark_held_locks+0x87/0xf0
[ 240.915915] [<c1621a80>] schedule+0x10/0x20
[ 240.916328] [<c1620fd5>] schedule_timeout+0x125/0x180
[ 240.916811] [<c1625a02>] ? _raw_spin_unlock_irq+0x22/0x50
[ 240.917325] [<c16221f4>] ? wait_for_completion+0x94/0xd0
[ 240.917825] [<c107826b>] ? trace_hardirqs_on+0xb/0x10
[ 240.918311] [<c16221fb>] wait_for_completion+0x9b/0xd0
[ 240.918807] [<c1066aa0>] ? wake_up_state+0x10/0x10
[ 240.919270] [<c108af90>] ? __call_rcu.constprop.61+0x240/0x240
[ 240.919823] [<c108873f>] wait_rcu_gp+0x3f/0x60
[ 240.920257] [<c10885a0>] ? rcu_panic+0x20/0x20
[ 240.920687] [<c108cb9f>] synchronize_rcu+0x1f/0x40
[ 240.921149] [<c160b9a9>] _cpu_down+0x129/0x2c0
[ 240.921578] [<c160bbb2>] ? cpu_down+0x12/0x40
[ 240.922006] [<c12bd9ff>] ? lock_device_hotplug+0xf/0x20
[ 240.922518] [<c160bbc4>] cpu_down+0x24/0x40
[ 240.922914] [<c160b197>] _debug_hotplug_cpu+0x77/0xb0
[ 240.923406] [<c12c2c1e>] ? __platform_driver_register+0x3e/0x40
[ 240.923965] [<c19cae19>] ? pci_iommu_alloc+0x5a/0x5a
[ 240.924448] [<c19cae25>] debug_hotplug_cpu+0xc/0x10
[ 240.924932] [<c19c6b88>] do_one_initcall+0x8d/0x12c
[ 240.925405] [<c117129f>] ? strlen+0xf/0x20
[ 240.925813] [<c1059c93>] ? parameq+0x13/0x70
[ 240.926230] [<c19c6512>] ? repair_env_string+0x12/0x51
[ 240.926722] [<c1059ebf>] ? parse_args+0x1cf/0x320
[ 240.927179] [<c1052b6e>] ? __usermodehelper_set_disable_depth+0x3e/0x50
[ 240.927802] [<c19c6d12>] kernel_init_freeable+0xeb/0x17d
[ 240.928310] [<c19c6500>] ? do_early_param+0x78/0x78
[ 240.928789] [<c160b02b>] kernel_init+0xb/0x100
[ 240.929222] [<c1626cb7>] ret_from_kernel_thread+0x1b/0x28
[ 240.929738] [<c160b020>] ? rest_init+0xc0/0xc0
[ 240.930168] 3 locks held by swapper/0/1:
[ 240.930541] #0: (device_hotplug_lock){+.+.+.}, at: [<c12bd9ff>] lock_device_hotplug+0xf/0x20
[ 240.931405] #1: (cpu_add_remove_lock){+.+.+.}, at: [<c160bbb2>] cpu_down+0x12/0x40
[ 240.932198] #2: (cpu_hotplug.lock){+.+.+.}, at: [<c104242d>] cpu_hotplug_begin+0x1d/0x40
[ 240.933039] sending NMI to all CPUs:
[ 240.933381] NMI backtrace for cpu 1
[ 240.933723] CPU: 1 PID: 46 Comm: khungtaskd Not tainted 3.12.0-next-20131108-10601-gbad570d #3
[ 240.934031] task: cd645560 ti: cd66a000 task.ti: cd66a000
[ 240.934031] EIP: 0060:[<c102c8e5>] EFLAGS: 00010046 CPU: 1
[ 240.934031] EIP is at default_send_IPI_mask_logical+0x85/0xf0
[ 240.934031] EAX: 00000c00 EBX: 03000000 ECX: fffff000 EDX: fffff000
[ 240.934031] ESI: 00000206 EDI: 00000002 EBP: cd66bed4 ESP: cd66bec4
[ 240.934031] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 240.934031] CR0: 8005003b CR2: ffffffff CR3: 01a2d000 CR4: 000006f0
[ 240.934031] Stack:
[ 240.934031] 00000800 00002710 00007fff 000003ff cd66bee0 c102ca18 00002710 cd66bef0
[ 240.934031] c102cc0d c1861994 cec17440 cd66bf30 c10a4632 c1871278 c1a37f6c c1925686
[ 240.934031] 00000002 c19256c7 00000078 00000002 cec17534 00007fff cec17440 cec17534
[ 240.934031] Call Trace:
[ 240.934031] [<c102ca18>] default_send_IPI_all+0x58/0x60
[ 240.934031] [<c102cc0d>] arch_trigger_all_cpu_backtrace+0x3d/0x70
[ 240.934031] [<c10a4632>] watchdog+0x2e2/0x300
[ 240.934031] [<c10a4350>] ? hung_task_panic+0x20/0x20
[ 240.934031] [<c105b6d7>] kthread+0xa7/0xc0
[ 240.934031] [<c1060000>] ? set_create_files_as+0x10/0x20
[ 240.934031] [<c1626cb7>] ret_from_kernel_thread+0x1b/0x28
[ 240.934031] [<c105b630>] ? __kthread_parkme+0x60/0x60
[ 240.934031] Code: 00 b3 ff ff f6 c4 10 75 ed c1 e3 18 89 99 10 b3 ff ff 89 f8 09 d0 80 ce 04 83 ff 02 0f 44 c2 8b 15 04 06 93 c1 89 82 00 b3 ff ff <f7> c6 00 02 00 00 74 1b e8 6e b9 04 00 89 f0 50 9d 8d 74 26 00
git bisect start e1e958029781a5f22a47807f9648ab9d221746f4 v3.12 --
git bisect good 513e72b3ae0e7d35b09b72c360e548781b345fe7 # 18:57 25+ 0 Merge remote-tracking branch 'pm/linux-next'
git bisect good a73f39d6bb13cd0eadf566e202dbfd114de2b9bc # 19:09 25+ 0 Merge remote-tracking branch 'virtio/virtio-next'
git bisect good 8d31e8536540635a1d56c9cd4a30eefe7f236644 # 19:23 25+ 0 Merge remote-tracking branch 'spi/for-next'
git bisect bad 47b8d9c8c38a956094e7f6b231371331dd2dd0d9 # 19:42 8- 10 Merge branch 'x86/hyperv'
git bisect good 8a4d0b56b031455adcbe4a9383c3b497456fcfac # 20:00 28+ 0 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
git bisect bad c90423d1de12fbeaf0c898e1db0e962de347302b # 20:17 8- 3 Merge branch 'sched/core' into core/locking, to prepare the kernel/locking/ file move
git bisect good 90572890d202527c366aa9489b32404e88a7c020 # 20:36 37+ 0 mm: numa: Change page last {nid,pid} into {cpu,pid}
git bisect bad b4145872f7049e429718b40b86e1b46659988398 # 20:53 15- 10 sched: Move wait code from core.c to wait.c
git bisect good 2739d3eef3a93a92c366a3a0bb85a0afe09e8b8c # 21:08 37+ 0 sched/numa: Retry task_numa_migrate() periodically
git bisect bad 92ec11809565cf6429c75204e99e0f583b5c9d7c # 21:17 15- 22 sched/wait: Fix build breakage
git bisect good ed1b7732868035990f07aeb532b1d86272ea909e # 21:33 37+ 0 sched/fair: Fix trivial typos in comments
git bisect bad 6acce3ef84520537f8a09a12c9ddbe814a584dd2 # 21:40 24- 11 sched: Remove get_online_cpus() usage
git bisect good 746023159c40c523b08a3bc3d213dac212385895 # 21:47 41+ 0 sched: Fix race in migrate_swap_stop()
git bisect good 746023159c40c523b08a3bc3d213dac212385895 # 21:51 123+ 0 sched: Fix race in migrate_swap_stop()
git bisect bad bad570d77024cba808126d68a2072f6f8ce64c27 # 21:52 0- 16 Add linux-next specific files for 20131108
git bisect good 93e7237f5344949f4fcb395d484ae89fca3c630b # 22:11 123+ 0 Revert "sched: Remove get_online_cpus() usage"
git bisect good 6c86ae2928f9e4cbf0d5844f5fcfd549e3450b8c # 22:24 123+ 0 Merge tag 'ftrace-urgent-3.12-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
git bisect bad bad570d77024cba808126d68a2072f6f8ce64c27 # 22:24 0- 16 Add linux-next specific files for 20131108
Thanks,
Fengguang
View attachment "dmesg-yocto-xian-35:20131110140722:i386-randconfig-j3-11101308:3.12.0-next-20131108-10601-gbad570d:3" of type "text/plain" (47357 bytes)
Download attachment "bisect-bad570d77024cba808126d68a2072f6f8ce64c27-i386-randconfig-j3-11101308-INFO:-task-swapper-42000.log" of type "application/octet-stream" (71704 bytes)
View attachment "config-3.12.0-next-20131108-10601-gbad570d" of type "text/plain" (71598 bytes)
_______________________________________________
LKP mailing list
LKP@...ux.intel.com
Powered by blists - more mailing lists