lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 11 Nov 2013 09:26:16 +0800
From:	Fengguang Wu <fengguang.wu@...el.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Michael wang <wangyun@...ux.vnet.ibm.com>,
	Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [sched/get_online_cpus] INFO: task swapper/0:1 blocked for more
 than 120 seconds.

Peter,

FYI. Here is another bisect result.

commit 6acce3ef84520537f8a09a12c9ddbe814a584dd2
Author: Peter Zijlstra <peterz@...radead.org>
Date:   Fri Oct 11 14:38:20 2013 +0200

    sched: Remove get_online_cpus() usage
    
    Remove get_online_cpus() usage from the scheduler; there's 4 sites that
    use it:
    
     - sched_init_smp(); where its completely superfluous since we're in
       'early' boot and there simply cannot be any hotplugging.
    
     - sched_getaffinity(); we already take a raw spinlock to protect the
       task cpus_allowed mask, this disables preemption and therefore
       also stabilizes cpu_online_mask as that's modified using
       stop_machine. However switch to active mask for symmetry with
       sched_setaffinity()/set_cpus_allowed_ptr(). We guarantee active
       mask stability by inserting sync_rcu/sched() into _cpu_down.
    
     - sched_setaffinity(); we don't appear to need get_online_cpus()
       either, there's two sites where hotplug appears relevant:
        * cpuset_cpus_allowed(); for the !cpuset case we use possible_mask,
          for the cpuset case we hold task_lock, which is a spinlock and
          thus for mainline disables preemption (might cause pain on RT).
        * set_cpus_allowed_ptr(); Holds all scheduler locks and thus has
          preemption properly disabled; also it already deals with hotplug
          races explicitly where it releases them.
    
     - migrate_swap(); we can make stop_two_cpus() do the heavy lifting for
       us with a little trickery. By adding a sync_sched/rcu() after the
       CPU_DOWN_PREPARE notifier we can provide preempt/rcu guarantees for
       cpu_active_mask. Use these to validate that both our cpus are active
       when queueing the stop work before we queue the stop_machine works
       for take_cpu_down().
    
    Signed-off-by: Peter Zijlstra <peterz@...radead.org>
    Cc: "Srivatsa S. Bhat" <srivatsa.bhat@...ux.vnet.ibm.com>
    Cc: Paul McKenney <paulmck@...ux.vnet.ibm.com>
    Cc: Mel Gorman <mgorman@...e.de>
    Cc: Rik van Riel <riel@...hat.com>
    Cc: Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
    Cc: Andrea Arcangeli <aarcange@...hat.com>
    Cc: Johannes Weiner <hannes@...xchg.org>
    Cc: Linus Torvalds <torvalds@...ux-foundation.org>
    Cc: Andrew Morton <akpm@...ux-foundation.org>
    Cc: Steven Rostedt <rostedt@...dmis.org>
    Cc: Oleg Nesterov <oleg@...hat.com>
    Link: http://lkml.kernel.org/r/20131011123820.GV3081@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar <mingo@...nel.org>

+------------------------------------------------------------------------------------+----+
|                                                                                    |    |
+------------------------------------------------------------------------------------+----+
| good_boots                                                                         | 3  |
| has_kernel_error_warning                                                           | 16 |
| INFO:task_blocked_for_more_than_seconds                                            | 16 |
| INFO:NMI_handler(arch_trigger_all_cpu_backtrace_handler)took_too_long_to_run:msecs | 16 |
| Kernel_panic-not_syncing:hung_task:blocked_tasks                                   | 16 |
+------------------------------------------------------------------------------------+----+

[    1.255438] EDD information not available.
[    1.255873] Unregister pv shared memory for cpu 0
[    1.898155] tsc: Refined TSC clocksource calibration: 2892.980 MHz
[  240.910063] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  240.910691]       Not tainted 3.12.0-next-20131108-10601-gbad570d #3
[  240.911284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  240.912017] swapper/0       D 0000ad00  5808     1      0 0x00000000
[  240.912638]  cec29dbc 00000092 c1c3b640 0000ad00 20024160 cec29dc8 c1a252c0 00000008
[  240.913462]  c1937d20 c1a252c0 cec17440 00000000 20024160 00000003 00000000 0000ad00
[  240.914364]  c1c6d500 00000000 cec17848 00000000 cec17440 c1077e17 cf082ab4 00000001
[  240.915212] Call Trace:
[  240.915447]  [<c1077e17>] ? mark_held_locks+0x87/0xf0
[  240.915915]  [<c1621a80>] schedule+0x10/0x20
[  240.916328]  [<c1620fd5>] schedule_timeout+0x125/0x180
[  240.916811]  [<c1625a02>] ? _raw_spin_unlock_irq+0x22/0x50
[  240.917325]  [<c16221f4>] ? wait_for_completion+0x94/0xd0
[  240.917825]  [<c107826b>] ? trace_hardirqs_on+0xb/0x10
[  240.918311]  [<c16221fb>] wait_for_completion+0x9b/0xd0
[  240.918807]  [<c1066aa0>] ? wake_up_state+0x10/0x10
[  240.919270]  [<c108af90>] ? __call_rcu.constprop.61+0x240/0x240
[  240.919823]  [<c108873f>] wait_rcu_gp+0x3f/0x60
[  240.920257]  [<c10885a0>] ? rcu_panic+0x20/0x20
[  240.920687]  [<c108cb9f>] synchronize_rcu+0x1f/0x40
[  240.921149]  [<c160b9a9>] _cpu_down+0x129/0x2c0
[  240.921578]  [<c160bbb2>] ? cpu_down+0x12/0x40
[  240.922006]  [<c12bd9ff>] ? lock_device_hotplug+0xf/0x20
[  240.922518]  [<c160bbc4>] cpu_down+0x24/0x40
[  240.922914]  [<c160b197>] _debug_hotplug_cpu+0x77/0xb0
[  240.923406]  [<c12c2c1e>] ? __platform_driver_register+0x3e/0x40
[  240.923965]  [<c19cae19>] ? pci_iommu_alloc+0x5a/0x5a
[  240.924448]  [<c19cae25>] debug_hotplug_cpu+0xc/0x10
[  240.924932]  [<c19c6b88>] do_one_initcall+0x8d/0x12c
[  240.925405]  [<c117129f>] ? strlen+0xf/0x20
[  240.925813]  [<c1059c93>] ? parameq+0x13/0x70
[  240.926230]  [<c19c6512>] ? repair_env_string+0x12/0x51
[  240.926722]  [<c1059ebf>] ? parse_args+0x1cf/0x320
[  240.927179]  [<c1052b6e>] ? __usermodehelper_set_disable_depth+0x3e/0x50
[  240.927802]  [<c19c6d12>] kernel_init_freeable+0xeb/0x17d
[  240.928310]  [<c19c6500>] ? do_early_param+0x78/0x78
[  240.928789]  [<c160b02b>] kernel_init+0xb/0x100
[  240.929222]  [<c1626cb7>] ret_from_kernel_thread+0x1b/0x28
[  240.929738]  [<c160b020>] ? rest_init+0xc0/0xc0
[  240.930168] 3 locks held by swapper/0/1:
[  240.930541]  #0:  (device_hotplug_lock){+.+.+.}, at: [<c12bd9ff>] lock_device_hotplug+0xf/0x20
[  240.931405]  #1:  (cpu_add_remove_lock){+.+.+.}, at: [<c160bbb2>] cpu_down+0x12/0x40
[  240.932198]  #2:  (cpu_hotplug.lock){+.+.+.}, at: [<c104242d>] cpu_hotplug_begin+0x1d/0x40
[  240.933039] sending NMI to all CPUs:
[  240.933381] NMI backtrace for cpu 1
[  240.933723] CPU: 1 PID: 46 Comm: khungtaskd Not tainted 3.12.0-next-20131108-10601-gbad570d #3
[  240.934031] task: cd645560 ti: cd66a000 task.ti: cd66a000
[  240.934031] EIP: 0060:[<c102c8e5>] EFLAGS: 00010046 CPU: 1
[  240.934031] EIP is at default_send_IPI_mask_logical+0x85/0xf0
[  240.934031] EAX: 00000c00 EBX: 03000000 ECX: fffff000 EDX: fffff000
[  240.934031] ESI: 00000206 EDI: 00000002 EBP: cd66bed4 ESP: cd66bec4
[  240.934031]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  240.934031] CR0: 8005003b CR2: ffffffff CR3: 01a2d000 CR4: 000006f0
[  240.934031] Stack:
[  240.934031]  00000800 00002710 00007fff 000003ff cd66bee0 c102ca18 00002710 cd66bef0
[  240.934031]  c102cc0d c1861994 cec17440 cd66bf30 c10a4632 c1871278 c1a37f6c c1925686
[  240.934031]  00000002 c19256c7 00000078 00000002 cec17534 00007fff cec17440 cec17534
[  240.934031] Call Trace:
[  240.934031]  [<c102ca18>] default_send_IPI_all+0x58/0x60
[  240.934031]  [<c102cc0d>] arch_trigger_all_cpu_backtrace+0x3d/0x70
[  240.934031]  [<c10a4632>] watchdog+0x2e2/0x300
[  240.934031]  [<c10a4350>] ? hung_task_panic+0x20/0x20
[  240.934031]  [<c105b6d7>] kthread+0xa7/0xc0
[  240.934031]  [<c1060000>] ? set_create_files_as+0x10/0x20
[  240.934031]  [<c1626cb7>] ret_from_kernel_thread+0x1b/0x28
[  240.934031]  [<c105b630>] ? __kthread_parkme+0x60/0x60
[  240.934031] Code: 00 b3 ff ff f6 c4 10 75 ed c1 e3 18 89 99 10 b3 ff ff 89 f8 09 d0 80 ce 04 83 ff 02 0f 44 c2 8b 15 04 06 93 c1 89 82 00 b3 ff ff <f7> c6 00 02 00 00 74 1b e8 6e b9 04 00 89 f0 50 9d 8d 74 26 00

git bisect start e1e958029781a5f22a47807f9648ab9d221746f4 v3.12 --
git bisect good 513e72b3ae0e7d35b09b72c360e548781b345fe7  # 18:57     25+      0  Merge remote-tracking branch 'pm/linux-next'
git bisect good a73f39d6bb13cd0eadf566e202dbfd114de2b9bc  # 19:09     25+      0  Merge remote-tracking branch 'virtio/virtio-next'
git bisect good 8d31e8536540635a1d56c9cd4a30eefe7f236644  # 19:23     25+      0  Merge remote-tracking branch 'spi/for-next'
git bisect  bad 47b8d9c8c38a956094e7f6b231371331dd2dd0d9  # 19:42      8-     10  Merge branch 'x86/hyperv'
git bisect good 8a4d0b56b031455adcbe4a9383c3b497456fcfac  # 20:00     28+      0  Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
git bisect  bad c90423d1de12fbeaf0c898e1db0e962de347302b  # 20:17      8-      3  Merge branch 'sched/core' into core/locking, to prepare the kernel/locking/ file move
git bisect good 90572890d202527c366aa9489b32404e88a7c020  # 20:36     37+      0  mm: numa: Change page last {nid,pid} into {cpu,pid}
git bisect  bad b4145872f7049e429718b40b86e1b46659988398  # 20:53     15-     10  sched: Move wait code from core.c to wait.c
git bisect good 2739d3eef3a93a92c366a3a0bb85a0afe09e8b8c  # 21:08     37+      0  sched/numa: Retry task_numa_migrate() periodically
git bisect  bad 92ec11809565cf6429c75204e99e0f583b5c9d7c  # 21:17     15-     22  sched/wait: Fix build breakage
git bisect good ed1b7732868035990f07aeb532b1d86272ea909e  # 21:33     37+      0  sched/fair: Fix trivial typos in comments
git bisect  bad 6acce3ef84520537f8a09a12c9ddbe814a584dd2  # 21:40     24-     11  sched: Remove get_online_cpus() usage
git bisect good 746023159c40c523b08a3bc3d213dac212385895  # 21:47     41+      0  sched: Fix race in migrate_swap_stop()
git bisect good 746023159c40c523b08a3bc3d213dac212385895  # 21:51    123+      0  sched: Fix race in migrate_swap_stop()
git bisect  bad bad570d77024cba808126d68a2072f6f8ce64c27  # 21:52      0-     16  Add linux-next specific files for 20131108
git bisect good 93e7237f5344949f4fcb395d484ae89fca3c630b  # 22:11    123+      0  Revert "sched: Remove get_online_cpus() usage"
git bisect good 6c86ae2928f9e4cbf0d5844f5fcfd549e3450b8c  # 22:24    123+      0  Merge tag 'ftrace-urgent-3.12-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
git bisect  bad bad570d77024cba808126d68a2072f6f8ce64c27  # 22:24      0-     16  Add linux-next specific files for 20131108

Thanks,
Fengguang

View attachment "dmesg-yocto-xian-35:20131110140722:i386-randconfig-j3-11101308:3.12.0-next-20131108-10601-gbad570d:3" of type "text/plain" (47357 bytes)

Download attachment "bisect-bad570d77024cba808126d68a2072f6f8ce64c27-i386-randconfig-j3-11101308-INFO:-task-swapper-42000.log" of type "application/octet-stream" (71704 bytes)

View attachment "config-3.12.0-next-20131108-10601-gbad570d" of type "text/plain" (71598 bytes)

_______________________________________________
LKP mailing list
LKP@...ux.intel.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ