lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20180419024658.22xbnw2dkkm7wzjm@wfg-t540p.sh.intel.com>
Date:   Thu, 19 Apr 2018 10:46:58 +0800
From:   Fengguang Wu <fengguang.wu@...el.com>
To:     dri-devel@...ts.freedesktop.org
Cc:     Daniel Vetter <daniel.vetter@...ll.ch>,
        David Airlie <airlied@...ux.ie>,
        Chris Wilson <chris@...is-wilson.co.uk>,
        linux-kernel@...r.kernel.org, lkp@...org
Subject: [test_drm_mm_init] watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
 [swapper/0:1]

Hello,

FYI this happens in mainline kernel 4.17.0-rc1.
It dates back to at least v4.11 .

It occurs in 2 out of 3 boots.

[ 1043.919758] rcu-torture: Free-Block Circulation:  669 668 667 666 662 661 660 659 658 657 0
[ 1109.691598] rcu-torture: rtc: 000000003d6da35c ver: 685 tfle: 0 rta: 685 rtaf: 0 rtf: 674 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 29429 onoff: 0/0:0/0 -1,0:-1,0 0:0 (HZ=250) barrier: 0/0:0 cbflood: 129
[ 1109.739392] rcu-torture: Reader Pipe:  23942278 7 0 0 0 0 0 0 0 0 0
[ 1109.750932] rcu-torture: Reader Batch:  23895152 47133 0 0 0 0 0 0 0 0 0
[ 1109.763056] rcu-torture: Free-Block Circulation:  684 684 683 682 681 680 679 678 677 676 0
[ 1144.200304] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 1144.229662] irq event stamp: 14402106
[ 1144.229662] hardirqs last  enabled at (14402105): [<ffffffffb1400abd>] restore_regs_and_return_to_kernel+0x0/0x33:
						restore_regs_and_return_to_kernel at arch/x86/entry/entry_64.S:720
[ 1144.245638] hardirqs last disabled at (14402106): [<ffffffffb140098a>] interrupt_entry+0xea/0x120:
						interrupt_entry at arch/x86/entry/entry_64.S:627
[ 1144.261617] softirqs last  enabled at (14390978): [<ffffffffb1600495>] __do_softirq+0x495/0x550
[ 1144.277763] softirqs last disabled at (14390969): [<ffffffffae9304a0>] irq_exit+0xb1/0x29e:
						invoke_softirq at kernel/softirq.c:365
						 (inlined by) irq_exit at kernel/softirq.c:405
[ 1144.285635] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc1 #99
[ 1144.301610] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 1144.309648] RIP: 0010:rb_next+0xa3/0x118:
						rb_next at lib/rbtree.c:541
[ 1144.317646] RSP: 0000:ffff88001015f9d8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 1144.332748] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000815
[ 1144.341594] RDX: 1ffff920002d9ee8 RSI: 0000000000000816 RDI: ffffc900016cf740
[ 1144.349636] RBP: ffff88001015f9f8 R08: 0000000000001b15 R09: 0000000000000000
[ 1144.365611] R10: 0000000000000000 R11: 00000000d9de409f R12: ffffc900016cf738
[ 1144.373584] R13: ffff88001015fc70 R14: ffff88001015faf8 R15: ffffc900017cce70
[ 1144.381688] FS:  0000000000000000(0000) GS:ffff880010600000(0000) knlGS:0000000000000000
[ 1144.397596] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1144.405598] CR2: 0000000000000000 CR3: 0000000015814001 CR4: 00000000000206b0
[ 1144.413673] Call Trace:
[ 1144.421640]  drm_mm_insert_node_in_range+0x3c7/0x917:
						next_hole at drivers/gpu/drm/drm_mm.c:358 (discriminator 1)
						 (inlined by) drm_mm_insert_node_in_range at drivers/gpu/drm/drm_mm.c:465 (discriminator 1)
[ 1144.429639]  ? drm_mm_reserve_node+0x4c7/0x4c7:
						drm_mm_insert_node_in_range at drivers/gpu/drm/drm_mm.c:451
[ 1144.437652]  ? rcu_dynticks_curr_cpu_in_eqs+0x35/0x79:
						__read_once_size at include/linux/compiler.h:188
						 (inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
						 (inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
						 (inlined by) rcu_dynticks_curr_cpu_in_eqs at kernel/rcu/tree.c:349
[ 1144.445752]  ? retint_kernel+0x10/0x10:
						restore_regs_and_return_to_kernel at arch/x86/entry/entry_64.S:720
[ 1144.445752]  expect_insert+0x6d/0x118:
						drm_mm_insert_node_generic at include/drm/drm_mm.h:403
						 (inlined by) expect_insert at drivers/gpu/drm/selftests/test-drm_mm.c:530
[ 1144.457609]  igt_color_evict+0x1fa/0x7bb:
						igt_color_evict at drivers/gpu/drm/selftests/test-drm_mm.c:2113
[ 1144.457609]  ? igt_color_evict_range+0x7b3/0x7b3:
						igt_color_evict at drivers/gpu/drm/selftests/test-drm_mm.c:2083
[ 1144.465551]  ? igt_color+0xa16/0xa64:
						igt_color at drivers/gpu/drm/selftests/test-drm_mm.c:1857
[ 1144.473560]  ? igt_init+0x299/0x299:
						igt_color at drivers/gpu/drm/selftests/test-drm_mm.c:1857
[ 1144.481561]  ? drm_dp_aux_dev_exit+0x40/0x40:
						separate_adjacent_colors at drivers/gpu/drm/selftests/test-drm_mm.c:1833
[ 1144.489569]  ? kasan_check_read+0x1e/0x28:
						kasan_check_read at mm/kasan/kasan.c:272
[ 1144.497602]  ? next_prime_number+0x3e5/0x3fa:
						rcu_lock_release at include/linux/rcupdate.h:251
						 (inlined by) rcu_read_unlock at include/linux/rcupdate.h:688
						 (inlined by) next_prime_number at lib/prime_numbers.c:203
[ 1144.504733]  test_drm_mm_init+0x2ad/0x3b8:
						run_selftests at drivers/gpu/drm/selftests/drm_selftest.c:76
						 (inlined by) test_drm_mm_init at drivers/gpu/drm/selftests/test-drm_mm.c:2294
[ 1144.513642]  ? drm_fb_helper_modinit+0x37/0x37:
						test_drm_mm_init at drivers/gpu/drm/selftests/test-drm_mm.c:2286
[ 1144.521580]  do_one_initcall+0x188/0x3ef:
						do_one_initcall at init/main.c:883
[ 1144.528710]  ? start_kernel+0x14ca/0x14ca:
						do_one_initcall at init/main.c:874
[ 1144.529615]  ? reacquire_held_locks+0x29f/0x29f:
						lock_release at kernel/locking/lockdep.c:3929
[ 1144.537620]  kernel_init_freeable+0x504/0x68d:
						do_initcall_level at init/main.c:951
						 (inlined by) do_initcalls at init/main.c:959
						 (inlined by) do_basic_setup at init/main.c:977
						 (inlined by) kernel_init_freeable at init/main.c:1127
[ 1144.545629]  ? rest_init+0x37d/0x37d:
						kernel_init at init/main.c:1050
[ 1144.553601]  kernel_init+0x13/0x1c6:
						kernel_init at init/main.c:1053
[ 1144.553601]  ? rest_init+0x37d/0x37d:
						kernel_init at init/main.c:1050
[ 1144.561661]  ret_from_fork+0x3a/0x50:
						ret_from_fork at arch/x86/entry/entry_64.S:418
[ 1144.569633] Code: 8d 7b 10 48 89 f8 48 c1 e8 03 42 80 3c 20 00 74 05 e8 60 3f a0 fd 48 8b 43 10 48 85 c0 74 75 48 ff 05 08 32 ba 05 48 89 c3 eb d4 <48> ff 05 04 32 ba 05 49 be 00 00 00 00 00 fc ff df 4c 89 e0 48
[ 1144.601626] Kernel panic - not syncing: softlockup: hung tasks
[ 1144.609656] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G             L    4.17.0-rc1 #99
[ 1144.617569] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 1144.633606] Call Trace:
[ 1144.640711]  <IRQ>
[ 1144.641612]  dump_stack+0x10f/0x19b:
						__dump_stack at lib/dump_stack.c:77
						 (inlined by) dump_stack at lib/dump_stack.c:113
[ 1144.649590]  panic+0x20d/0x46b:
						panic at kernel/panic.c:195
[ 1144.649590]  ? refcount_error_report+0x268/0x268:
						panic at kernel/panic.c:136
[ 1144.657617]  ? kasan_check_write+0x22/0x2b:
						kasan_check_write at mm/kasan/kasan.c:279
[ 1144.673626]  ? do_raw_spin_unlock+0x74/0x1fa:
						__read_once_size at include/linux/compiler.h:188
						 (inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
						 (inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
						 (inlined by) queued_spin_is_locked at include/asm-generic/qspinlock.h:36
						 (inlined by) debug_spin_unlock at kernel/locking/spinlock_debug.c:98
						 (inlined by) do_raw_spin_unlock at kernel/locking/spinlock_debug.c:134
[ 1144.681566]  ? do_raw_spin_unlock+0x74/0x1fa:
						__read_once_size at include/linux/compiler.h:188
						 (inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
						 (inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
						 (inlined by) queued_spin_is_locked at include/asm-generic/qspinlock.h:36
						 (inlined by) debug_spin_unlock at kernel/locking/spinlock_debug.c:98
						 (inlined by) do_raw_spin_unlock at kernel/locking/spinlock_debug.c:134
[ 1144.689539]  watchdog_timer_fn+0x335/0x366:
						watchdog_timer_fn at kernel/watchdog.c:444
[ 1144.689539]  __hrtimer_run_queues+0x2a9/0x3f8:
						__run_hrtimer at kernel/time/hrtimer.c:1388
						 (inlined by) __hrtimer_run_queues at kernel/time/hrtimer.c:1448
[ 1144.697609]  ? watchdog+0x29/0x29:
						watchdog_timer_fn at kernel/watchdog.c:340
[ 1144.705614]  ? __remove_hrtimer+0x296/0x296:
						__hrtimer_run_queues at kernel/time/hrtimer.c:1418
[ 1144.713590]  ? ktime_get_update_offsets_now+0x255/0x44d:
						timekeeping_get_delta at kernel/time/timekeeping.c:229
						 (inlined by) timekeeping_get_ns at kernel/time/timekeeping.c:367
						 (inlined by) ktime_get_update_offsets_now at kernel/time/timekeeping.c:2218
[ 1144.721656]  ? kasan_check_write+0x22/0x2b:
						kasan_check_write at mm/kasan/kasan.c:279
[ 1144.729594]  hrtimer_run_queues+0x1be/0x1ef:
						hrtimer_run_queues at kernel/time/hrtimer.c:1621
[ 1144.737601]  run_local_timers+0x29/0xd7:
						run_local_timers at kernel/time/timer.c:1706
[ 1144.737601]  update_process_times+0x30/0x87:
						update_process_times at kernel/time/timer.c:1636
[ 1144.745600]  tick_nohz_handler+0x338/0x40b:
						tick_sched_handle at kernel/time/tick-sched.c:174
						 (inlined by) tick_nohz_handler at kernel/time/tick-sched.c:1182
[ 1144.753609]  smp_apic_timer_interrupt+0x103/0x12a:
						local_apic_timer_interrupt at arch/x86/kernel/apic/apic.c:1025
						 (inlined by) smp_apic_timer_interrupt at arch/x86/kernel/apic/apic.c:1050
[ 1144.761608]  apic_timer_interrupt+0xf/0x20:
						apic_timer_interrupt at arch/x86/entry/entry_64.S:863
[ 1144.769588]  </IRQ>
[ 1144.769588] RIP: 0010:rb_next+0xa3/0x118:
						rb_next at lib/rbtree.c:541
[ 1144.777587] RSP: 0000:ffff88001015f9d8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 1144.792716] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000815
[ 1144.801661] RDX: 1ffff920002d9ee8 RSI: 0000000000000816 RDI: ffffc900016cf740
[ 1144.809546] RBP: ffff88001015f9f8 R08: 0000000000001b15 R09: 0000000000000000
[ 1144.825546] R10: 0000000000000000 R11: 00000000d9de409f R12: ffffc900016cf738
[ 1144.833521] R13: ffff88001015fc70 R14: ffff88001015faf8 R15: ffffc900017cce70
[ 1144.857566]  drm_mm_insert_node_in_range+0x3c7/0x917:
						next_hole at drivers/gpu/drm/drm_mm.c:358 (discriminator 1)
						 (inlined by) drm_mm_insert_node_in_range at drivers/gpu/drm/drm_mm.c:465 (discriminator 1)
[ 1144.865544]  ? drm_mm_reserve_node+0x4c7/0x4c7:
						drm_mm_insert_node_in_range at drivers/gpu/drm/drm_mm.c:451
[ 1144.873515]  ? rcu_dynticks_curr_cpu_in_eqs+0x35/0x79:
						__read_once_size at include/linux/compiler.h:188
						 (inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
						 (inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
						 (inlined by) rcu_dynticks_curr_cpu_in_eqs at kernel/rcu/tree.c:349
[ 1144.880731]  ? retint_kernel+0x10/0x10:
						restore_regs_and_return_to_kernel at arch/x86/entry/entry_64.S:720
[ 1144.881545]  expect_insert+0x6d/0x118:
						drm_mm_insert_node_generic at include/drm/drm_mm.h:403
						 (inlined by) expect_insert at drivers/gpu/drm/selftests/test-drm_mm.c:530
[ 1144.889525]  igt_color_evict+0x1fa/0x7bb:
						igt_color_evict at drivers/gpu/drm/selftests/test-drm_mm.c:2113
[ 1144.897558]  ? igt_color_evict_range+0x7b3/0x7b3:
						igt_color_evict at drivers/gpu/drm/selftests/test-drm_mm.c:2083
[ 1144.905559]  ? igt_color+0xa16/0xa64:
						igt_color at drivers/gpu/drm/selftests/test-drm_mm.c:1857
[ 1144.912716]  ? igt_init+0x299/0x299:
						igt_color at drivers/gpu/drm/selftests/test-drm_mm.c:1857
[ 1144.913534]  ? drm_dp_aux_dev_exit+0x40/0x40:
						separate_adjacent_colors at drivers/gpu/drm/selftests/test-drm_mm.c:1833
[ 1144.921507]  ? kasan_check_read+0x1e/0x28:
						kasan_check_read at mm/kasan/kasan.c:272
[ 1144.929545]  ? next_prime_number+0x3e5/0x3fa:
						rcu_lock_release at include/linux/rcupdate.h:251
						 (inlined by) rcu_read_unlock at include/linux/rcupdate.h:688
						 (inlined by) next_prime_number at lib/prime_numbers.c:203
[ 1144.937549]  test_drm_mm_init+0x2ad/0x3b8:
						run_selftests at drivers/gpu/drm/selftests/drm_selftest.c:76
						 (inlined by) test_drm_mm_init at drivers/gpu/drm/selftests/test-drm_mm.c:2294
[ 1144.946133]  ? drm_fb_helper_modinit+0x37/0x37:
						test_drm_mm_init at drivers/gpu/drm/selftests/test-drm_mm.c:2286
[ 1144.946973]  do_one_initcall+0x188/0x3ef:
						do_one_initcall at init/main.c:883
[ 1144.946973]  ? start_kernel+0x14ca/0x14ca:
						do_one_initcall at init/main.c:874
[ 1144.959316]  ? reacquire_held_locks+0x29f/0x29f:
						lock_release at kernel/locking/lockdep.c:3929
[ 1144.965543]  kernel_init_freeable+0x504/0x68d:
						do_initcall_level at init/main.c:951
						 (inlined by) do_initcalls at init/main.c:959
						 (inlined by) do_basic_setup at init/main.c:977
						 (inlined by) kernel_init_freeable at init/main.c:1127
[ 1144.973548]  ? rest_init+0x37d/0x37d:
						kernel_init at init/main.c:1050
[ 1144.981577]  kernel_init+0x13/0x1c6:
						kernel_init at init/main.c:1053
[ 1144.981577]  ? rest_init+0x37d/0x37d:
						kernel_init at init/main.c:1050
[ 1144.991308]  ret_from_fork+0x3a/0x50:
						ret_from_fork at arch/x86/entry/entry_64.S:418
[ 1144.997523] Kernel Offset: 0x2d800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Attached the full dmesg, kconfig and reproduce scripts.

In v4.11, the call trace looks like this.

[  761.821640] rcu-torture: Reader Batch:  0 2 0 0 0 0 0 0 0 0 0
[  761.838883] rcu-torture: Free-Block Circulation:  0 0 0 0 0 0 0 0 0 0 0
[  761.857673] ??? Writer stall state RTWS_STUTTER(8) g6933 c6933 f0x2 ->state 0x1
[  761.880200] rcu_sched: wait state: 1 ->state: 0x1
[  761.894293] rcu_bh: wait state: 1 ->state: 0x1
[  818.124874] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[  818.191670] Modules linked in:
[  818.199151] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0 #2
[  818.211527] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[  818.229772] task: ffff880031a48000 task.stack: ffffc90000194000
[  818.242156] RIP: 0010:drm_mm_insert_node_in_range+0x254/0x670
[  818.254147] RSP: 0018:ffffc90000197c10 EFLAGS: 00000213 ORIG_RAX: ffffffffffffff10
[  818.271281] RAX: 0000000000000000 RBX: 000000000000296e RCX: ffffc90000197c70
[  818.285576] RDX: 000000000000296d RSI: 0000000000001f49 RDI: ffffc90001fd6d20
[  818.299937] RBP: ffffc90000197ca0 R08: 00000000000014b6 R09: 0000000000000000
[  818.317851] R10: 0000000000000000 R11: 000000000000296e R12: ffffc90001fd6d20
[  818.332694] R13: 000000000000296d R14: 0000000000000000 R15: 0000000000000000
[  818.348222] FS:  0000000000000000(0000) GS:ffff880033200000(0000) knlGS:0000000000000000
[  818.367047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  818.379622] CR2: 00007fffc9805b88 CR3: 0000000002211000 CR4: 00000000000006b0
[  818.396438] Call Trace:
[  818.403050]  expect_insert+0x37/0xc0
[  818.413030]  igt_color_evict+0x149/0x530
[  818.422724]  ? drm_dp_aux_dev_exit+0x30/0x30
[  818.434163]  ? do_early_param+0xbb/0xbb
[  818.443424]  test_drm_mm_init+0xab/0x15a
[  818.452118]  ? drm_kms_helper_init+0x1c/0x1c
[  818.460570]  ? do_early_param+0xbb/0xbb
[  818.470427]  do_one_initcall+0x68/0x220
[  818.478688]  ? do_early_param+0xbb/0xbb
[  818.487569]  kernel_init_freeable+0x140/0x200

Thanks,
Fengguang

View attachment "dmesg-vm-lkp-wsx03-quantal-x86_64-28:20180417032459:x86_64-randconfig-s0-04170223:4.17.0-rc1:99" of type "text/plain" (52432 bytes)

View attachment ".config" of type "text/plain" (114525 bytes)

View attachment "job-script" of type "text/plain" (3924 bytes)

View attachment "reproduce-vm-lkp-wsx03-quantal-x86_64-28:20180417032459:x86_64-randconfig-s0-04170223:4.17.0-rc1:99" of type "text/plain" (1857 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ