[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20180419024658.22xbnw2dkkm7wzjm@wfg-t540p.sh.intel.com>
Date: Thu, 19 Apr 2018 10:46:58 +0800
From: Fengguang Wu <fengguang.wu@...el.com>
To: dri-devel@...ts.freedesktop.org
Cc: Daniel Vetter <daniel.vetter@...ll.ch>,
David Airlie <airlied@...ux.ie>,
Chris Wilson <chris@...is-wilson.co.uk>,
linux-kernel@...r.kernel.org, lkp@...org
Subject: [test_drm_mm_init] watchdog: BUG: soft lockup - CPU#0 stuck for 22s!
[swapper/0:1]
Hello,
FYI this happens in mainline kernel 4.17.0-rc1.
It dates back to at least v4.11 .
It occurs in 2 out of 3 boots.
[ 1043.919758] rcu-torture: Free-Block Circulation: 669 668 667 666 662 661 660 659 658 657 0
[ 1109.691598] rcu-torture: rtc: 000000003d6da35c ver: 685 tfle: 0 rta: 685 rtaf: 0 rtf: 674 rtmbe: 0 rtbe: 0 rtbke: 0 rtbre: 0 rtbf: 0 rtb: 0 nt: 29429 onoff: 0/0:0/0 -1,0:-1,0 0:0 (HZ=250) barrier: 0/0:0 cbflood: 129
[ 1109.739392] rcu-torture: Reader Pipe: 23942278 7 0 0 0 0 0 0 0 0 0
[ 1109.750932] rcu-torture: Reader Batch: 23895152 47133 0 0 0 0 0 0 0 0 0
[ 1109.763056] rcu-torture: Free-Block Circulation: 684 684 683 682 681 680 679 678 677 676 0
[ 1144.200304] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 1144.229662] irq event stamp: 14402106
[ 1144.229662] hardirqs last enabled at (14402105): [<ffffffffb1400abd>] restore_regs_and_return_to_kernel+0x0/0x33:
restore_regs_and_return_to_kernel at arch/x86/entry/entry_64.S:720
[ 1144.245638] hardirqs last disabled at (14402106): [<ffffffffb140098a>] interrupt_entry+0xea/0x120:
interrupt_entry at arch/x86/entry/entry_64.S:627
[ 1144.261617] softirqs last enabled at (14390978): [<ffffffffb1600495>] __do_softirq+0x495/0x550
[ 1144.277763] softirqs last disabled at (14390969): [<ffffffffae9304a0>] irq_exit+0xb1/0x29e:
invoke_softirq at kernel/softirq.c:365
(inlined by) irq_exit at kernel/softirq.c:405
[ 1144.285635] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc1 #99
[ 1144.301610] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 1144.309648] RIP: 0010:rb_next+0xa3/0x118:
rb_next at lib/rbtree.c:541
[ 1144.317646] RSP: 0000:ffff88001015f9d8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 1144.332748] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000815
[ 1144.341594] RDX: 1ffff920002d9ee8 RSI: 0000000000000816 RDI: ffffc900016cf740
[ 1144.349636] RBP: ffff88001015f9f8 R08: 0000000000001b15 R09: 0000000000000000
[ 1144.365611] R10: 0000000000000000 R11: 00000000d9de409f R12: ffffc900016cf738
[ 1144.373584] R13: ffff88001015fc70 R14: ffff88001015faf8 R15: ffffc900017cce70
[ 1144.381688] FS: 0000000000000000(0000) GS:ffff880010600000(0000) knlGS:0000000000000000
[ 1144.397596] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1144.405598] CR2: 0000000000000000 CR3: 0000000015814001 CR4: 00000000000206b0
[ 1144.413673] Call Trace:
[ 1144.421640] drm_mm_insert_node_in_range+0x3c7/0x917:
next_hole at drivers/gpu/drm/drm_mm.c:358 (discriminator 1)
(inlined by) drm_mm_insert_node_in_range at drivers/gpu/drm/drm_mm.c:465 (discriminator 1)
[ 1144.429639] ? drm_mm_reserve_node+0x4c7/0x4c7:
drm_mm_insert_node_in_range at drivers/gpu/drm/drm_mm.c:451
[ 1144.437652] ? rcu_dynticks_curr_cpu_in_eqs+0x35/0x79:
__read_once_size at include/linux/compiler.h:188
(inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
(inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
(inlined by) rcu_dynticks_curr_cpu_in_eqs at kernel/rcu/tree.c:349
[ 1144.445752] ? retint_kernel+0x10/0x10:
restore_regs_and_return_to_kernel at arch/x86/entry/entry_64.S:720
[ 1144.445752] expect_insert+0x6d/0x118:
drm_mm_insert_node_generic at include/drm/drm_mm.h:403
(inlined by) expect_insert at drivers/gpu/drm/selftests/test-drm_mm.c:530
[ 1144.457609] igt_color_evict+0x1fa/0x7bb:
igt_color_evict at drivers/gpu/drm/selftests/test-drm_mm.c:2113
[ 1144.457609] ? igt_color_evict_range+0x7b3/0x7b3:
igt_color_evict at drivers/gpu/drm/selftests/test-drm_mm.c:2083
[ 1144.465551] ? igt_color+0xa16/0xa64:
igt_color at drivers/gpu/drm/selftests/test-drm_mm.c:1857
[ 1144.473560] ? igt_init+0x299/0x299:
igt_color at drivers/gpu/drm/selftests/test-drm_mm.c:1857
[ 1144.481561] ? drm_dp_aux_dev_exit+0x40/0x40:
separate_adjacent_colors at drivers/gpu/drm/selftests/test-drm_mm.c:1833
[ 1144.489569] ? kasan_check_read+0x1e/0x28:
kasan_check_read at mm/kasan/kasan.c:272
[ 1144.497602] ? next_prime_number+0x3e5/0x3fa:
rcu_lock_release at include/linux/rcupdate.h:251
(inlined by) rcu_read_unlock at include/linux/rcupdate.h:688
(inlined by) next_prime_number at lib/prime_numbers.c:203
[ 1144.504733] test_drm_mm_init+0x2ad/0x3b8:
run_selftests at drivers/gpu/drm/selftests/drm_selftest.c:76
(inlined by) test_drm_mm_init at drivers/gpu/drm/selftests/test-drm_mm.c:2294
[ 1144.513642] ? drm_fb_helper_modinit+0x37/0x37:
test_drm_mm_init at drivers/gpu/drm/selftests/test-drm_mm.c:2286
[ 1144.521580] do_one_initcall+0x188/0x3ef:
do_one_initcall at init/main.c:883
[ 1144.528710] ? start_kernel+0x14ca/0x14ca:
do_one_initcall at init/main.c:874
[ 1144.529615] ? reacquire_held_locks+0x29f/0x29f:
lock_release at kernel/locking/lockdep.c:3929
[ 1144.537620] kernel_init_freeable+0x504/0x68d:
do_initcall_level at init/main.c:951
(inlined by) do_initcalls at init/main.c:959
(inlined by) do_basic_setup at init/main.c:977
(inlined by) kernel_init_freeable at init/main.c:1127
[ 1144.545629] ? rest_init+0x37d/0x37d:
kernel_init at init/main.c:1050
[ 1144.553601] kernel_init+0x13/0x1c6:
kernel_init at init/main.c:1053
[ 1144.553601] ? rest_init+0x37d/0x37d:
kernel_init at init/main.c:1050
[ 1144.561661] ret_from_fork+0x3a/0x50:
ret_from_fork at arch/x86/entry/entry_64.S:418
[ 1144.569633] Code: 8d 7b 10 48 89 f8 48 c1 e8 03 42 80 3c 20 00 74 05 e8 60 3f a0 fd 48 8b 43 10 48 85 c0 74 75 48 ff 05 08 32 ba 05 48 89 c3 eb d4 <48> ff 05 04 32 ba 05 49 be 00 00 00 00 00 fc ff df 4c 89 e0 48
[ 1144.601626] Kernel panic - not syncing: softlockup: hung tasks
[ 1144.609656] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G L 4.17.0-rc1 #99
[ 1144.617569] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 1144.633606] Call Trace:
[ 1144.640711] <IRQ>
[ 1144.641612] dump_stack+0x10f/0x19b:
__dump_stack at lib/dump_stack.c:77
(inlined by) dump_stack at lib/dump_stack.c:113
[ 1144.649590] panic+0x20d/0x46b:
panic at kernel/panic.c:195
[ 1144.649590] ? refcount_error_report+0x268/0x268:
panic at kernel/panic.c:136
[ 1144.657617] ? kasan_check_write+0x22/0x2b:
kasan_check_write at mm/kasan/kasan.c:279
[ 1144.673626] ? do_raw_spin_unlock+0x74/0x1fa:
__read_once_size at include/linux/compiler.h:188
(inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
(inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
(inlined by) queued_spin_is_locked at include/asm-generic/qspinlock.h:36
(inlined by) debug_spin_unlock at kernel/locking/spinlock_debug.c:98
(inlined by) do_raw_spin_unlock at kernel/locking/spinlock_debug.c:134
[ 1144.681566] ? do_raw_spin_unlock+0x74/0x1fa:
__read_once_size at include/linux/compiler.h:188
(inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
(inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
(inlined by) queued_spin_is_locked at include/asm-generic/qspinlock.h:36
(inlined by) debug_spin_unlock at kernel/locking/spinlock_debug.c:98
(inlined by) do_raw_spin_unlock at kernel/locking/spinlock_debug.c:134
[ 1144.689539] watchdog_timer_fn+0x335/0x366:
watchdog_timer_fn at kernel/watchdog.c:444
[ 1144.689539] __hrtimer_run_queues+0x2a9/0x3f8:
__run_hrtimer at kernel/time/hrtimer.c:1388
(inlined by) __hrtimer_run_queues at kernel/time/hrtimer.c:1448
[ 1144.697609] ? watchdog+0x29/0x29:
watchdog_timer_fn at kernel/watchdog.c:340
[ 1144.705614] ? __remove_hrtimer+0x296/0x296:
__hrtimer_run_queues at kernel/time/hrtimer.c:1418
[ 1144.713590] ? ktime_get_update_offsets_now+0x255/0x44d:
timekeeping_get_delta at kernel/time/timekeeping.c:229
(inlined by) timekeeping_get_ns at kernel/time/timekeeping.c:367
(inlined by) ktime_get_update_offsets_now at kernel/time/timekeeping.c:2218
[ 1144.721656] ? kasan_check_write+0x22/0x2b:
kasan_check_write at mm/kasan/kasan.c:279
[ 1144.729594] hrtimer_run_queues+0x1be/0x1ef:
hrtimer_run_queues at kernel/time/hrtimer.c:1621
[ 1144.737601] run_local_timers+0x29/0xd7:
run_local_timers at kernel/time/timer.c:1706
[ 1144.737601] update_process_times+0x30/0x87:
update_process_times at kernel/time/timer.c:1636
[ 1144.745600] tick_nohz_handler+0x338/0x40b:
tick_sched_handle at kernel/time/tick-sched.c:174
(inlined by) tick_nohz_handler at kernel/time/tick-sched.c:1182
[ 1144.753609] smp_apic_timer_interrupt+0x103/0x12a:
local_apic_timer_interrupt at arch/x86/kernel/apic/apic.c:1025
(inlined by) smp_apic_timer_interrupt at arch/x86/kernel/apic/apic.c:1050
[ 1144.761608] apic_timer_interrupt+0xf/0x20:
apic_timer_interrupt at arch/x86/entry/entry_64.S:863
[ 1144.769588] </IRQ>
[ 1144.769588] RIP: 0010:rb_next+0xa3/0x118:
rb_next at lib/rbtree.c:541
[ 1144.777587] RSP: 0000:ffff88001015f9d8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 1144.792716] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000815
[ 1144.801661] RDX: 1ffff920002d9ee8 RSI: 0000000000000816 RDI: ffffc900016cf740
[ 1144.809546] RBP: ffff88001015f9f8 R08: 0000000000001b15 R09: 0000000000000000
[ 1144.825546] R10: 0000000000000000 R11: 00000000d9de409f R12: ffffc900016cf738
[ 1144.833521] R13: ffff88001015fc70 R14: ffff88001015faf8 R15: ffffc900017cce70
[ 1144.857566] drm_mm_insert_node_in_range+0x3c7/0x917:
next_hole at drivers/gpu/drm/drm_mm.c:358 (discriminator 1)
(inlined by) drm_mm_insert_node_in_range at drivers/gpu/drm/drm_mm.c:465 (discriminator 1)
[ 1144.865544] ? drm_mm_reserve_node+0x4c7/0x4c7:
drm_mm_insert_node_in_range at drivers/gpu/drm/drm_mm.c:451
[ 1144.873515] ? rcu_dynticks_curr_cpu_in_eqs+0x35/0x79:
__read_once_size at include/linux/compiler.h:188
(inlined by) arch_atomic_read at arch/x86/include/asm/atomic.h:31
(inlined by) atomic_read at include/asm-generic/atomic-instrumented.h:22
(inlined by) rcu_dynticks_curr_cpu_in_eqs at kernel/rcu/tree.c:349
[ 1144.880731] ? retint_kernel+0x10/0x10:
restore_regs_and_return_to_kernel at arch/x86/entry/entry_64.S:720
[ 1144.881545] expect_insert+0x6d/0x118:
drm_mm_insert_node_generic at include/drm/drm_mm.h:403
(inlined by) expect_insert at drivers/gpu/drm/selftests/test-drm_mm.c:530
[ 1144.889525] igt_color_evict+0x1fa/0x7bb:
igt_color_evict at drivers/gpu/drm/selftests/test-drm_mm.c:2113
[ 1144.897558] ? igt_color_evict_range+0x7b3/0x7b3:
igt_color_evict at drivers/gpu/drm/selftests/test-drm_mm.c:2083
[ 1144.905559] ? igt_color+0xa16/0xa64:
igt_color at drivers/gpu/drm/selftests/test-drm_mm.c:1857
[ 1144.912716] ? igt_init+0x299/0x299:
igt_color at drivers/gpu/drm/selftests/test-drm_mm.c:1857
[ 1144.913534] ? drm_dp_aux_dev_exit+0x40/0x40:
separate_adjacent_colors at drivers/gpu/drm/selftests/test-drm_mm.c:1833
[ 1144.921507] ? kasan_check_read+0x1e/0x28:
kasan_check_read at mm/kasan/kasan.c:272
[ 1144.929545] ? next_prime_number+0x3e5/0x3fa:
rcu_lock_release at include/linux/rcupdate.h:251
(inlined by) rcu_read_unlock at include/linux/rcupdate.h:688
(inlined by) next_prime_number at lib/prime_numbers.c:203
[ 1144.937549] test_drm_mm_init+0x2ad/0x3b8:
run_selftests at drivers/gpu/drm/selftests/drm_selftest.c:76
(inlined by) test_drm_mm_init at drivers/gpu/drm/selftests/test-drm_mm.c:2294
[ 1144.946133] ? drm_fb_helper_modinit+0x37/0x37:
test_drm_mm_init at drivers/gpu/drm/selftests/test-drm_mm.c:2286
[ 1144.946973] do_one_initcall+0x188/0x3ef:
do_one_initcall at init/main.c:883
[ 1144.946973] ? start_kernel+0x14ca/0x14ca:
do_one_initcall at init/main.c:874
[ 1144.959316] ? reacquire_held_locks+0x29f/0x29f:
lock_release at kernel/locking/lockdep.c:3929
[ 1144.965543] kernel_init_freeable+0x504/0x68d:
do_initcall_level at init/main.c:951
(inlined by) do_initcalls at init/main.c:959
(inlined by) do_basic_setup at init/main.c:977
(inlined by) kernel_init_freeable at init/main.c:1127
[ 1144.973548] ? rest_init+0x37d/0x37d:
kernel_init at init/main.c:1050
[ 1144.981577] kernel_init+0x13/0x1c6:
kernel_init at init/main.c:1053
[ 1144.981577] ? rest_init+0x37d/0x37d:
kernel_init at init/main.c:1050
[ 1144.991308] ret_from_fork+0x3a/0x50:
ret_from_fork at arch/x86/entry/entry_64.S:418
[ 1144.997523] Kernel Offset: 0x2d800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Attached the full dmesg, kconfig and reproduce scripts.
In v4.11, the call trace looks like this.
[ 761.821640] rcu-torture: Reader Batch: 0 2 0 0 0 0 0 0 0 0 0
[ 761.838883] rcu-torture: Free-Block Circulation: 0 0 0 0 0 0 0 0 0 0 0
[ 761.857673] ??? Writer stall state RTWS_STUTTER(8) g6933 c6933 f0x2 ->state 0x1
[ 761.880200] rcu_sched: wait state: 1 ->state: 0x1
[ 761.894293] rcu_bh: wait state: 1 ->state: 0x1
[ 818.124874] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 818.191670] Modules linked in:
[ 818.199151] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.11.0 #2
[ 818.211527] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 818.229772] task: ffff880031a48000 task.stack: ffffc90000194000
[ 818.242156] RIP: 0010:drm_mm_insert_node_in_range+0x254/0x670
[ 818.254147] RSP: 0018:ffffc90000197c10 EFLAGS: 00000213 ORIG_RAX: ffffffffffffff10
[ 818.271281] RAX: 0000000000000000 RBX: 000000000000296e RCX: ffffc90000197c70
[ 818.285576] RDX: 000000000000296d RSI: 0000000000001f49 RDI: ffffc90001fd6d20
[ 818.299937] RBP: ffffc90000197ca0 R08: 00000000000014b6 R09: 0000000000000000
[ 818.317851] R10: 0000000000000000 R11: 000000000000296e R12: ffffc90001fd6d20
[ 818.332694] R13: 000000000000296d R14: 0000000000000000 R15: 0000000000000000
[ 818.348222] FS: 0000000000000000(0000) GS:ffff880033200000(0000) knlGS:0000000000000000
[ 818.367047] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 818.379622] CR2: 00007fffc9805b88 CR3: 0000000002211000 CR4: 00000000000006b0
[ 818.396438] Call Trace:
[ 818.403050] expect_insert+0x37/0xc0
[ 818.413030] igt_color_evict+0x149/0x530
[ 818.422724] ? drm_dp_aux_dev_exit+0x30/0x30
[ 818.434163] ? do_early_param+0xbb/0xbb
[ 818.443424] test_drm_mm_init+0xab/0x15a
[ 818.452118] ? drm_kms_helper_init+0x1c/0x1c
[ 818.460570] ? do_early_param+0xbb/0xbb
[ 818.470427] do_one_initcall+0x68/0x220
[ 818.478688] ? do_early_param+0xbb/0xbb
[ 818.487569] kernel_init_freeable+0x140/0x200
Thanks,
Fengguang
View attachment "dmesg-vm-lkp-wsx03-quantal-x86_64-28:20180417032459:x86_64-randconfig-s0-04170223:4.17.0-rc1:99" of type "text/plain" (52432 bytes)
View attachment ".config" of type "text/plain" (114525 bytes)
View attachment "job-script" of type "text/plain" (3924 bytes)
View attachment "reproduce-vm-lkp-wsx03-quantal-x86_64-28:20180417032459:x86_64-randconfig-s0-04170223:4.17.0-rc1:99" of type "text/plain" (1857 bytes)
Powered by blists - more mailing lists