Message-ID: <202212150942.84e60db1-yujie.liu@intel.com>
Date: Thu, 15 Dec 2022 11:10:10 +0800
From: kernel test robot <yujie.liu@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>
CC: <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
<linux-kernel@...r.kernel.org>
Subject: [rt-devel:linux-5.10.y-rt] [sched/hotplug] 3dc80c2780: kernel_BUG_at_kernel/sched/core.c
Greetings,
FYI, we noticed kernel_BUG_at_kernel/sched/core.c due to commit (built with gcc-11):
commit: 3dc80c278022ec43b137216ac51e25a9468bf2d7 ("sched/hotplug: Consolidate task migration on CPU unplug")
https://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git linux-5.10.y-rt
in testcase: rcutorture
version:
with the following parameters:
runtime: 300s
test: cpuhotplug
torture_type: srcu
test-description: rcutorture is the rcutorture kernel module load/unload test.
test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
This caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
[ 99.675800][ T15] ------------[ cut here ]------------
[ 99.677237][ T15] kernel BUG at kernel/sched/core.c:7078!
[ 99.677911][ T15] invalid opcode: 0000 [#1] SMP KASAN PTI
[ 99.678562][ T15] CPU: 1 PID: 15 Comm: migration/1 Not tainted 5.10.0-rc1-00006-g3dc80c278022 #1
[ 99.679692][ T15] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 99.685108][ T15] Stopper: multi_cpu_stop+0x0/0x360 <- 0x0
[ 99.685862][ T15] RIP: 0010:sched_cpu_dying (??:?)
[ 99.686561][ T15] Code: 55 82 01 a8 01 0f 85 7a fe ff ff c6 05 fa 9e 8d 03 01 90 48 c7 c7 e0 b5 06 83 e8 89 fe 81 01 90 0f 0b 90 90 e9 5c fe ff ff 90 <0f> 0b 48 c7 c7 60 c2 d2 83 e8 c2 99 86 01 e8 86 15 56 00 e9 29 ff
All code
========
0: 55 push %rbp
1: 82 (bad)
2: 01 a8 01 0f 85 7a add %ebp,0x7a850f01(%rax)
8: fe (bad)
9: ff (bad)
a: ff c6 inc %esi
c: 05 fa 9e 8d 03 add $0x38d9efa,%eax
11: 01 90 48 c7 c7 e0 add %edx,-0x1f3838b8(%rax)
17: b5 06 mov $0x6,%ch
19: 83 e8 89 sub $0xffffff89,%eax
1c: fe 81 01 90 0f 0b incb 0xb0f9001(%rcx)
22: 90 nop
23: 90 nop
24: e9 5c fe ff ff jmpq 0xfffffffffffffe85
29: 90 nop
2a:* 0f 0b ud2 <-- trapping instruction
2c: 48 c7 c7 60 c2 d2 83 mov $0xffffffff83d2c260,%rdi
33: e8 c2 99 86 01 callq 0x18699fa
38: e8 86 15 56 00 callq 0x5615c3
3d: e9 .byte 0xe9
3e: 29 ff sub %edi,%edi
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 48 c7 c7 60 c2 d2 83 mov $0xffffffff83d2c260,%rdi
9: e8 c2 99 86 01 callq 0x18699d0
e: e8 86 15 56 00 callq 0x561599
13: e9 .byte 0xe9
14: 29 ff sub %edi,%edi
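For readers checking the decode by hand: the "Code:" line above marks the trapping byte with angle brackets, and on x86 the bytes 0f 0b encode ud2, the instruction BUG() compiles to, which matches the "invalid opcode" line. The following is a minimal sketch, not part of the original report, that recovers the offset of the marked byte from that line. The function name parse_code_line is a hypothetical helper introduced for illustration only.

```python
# Sketch only: locate the trapping instruction in an oops "Code:" line.
# parse_code_line is a hypothetical helper, not a kernel or lkp-tests API.

# Bytes copied from the "Code:" line of the report above.
CODE = (
    "55 82 01 a8 01 0f 85 7a fe ff ff c6 05 fa 9e 8d 03 01 90 48 c7 c7 e0 "
    "b5 06 83 e8 89 fe 81 01 90 0f 0b 90 90 e9 5c fe ff ff 90 <0f> 0b 48 c7 "
    "c7 60 c2 d2 83 e8 c2 99 86 01 e8 86 15 56 00 e9 29 ff"
)

# x86 encoding of ud2, the instruction used to implement BUG().
UD2 = bytes([0x0F, 0x0B])


def parse_code_line(code):
    """Return (offset of the <..>-marked byte, all bytes of the dump)."""
    tokens = code.split()
    marked = [i for i, tok in enumerate(tokens) if tok.startswith("<")]
    if len(marked) != 1:
        raise ValueError("expected exactly one <..> marked byte")
    raw = bytes(int(tok.strip("<>"), 16) for tok in tokens)
    return marked[0], raw


offset, raw = parse_code_line(CODE)
print(hex(offset), raw[offset:offset + 2] == UD2)
```

The offset it reports lines up with the "2a:*" entry in the decoded listing, so the trap is the ud2 itself rather than a fault in a neighbouring instruction.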
[ 99.689087][ T15] RSP: 0018:ffffc9000010fcd0 EFLAGS: 00010002
[ 99.689883][ T15] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff8119a15c
[ 99.690901][ T15] RDX: 1ffff110740fe9e9 RSI: 0000000000000008 RDI: ffff8883a07f4f48
[ 99.691979][ T15] RBP: ffffc9000010fd08 R08: 0000000000000000 R09: ffff88810004d1bf
[ 99.693039][ T15] R10: ffffed1020009a37 R11: 0000000000000001 R12: ffff8883a07f4f00
[ 99.694020][ T15] R13: 0000000000000001 R14: ffff8883a07f4f18 R15: 0000000000000046
[ 99.695022][ T15] FS: 0000000000000000(0000) GS:ffff8883a0600000(0000) knlGS:0000000000000000
[ 99.696197][ T15] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 99.697046][ T15] CR2: 000055f1b8e6c3c0 CR3: 000000011dc36000 CR4: 00000000000406a0
[ 99.698151][ T15] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 99.699155][ T15] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 99.700179][ T15] Call Trace:
[ 99.700616][ T15] ? sched_cpu_wait_empty (??:?)
[ 99.701311][ T15] cpuhp_invoke_callback (cpu.c:?)
[ 99.701905][ T15] ? mp_irqdomain_ioapic_idx (apic_flat_64.c:?)
[ 99.702593][ T15] ? cpuhp_invoke_callback (cpu.c:?)
[ 99.703314][ T15] take_cpu_down (cpu.c:?)
[ 99.703926][ T15] multi_cpu_stop (stop_machine.c:?)
[ 99.704560][ T15] cpu_stopper_thread (stop_machine.c:?)
[ 99.709427][ T15] ? stop_machine_yield+0x10/0x10
[ 99.710079][ T15] ? cpu_stop_queue_two_works (stop_machine.c:?)
[ 99.710794][ T15] ? smpboot_thread_fn (smpboot.c:?)
[ 99.711468][ T15] smpboot_thread_fn (smpboot.c:?)
[ 99.712104][ T15] ? __smpboot_create_thread (smpboot.c:?)
[ 99.712786][ T15] ? __kthread_parkme (kthread.c:?)
[ 99.713400][ T15] ? schedule (??:?)
[ 99.713917][ T15] ? __smpboot_create_thread (smpboot.c:?)
[ 99.714525][ T15] ? __smpboot_create_thread (smpboot.c:?)
[ 99.715202][ T15] kthread (kthread.c:?)
[ 99.715714][ T15] ? kthread_insert_work_sanity_check (kthread.c:?)
[ 99.716541][ T15] ret_from_fork (??:?)
[ 99.717135][ T15] Modules linked in: rcutorture torture bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec cfbfillrect cfbimgblt cfbcopyarea fb_sys_fops syscopyarea sysfillrect input_leds sysimgblt led_class fb i2c_piix4 fbdev rtc_cmos qemu_fw_cfg drm drm_panel_orientation_quirks fuse i2c_core
[ 99.720716][ T15]
[ 99.720719][ T15] ======================================================
[ 99.720721][ T15] WARNING: possible circular locking dependency detected
[ 99.720723][ T15] 5.10.0-rc1-00006-g3dc80c278022 #1 Not tainted
[ 99.720725][ T15] ------------------------------------------------------
[ 99.720727][ T15] migration/1/15 is trying to acquire lock:
[ 99.720729][ T15] ffffffff83d7ff20 (console_owner){-.-.}-{0:0}, at: console_unlock (??:?)
[ 99.720736][ T15]
[ 99.720738][ T15] but task is already holding lock:
[ 99.720740][ T15] ffff8883a07f4f18 (&rq->lock){-.-.}-{2:2}, at: sched_cpu_dying (??:?)
[ 99.720744][ T15]
[ 99.720746][ T15] which lock already depends on the new lock.
[ 99.720747][ T15]
[ 99.720749][ T15] the existing dependency chain (in reverse order) is:
[ 99.720750][ T15]
[ 99.720751][ T15] -> #4 (&rq->lock){-.-.}-{2:2}:
[ 99.720756][ T15] __lock_acquire (lockdep.c:?)
[ 99.720757][ T15] lock_acquire (??:?)
[ 99.720758][ T15] _raw_spin_lock (??:?)
[ 99.720759][ T15] task_fork_fair (fair.c:?)
[ 99.720761][ T15] sched_fork (??:?)
[ 99.720762][ T15] copy_process (fork.c:?)
[ 99.720764][ T15] kernel_clone (??:?)
[ 99.720765][ T15] kernel_thread (??:?)
[ 99.720766][ T15] rest_init (??:?)
[ 99.720768][ T15] start_kernel (??:?)
[ 99.720770][ T15] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:277)
[ 99.720771][ T15]
[ 99.720773][ T15] -> #3 (&p->pi_lock){-.-.}-{2:2}:
[ 99.720779][ T15] __lock_acquire (lockdep.c:?)
[ 99.720780][ T15] lock_acquire (??:?)
[ 99.720782][ T15] _raw_spin_lock_irqsave (??:?)
[ 99.720783][ T15] try_to_wake_up (core.c:?)
[ 99.720785][ T15] __wake_up_common (wait.c:?)
[ 99.720787][ T15] __wake_up_common_lock (wait.c:?)
[ 99.720789][ T15] tty_port_default_wakeup (tty_port.c:?)
[ 99.720790][ T15] serial8250_tx_chars (??:?)
[ 99.720792][ T15] serial8250_handle_irq (??:?)
[ 99.720793][ T15] serial8250_interrupt (8250_core.c:?)
[ 99.720795][ T15] __handle_irq_event_percpu (??:?)
[ 99.720797][ T15] handle_irq_event_percpu (??:?)
[ 99.720799][ T15] handle_irq_event (??:?)
[ 99.720800][ T15] handle_edge_irq (??:?)
[ 99.720802][ T15] asm_call_irq_on_stack (??:?)
[ 99.720804][ T15] common_interrupt (??:?)
[ 99.720805][ T15] asm_common_interrupt (??:?)
[ 99.720807][ T15] default_idle (??:?)
[ 99.720809][ T15] default_idle_call (??:?)
[ 99.720810][ T15] cpuidle_idle_call (idle.c:?)
[ 99.720812][ T15] do_idle (idle.c:?)
[ 99.720813][ T15] cpu_startup_entry (??:?)
[ 99.720815][ T15] start_secondary (smpboot.c:?)
[ 99.720817][ T15] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:277)
[ 99.720818][ T15]
[ 99.720819][ T15] -> #2 (&tty->write_wait){-.-.}-{2:2}:
[ 99.720839][ T15] __lock_acquire (lockdep.c:?)
[ 99.720841][ T15] lock_acquire (??:?)
[ 99.720843][ T15] _raw_spin_lock_irqsave (??:?)
[ 99.720844][ T15] __wake_up_common_lock (wait.c:?)
[ 99.720846][ T15] tty_port_default_wakeup (tty_port.c:?)
[ 99.720848][ T15] serial8250_tx_chars (??:?)
[ 99.720850][ T15] serial8250_handle_irq (??:?)
[ 99.720851][ T15] serial8250_interrupt (8250_core.c:?)
[ 99.720853][ T15] __handle_irq_event_percpu (??:?)
[ 99.720855][ T15] handle_irq_event_percpu (??:?)
[ 99.720856][ T15] handle_irq_event (??:?)
[ 99.720858][ T15] handle_edge_irq (??:?)
[ 99.720860][ T15] asm_call_irq_on_stack (??:?)
[ 99.720861][ T15] common_interrupt (??:?)
[ 99.720863][ T15] asm_common_interrupt (??:?)
[ 99.720865][ T15] default_idle (??:?)
[ 99.720866][ T15] default_idle_call (??:?)
[ 99.720868][ T15] cpuidle_idle_call (idle.c:?)
[ 99.720870][ T15] do_idle (idle.c:?)
[ 99.720872][ T15] cpu_startup_entry (??:?)
[ 99.720874][ T15] start_secondary (smpboot.c:?)
[ 99.720875][ T15] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:277)
[ 99.720877][ T15]
[ 99.720878][ T15] -> #1 (&port_lock_key){-.-.}-{2:2}:
[ 99.720884][ T15] __lock_acquire (lockdep.c:?)
[ 99.720886][ T15] lock_acquire (??:?)
[ 99.720887][ T15] _raw_spin_lock_irqsave (??:?)
[ 99.720889][ T15] serial8250_console_write (??:?)
[ 99.720891][ T15] call_console_drivers+0x237/0x400
[ 99.720893][ T15] console_unlock (??:?)
[ 99.720895][ T15] vprintk_emit (??:?)
[ 99.720897][ T15] printk (??:?)
[ 99.720898][ T15] register_console (??:?)
[ 99.720900][ T15] univ8250_console_init (8250_core.c:?)
[ 99.720902][ T15] console_init (??:?)
[ 99.720903][ T15] start_kernel (??:?)
[ 99.720905][ T15] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:277)
[ 99.720906][ T15]
[ 99.720907][ T15] -> #0 (console_owner){-.-.}-{0:0}:
[ 99.720914][ T15] check_prev_add (lockdep.c:?)
[ 99.720915][ T15] validate_chain (lockdep.c:?)
[ 99.720917][ T15] __lock_acquire (lockdep.c:?)
[ 99.720919][ T15] lock_acquire (??:?)
[ 99.720920][ T15] console_unlock (??:?)
[ 99.720922][ T15] vprintk_emit (??:?)
[ 99.720923][ T15] printk (??:?)
[ 99.720925][ T15] report_bug.cold (bug.c:?)
[ 99.720927][ T15] handle_bug (traps.c:?)
[ 99.720929][ T15] exc_invalid_op (??:?)
[ 99.720930][ T15] asm_exc_invalid_op (??:?)
[ 99.720932][ T15] sched_cpu_dying (??:?)
[ 99.720934][ T15] cpuhp_invoke_callback (cpu.c:?)
[ 99.720935][ T15] take_cpu_down (cpu.c:?)
[ 99.720937][ T15] multi_cpu_stop (stop_machine.c:?)
[ 99.720939][ T15] cpu_stopper_thread (stop_machine.c:?)
[ 99.720941][ T15] smpboot_thread_fn (smpboot.c:?)
[ 99.720942][ T15] kthread (kthread.c:?)
[ 99.720944][ T15] ret_from_fork (??:?)
[ 99.720944][ T15]
[ 99.720946][ T15] other info that might help us debug this:
[ 99.720946][ T15]
[ 99.720948][ T15] Chain exists of:
[ 99.720949][ T15] console_owner --> &p->pi_lock --> &rq->lock
[ 99.720955][ T15]
[ 99.720956][ T15] Possible unsafe locking scenario:
[ 99.720957][ T15]
[ 99.720958][ T15] CPU0 CPU1
[ 99.720960][ T15] ---- ----
[ 99.720961][ T15] lock(&rq->lock);
[ 99.720966][ T15] lock(&p->pi_lock);
[ 99.720970][ T15] lock(&rq->lock);
[ 99.720974][ T15] lock(console_owner);
[ 99.720978][ T15]
[ 99.720980][ T15] *** DEADLOCK ***
[ 99.720981][ T15]
[ 99.720983][ T15] 2 locks held by migration/1/15:
[ 99.720984][ T15] #0: ffff8883a07f4f18 (&rq->lock){-.-.}-{2:2}, at: sched_cpu_dying (??:?)
[ 99.720992][ T15] #1: ffffffff84100560 (console_lock){+.+.}-{0:0}, at: vprintk_emit (??:?)
[ 99.721001][ T15]
[ 99.721002][ T15] stack backtrace:
[ 99.721005][ T15] CPU: 1 PID: 15 Comm: migration/1 Not tainted 5.10.0-rc1-00006-g3dc80c278022 #1
[ 99.721007][ T15] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 99.721009][ T15] Stopper: multi_cpu_stop+0x0/0x360 <- 0x0
[ 99.721011][ T15] Call Trace:
[ 99.721012][ T15] dump_stack (??:?)
[ 99.721014][ T15] check_noncircular (lockdep.c:?)
[ 99.721016][ T15] ? print_circular_bug (lockdep.c:?)
[ 99.721018][ T15] ? add_lock_to_list+0x193/0x370
[ 99.721019][ T15] check_prev_add (lockdep.c:?)
[ 99.721021][ T15] validate_chain (lockdep.c:?)
[ 99.721022][ T15] ? check_prev_add (lockdep.c:?)
[ 99.721024][ T15] ? sched_clock (??:?)
[ 99.721026][ T15] __lock_acquire (lockdep.c:?)
[ 99.721027][ T15] ? sched_clock (??:?)
[ 99.721029][ T15] ? sched_clock_cpu (??:?)
[ 99.721031][ T15] lock_acquire (??:?)
[ 99.721032][ T15] ? console_unlock (??:?)
[ 99.721034][ T15] ? rcu_read_unlock (workqueue.c:?)
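A note on reading the lockdep splat above: entry #0 of the chain shows console_owner being taken from printk() in report_bug(), so the warning appears to be a side effect of printing the BUG report while &rq->lock is held, not a separate problem. The sketch below is not part of the original report and is not how lockdep is implemented; it is a simplified model that treats entries #1 to #4 of the chain as edges meaning "taken while holding the previous lock" and checks whether the new acquisition closes a cycle. The names EXISTING, find_path and would_deadlock are hypothetical.

```python
# Sketch only: model the reported lock dependency chain as a directed graph.
# An edge a -> b means "b was acquired while a was held".
# EXISTING, find_path and would_deadlock are hypothetical names.

# Edges read off entries #1..#4 of "the existing dependency chain".
EXISTING = {
    "console_owner": ["&port_lock_key"],
    "&port_lock_key": ["&tty->write_wait"],
    "&tty->write_wait": ["&p->pi_lock"],
    "&p->pi_lock": ["&rq->lock"],
}


def find_path(graph, start, goal):
    """Depth-first search; return a path from start to goal, or None."""
    stack = [(start, [start])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, []):
            stack.append((nxt, path + [nxt]))
    return None


def would_deadlock(graph, held, wanted):
    """Taking `wanted` under `held` closes a cycle if `wanted` reaches `held`."""
    return find_path(graph, wanted, held)


# migration/1 holds &rq->lock and tries to take console_owner.
cycle = would_deadlock(EXISTING, held="&rq->lock", wanted="console_owner")
print(" --> ".join(cycle) if cycle else "no cycle")
```

The path it finds is the longer form of the "Chain exists of" summary in the log, which abbreviates the intermediate serial and tty locks.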
If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <yujie.liu@...el.com>
| Link: https://lore.kernel.org/oe-lkp/202212150942.84e60db1-yujie.liu@intel.com
To reproduce:
# build kernel
cd linux
cp config-5.10.0-rc1-00006-g3dc80c278022 .config
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached to this email
# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp directories to run from a clean state.
--
0-DAY CI Kernel Test Service
https://01.org/lkp
View attachment "config-5.10.0-rc1-00006-g3dc80c278022" of type "text/plain" (139884 bytes)
View attachment "job-script" of type "text/plain" (5646 bytes)
Download attachment "dmesg.xz" of type "application/x-xz" (30692 bytes)
View attachment "rcutorture" of type "text/plain" (120386 bytes)