Message-ID: <202212150942.84e60db1-yujie.liu@intel.com>
Date:   Thu, 15 Dec 2022 11:10:10 +0800
From:   kernel test robot <yujie.liu@...el.com>
To:     Thomas Gleixner <tglx@...utronix.de>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>,
        <linux-kernel@...r.kernel.org>
Subject: [rt-devel:linux-5.10.y-rt] [sched/hotplug] 3dc80c2780:
 kernel_BUG_at_kernel/sched/core.c

Greetings,

FYI, we noticed kernel_BUG_at_kernel/sched/core.c caused by the following commit (built with gcc-11):

commit: 3dc80c278022ec43b137216ac51e25a9468bf2d7 ("sched/hotplug: Consolidate task migration on CPU unplug")
https://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git linux-5.10.y-rt

in testcase: rcutorture
version: 
with the following parameters:

	runtime: 300s
	test: cpuhotplug
	torture_type: srcu

test-description: rcutorture is a load/unload test of the rcutorture kernel module.
test-url: https://www.kernel.org/doc/Documentation/RCU/torture.txt

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

which caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


[   99.675800][   T15] ------------[ cut here ]------------
[   99.677237][   T15] kernel BUG at kernel/sched/core.c:7078!
[   99.677911][   T15] invalid opcode: 0000 [#1] SMP KASAN PTI
[   99.678562][   T15] CPU: 1 PID: 15 Comm: migration/1 Not tainted 5.10.0-rc1-00006-g3dc80c278022 #1
[   99.679692][   T15] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 99.685108][ T15] Stopper: multi_cpu_stop+0x0/0x360 <- 0x0 
[ 99.685862][ T15] RIP: 0010:sched_cpu_dying (??:?) 
[ 99.686561][ T15] Code: 55 82 01 a8 01 0f 85 7a fe ff ff c6 05 fa 9e 8d 03 01 90 48 c7 c7 e0 b5 06 83 e8 89 fe 81 01 90 0f 0b 90 90 e9 5c fe ff ff 90 <0f> 0b 48 c7 c7 60 c2 d2 83 e8 c2 99 86 01 e8 86 15 56 00 e9 29 ff
All code
========
   0:	55                   	push   %rbp
   1:	82                   	(bad)  
   2:	01 a8 01 0f 85 7a    	add    %ebp,0x7a850f01(%rax)
   8:	fe                   	(bad)  
   9:	ff                   	(bad)  
   a:	ff c6                	inc    %esi
   c:	05 fa 9e 8d 03       	add    $0x38d9efa,%eax
  11:	01 90 48 c7 c7 e0    	add    %edx,-0x1f3838b8(%rax)
  17:	b5 06                	mov    $0x6,%ch
  19:	83 e8 89             	sub    $0xffffff89,%eax
  1c:	fe 81 01 90 0f 0b    	incb   0xb0f9001(%rcx)
  22:	90                   	nop
  23:	90                   	nop
  24:	e9 5c fe ff ff       	jmpq   0xfffffffffffffe85
  29:	90                   	nop
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	48 c7 c7 60 c2 d2 83 	mov    $0xffffffff83d2c260,%rdi
  33:	e8 c2 99 86 01       	callq  0x18699fa
  38:	e8 86 15 56 00       	callq  0x5615c3
  3d:	e9                   	.byte 0xe9
  3e:	29 ff                	sub    %edi,%edi

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	48 c7 c7 60 c2 d2 83 	mov    $0xffffffff83d2c260,%rdi
   9:	e8 c2 99 86 01       	callq  0x18699d0
   e:	e8 86 15 56 00       	callq  0x561599
  13:	e9                   	.byte 0xe9
  14:	29 ff                	sub    %edi,%edi
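
The decoded listing above marks offset 0x2a as the trapping instruction. As a rough cross-check, the same offset can be recovered from the raw "Code:" line, assuming the convention visible in this oops that the byte at the faulting RIP is wrapped in angle brackets. The helper name `trapping_offset` is hypothetical and not part of any kernel tool; this is only a sketch.

```python
import re

def trapping_offset(code_line: str) -> int:
    """Return the index of the byte marked <xx> in an oops 'Code:' line.

    Hypothetical helper for illustration; it assumes the marker convention
    seen in this report (faulting byte wrapped in angle brackets).
    """
    # Drop any timestamp/task prefix and keep only the hex bytes.
    _, _, payload = code_line.partition("Code:")
    for offset, token in enumerate(payload.split()):
        if re.fullmatch(r"<[0-9a-fA-F]{2}>", token):
            return offset
    raise ValueError("no <xx> marker found in Code: line")

# Code: line copied from the oops in this report.
CODE = (
    "Code: 55 82 01 a8 01 0f 85 7a fe ff ff c6 05 fa 9e 8d 03 01 90 48 "
    "c7 c7 e0 b5 06 83 e8 89 fe 81 01 90 0f 0b 90 90 e9 5c fe ff ff 90 "
    "<0f> 0b 48 c7 c7 60 c2 d2 83 e8 c2 99 86 01 e8 86 15 56 00 e9 29 ff"
)

offset = trapping_offset(CODE)
print(hex(offset))  # 0x2a, matching the "2a:*" line of the listing
```

The marked byte 0f and the following 0b form the ud2 instruction shown in the listing, which is what the "invalid opcode" line of the oops reports.
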
[   99.689087][   T15] RSP: 0018:ffffc9000010fcd0 EFLAGS: 00010002
[   99.689883][   T15] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff8119a15c
[   99.690901][   T15] RDX: 1ffff110740fe9e9 RSI: 0000000000000008 RDI: ffff8883a07f4f48
[   99.691979][   T15] RBP: ffffc9000010fd08 R08: 0000000000000000 R09: ffff88810004d1bf
[   99.693039][   T15] R10: ffffed1020009a37 R11: 0000000000000001 R12: ffff8883a07f4f00
[   99.694020][   T15] R13: 0000000000000001 R14: ffff8883a07f4f18 R15: 0000000000000046
[   99.695022][   T15] FS:  0000000000000000(0000) GS:ffff8883a0600000(0000) knlGS:0000000000000000
[   99.696197][   T15] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   99.697046][   T15] CR2: 000055f1b8e6c3c0 CR3: 000000011dc36000 CR4: 00000000000406a0
[   99.698151][   T15] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   99.699155][   T15] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   99.700179][   T15] Call Trace:
[ 99.700616][ T15] ? sched_cpu_wait_empty (??:?) 
[ 99.701311][ T15] cpuhp_invoke_callback (cpu.c:?) 
[ 99.701905][ T15] ? mp_irqdomain_ioapic_idx (apic_flat_64.c:?) 
[ 99.702593][ T15] ? cpuhp_invoke_callback (cpu.c:?) 
[ 99.703314][ T15] take_cpu_down (cpu.c:?) 
[ 99.703926][ T15] multi_cpu_stop (stop_machine.c:?) 
[ 99.704560][ T15] cpu_stopper_thread (stop_machine.c:?) 
[ 99.709427][ T15] ? stop_machine_yield+0x10/0x10 
[ 99.710079][ T15] ? cpu_stop_queue_two_works (stop_machine.c:?) 
[ 99.710794][ T15] ? smpboot_thread_fn (smpboot.c:?) 
[ 99.711468][ T15] smpboot_thread_fn (smpboot.c:?) 
[ 99.712104][ T15] ? __smpboot_create_thread (smpboot.c:?) 
[ 99.712786][ T15] ? __kthread_parkme (kthread.c:?) 
[ 99.713400][ T15] ? schedule (??:?) 
[ 99.713917][ T15] ? __smpboot_create_thread (smpboot.c:?) 
[ 99.714525][ T15] ? __smpboot_create_thread (smpboot.c:?) 
[ 99.715202][ T15] kthread (kthread.c:?) 
[ 99.715714][ T15] ? kthread_insert_work_sanity_check (kthread.c:?) 
[ 99.716541][ T15] ret_from_fork (??:?) 
[   99.717135][   T15] Modules linked in: rcutorture torture bochs_drm drm_vram_helper drm_ttm_helper ttm drm_kms_helper cec cfbfillrect cfbimgblt cfbcopyarea fb_sys_fops syscopyarea sysfillrect input_leds sysimgblt led_class fb i2c_piix4 fbdev rtc_cmos qemu_fw_cfg drm drm_panel_orientation_quirks fuse i2c_core
[   99.720716][   T15]
[   99.720719][   T15] ======================================================
[   99.720721][   T15] WARNING: possible circular locking dependency detected
[   99.720723][   T15] 5.10.0-rc1-00006-g3dc80c278022 #1 Not tainted
[   99.720725][   T15] ------------------------------------------------------
[   99.720727][   T15] migration/1/15 is trying to acquire lock:
[ 99.720729][ T15] ffffffff83d7ff20 (console_owner){-.-.}-{0:0}, at: console_unlock (??:?) 
[   99.720736][   T15]
[   99.720738][   T15] but task is already holding lock:
[ 99.720740][ T15] ffff8883a07f4f18 (&rq->lock){-.-.}-{2:2}, at: sched_cpu_dying (??:?) 
[   99.720744][   T15]
[   99.720746][   T15] which lock already depends on the new lock.
[   99.720747][   T15]
[   99.720749][   T15] the existing dependency chain (in reverse order) is:
[   99.720750][   T15]
[   99.720751][   T15] -> #4 (&rq->lock){-.-.}-{2:2}:
[ 99.720756][ T15] __lock_acquire (lockdep.c:?) 
[ 99.720757][ T15] lock_acquire (??:?) 
[ 99.720758][ T15] _raw_spin_lock (??:?) 
[ 99.720759][ T15] task_fork_fair (fair.c:?) 
[ 99.720761][ T15] sched_fork (??:?) 
[ 99.720762][ T15] copy_process (fork.c:?) 
[ 99.720764][ T15] kernel_clone (??:?) 
[ 99.720765][ T15] kernel_thread (??:?) 
[ 99.720766][ T15] rest_init (??:?) 
[ 99.720768][ T15] start_kernel (??:?) 
[ 99.720770][ T15] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:277) 
[   99.720771][   T15]
[   99.720773][   T15] -> #3 (&p->pi_lock){-.-.}-{2:2}:
[ 99.720779][ T15] __lock_acquire (lockdep.c:?) 
[ 99.720780][ T15] lock_acquire (??:?) 
[ 99.720782][ T15] _raw_spin_lock_irqsave (??:?) 
[ 99.720783][ T15] try_to_wake_up (core.c:?) 
[ 99.720785][ T15] __wake_up_common (wait.c:?) 
[ 99.720787][ T15] __wake_up_common_lock (wait.c:?) 
[ 99.720789][ T15] tty_port_default_wakeup (tty_port.c:?) 
[ 99.720790][ T15] serial8250_tx_chars (??:?) 
[ 99.720792][ T15] serial8250_handle_irq (??:?) 
[ 99.720793][ T15] serial8250_interrupt (8250_core.c:?) 
[ 99.720795][ T15] __handle_irq_event_percpu (??:?) 
[ 99.720797][ T15] handle_irq_event_percpu (??:?) 
[ 99.720799][ T15] handle_irq_event (??:?) 
[ 99.720800][ T15] handle_edge_irq (??:?) 
[ 99.720802][ T15] asm_call_irq_on_stack (??:?) 
[ 99.720804][ T15] common_interrupt (??:?) 
[ 99.720805][ T15] asm_common_interrupt (??:?) 
[ 99.720807][ T15] default_idle (??:?) 
[ 99.720809][ T15] default_idle_call (??:?) 
[ 99.720810][ T15] cpuidle_idle_call (idle.c:?) 
[ 99.720812][ T15] do_idle (idle.c:?) 
[ 99.720813][ T15] cpu_startup_entry (??:?) 
[ 99.720815][ T15] start_secondary (smpboot.c:?) 
[ 99.720817][ T15] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:277) 
[   99.720818][   T15]
[   99.720819][   T15] -> #2 (&tty->write_wait){-.-.}-{2:2}:
[ 99.720839][ T15] __lock_acquire (lockdep.c:?) 
[ 99.720841][ T15] lock_acquire (??:?) 
[ 99.720843][ T15] _raw_spin_lock_irqsave (??:?) 
[ 99.720844][ T15] __wake_up_common_lock (wait.c:?) 
[ 99.720846][ T15] tty_port_default_wakeup (tty_port.c:?) 
[ 99.720848][ T15] serial8250_tx_chars (??:?) 
[ 99.720850][ T15] serial8250_handle_irq (??:?) 
[ 99.720851][ T15] serial8250_interrupt (8250_core.c:?) 
[ 99.720853][ T15] __handle_irq_event_percpu (??:?) 
[ 99.720855][ T15] handle_irq_event_percpu (??:?) 
[ 99.720856][ T15] handle_irq_event (??:?) 
[ 99.720858][ T15] handle_edge_irq (??:?) 
[ 99.720860][ T15] asm_call_irq_on_stack (??:?) 
[ 99.720861][ T15] common_interrupt (??:?) 
[ 99.720863][ T15] asm_common_interrupt (??:?) 
[ 99.720865][ T15] default_idle (??:?) 
[ 99.720866][ T15] default_idle_call (??:?) 
[ 99.720868][ T15] cpuidle_idle_call (idle.c:?) 
[ 99.720870][ T15] do_idle (idle.c:?) 
[ 99.720872][ T15] cpu_startup_entry (??:?) 
[ 99.720874][ T15] start_secondary (smpboot.c:?) 
[ 99.720875][ T15] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:277) 
[   99.720877][   T15]
[   99.720878][   T15] -> #1 (&port_lock_key){-.-.}-{2:2}:
[ 99.720884][ T15] __lock_acquire (lockdep.c:?) 
[ 99.720886][ T15] lock_acquire (??:?) 
[ 99.720887][ T15] _raw_spin_lock_irqsave (??:?) 
[ 99.720889][ T15] serial8250_console_write (??:?) 
[ 99.720891][ T15] call_console_drivers+0x237/0x400 
[ 99.720893][ T15] console_unlock (??:?) 
[ 99.720895][ T15] vprintk_emit (??:?) 
[ 99.720897][ T15] printk (??:?) 
[ 99.720898][ T15] register_console (??:?) 
[ 99.720900][ T15] univ8250_console_init (8250_core.c:?) 
[ 99.720902][ T15] console_init (??:?) 
[ 99.720903][ T15] start_kernel (??:?) 
[ 99.720905][ T15] secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:277) 
[   99.720906][   T15]
[   99.720907][   T15] -> #0 (console_owner){-.-.}-{0:0}:
[ 99.720914][ T15] check_prev_add (lockdep.c:?) 
[ 99.720915][ T15] validate_chain (lockdep.c:?) 
[ 99.720917][ T15] __lock_acquire (lockdep.c:?) 
[ 99.720919][ T15] lock_acquire (??:?) 
[ 99.720920][ T15] console_unlock (??:?) 
[ 99.720922][ T15] vprintk_emit (??:?) 
[ 99.720923][ T15] printk (??:?) 
[ 99.720925][ T15] report_bug.cold (bug.c:?) 
[ 99.720927][ T15] handle_bug (traps.c:?) 
[ 99.720929][ T15] exc_invalid_op (??:?) 
[ 99.720930][ T15] asm_exc_invalid_op (??:?) 
[ 99.720932][ T15] sched_cpu_dying (??:?) 
[ 99.720934][ T15] cpuhp_invoke_callback (cpu.c:?) 
[ 99.720935][ T15] take_cpu_down (cpu.c:?) 
[ 99.720937][ T15] multi_cpu_stop (stop_machine.c:?) 
[ 99.720939][ T15] cpu_stopper_thread (stop_machine.c:?) 
[ 99.720941][ T15] smpboot_thread_fn (smpboot.c:?) 
[ 99.720942][ T15] kthread (kthread.c:?) 
[ 99.720944][ T15] ret_from_fork (??:?) 
[   99.720944][   T15]
[   99.720946][   T15] other info that might help us debug this:
[   99.720946][   T15]
[   99.720948][   T15] Chain exists of:
[   99.720949][   T15]   console_owner --> &p->pi_lock --> &rq->lock
[   99.720955][   T15]
[   99.720956][   T15]  Possible unsafe locking scenario:
[   99.720957][   T15]
[   99.720958][   T15]        CPU0                    CPU1
[   99.720960][   T15]        ----                    ----
[   99.720961][   T15]   lock(&rq->lock);
[   99.720966][   T15]                                lock(&p->pi_lock);
[   99.720970][   T15]                                lock(&rq->lock);
[   99.720974][   T15]   lock(console_owner);
[   99.720978][   T15]
[   99.720980][   T15]  *** DEADLOCK ***
[   99.720981][   T15]
[   99.720983][   T15] 2 locks held by migration/1/15:
[ 99.720984][ T15] #0: ffff8883a07f4f18 (&rq->lock){-.-.}-{2:2}, at: sched_cpu_dying (??:?) 
[ 99.720992][ T15] #1: ffffffff84100560 (console_lock){+.+.}-{0:0}, at: vprintk_emit (??:?) 
[   99.721001][   T15]
[   99.721002][   T15] stack backtrace:
[   99.721005][   T15] CPU: 1 PID: 15 Comm: migration/1 Not tainted 5.10.0-rc1-00006-g3dc80c278022 #1
[   99.721007][   T15] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[ 99.721009][ T15] Stopper: multi_cpu_stop+0x0/0x360 <- 0x0 
[   99.721011][   T15] Call Trace:
[ 99.721012][ T15] dump_stack (??:?) 
[ 99.721014][ T15] check_noncircular (lockdep.c:?) 
[ 99.721016][ T15] ? print_circular_bug (lockdep.c:?) 
[ 99.721018][ T15] ? add_lock_to_list+0x193/0x370 
[ 99.721019][ T15] check_prev_add (lockdep.c:?) 
[ 99.721021][ T15] validate_chain (lockdep.c:?) 
[ 99.721022][ T15] ? check_prev_add (lockdep.c:?) 
[ 99.721024][ T15] ? sched_clock (??:?) 
[ 99.721026][ T15] __lock_acquire (lockdep.c:?) 
[ 99.721027][ T15] ? sched_clock (??:?) 
[ 99.721029][ T15] ? sched_clock_cpu (??:?) 
[ 99.721031][ T15] lock_acquire (??:?) 
[ 99.721032][ T15] ? console_unlock (??:?) 
[ 99.721034][ T15] ? rcu_read_unlock (workqueue.c:?) 
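
The lockdep splat above can be read as a directed graph in which an edge A -> B means "B was acquired while A was held". The sketch below transcribes entries #4 to #1 of the reported chain under that usual reading (an assumption, not lockdep's actual implementation) and checks whether the new acquisition, console_owner taken while &rq->lock is held, would close a cycle. The names `EXISTING`, `find_path` and `would_deadlock` are hypothetical.

```python
from typing import Dict, List, Optional

# Edges "held -> acquired next", transcribed from entries #4..#1 of the
# report's existing dependency chain (assumed reading of the splat).
EXISTING: Dict[str, List[str]] = {
    "console_owner": ["port_lock_key"],    # entry #1
    "port_lock_key": ["tty->write_wait"],  # entry #2
    "tty->write_wait": ["p->pi_lock"],     # entry #3
    "p->pi_lock": ["rq->lock"],            # entry #4
}

def find_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Iterative depth-first search; returns one path from start to goal, or None."""
    stack = [(start, [start])]
    seen = set()
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt in graph.get(node, []):
            stack.append((nxt, path + [nxt]))
    return None

def would_deadlock(graph: Dict[str, List[str]], held: str, wanted: str) -> Optional[List[str]]:
    """Adding the edge held -> wanted closes a cycle if wanted already reaches held."""
    return find_path(graph, wanted, held)

# Entry #0: migration/1 holds &rq->lock and tries to take console_owner.
cycle = would_deadlock(EXISTING, held="rq->lock", wanted="console_owner")
print(" --> ".join(cycle))
```

The path found runs from console_owner through &p->pi_lock to &rq->lock, which agrees with the "Chain exists of" summary in the report. Note that the inverted acquisition only happens because the BUG in sched_cpu_dying calls printk while &rq->lock is held, so the lockdep warning is a consequence of the BUG rather than a separate problem.
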


If you fix the issue, kindly add the following tags:
| Reported-by: kernel test robot <yujie.liu@...el.com>
| Link: https://lore.kernel.org/oe-lkp/202212150942.84e60db1-yujie.liu@intel.com


To reproduce:

        # build kernel
	cd linux
	cp config-5.10.0-rc1-00006-g3dc80c278022 .config
	make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
	make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
	cd <mod-install-dir>
	find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.


-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

View attachment "config-5.10.0-rc1-00006-g3dc80c278022" of type "text/plain" (139884 bytes)

View attachment "job-script" of type "text/plain" (5646 bytes)

Download attachment "dmesg.xz" of type "application/x-xz" (30692 bytes)

View attachment "rcutorture" of type "text/plain" (120386 bytes)
