linux-kernel - [lib/cpumask] e5ad41dae2: BUG:workqueue

Message-ID: <202210072241.2b3cc734-oliver.sang@intel.com>
Date:   Fri, 7 Oct 2022 22:42:57 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Valentin Schneider <vschneid@...hat.com>
CC:     <lkp@...ts.01.org>, <lkp@...el.com>,
        <linux-kernel@...r.kernel.org>, <linux-block@...r.kernel.org>,
        Jens Axboe <axboe@...nel.dk>,
        Yury Norov <yury.norov@...il.com>,
        Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>
Subject: [lib/cpumask]  e5ad41dae2: BUG:workqueue_lockup-pool


Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: e5ad41dae251946ecdcdc38bb8f639cd55a8eae1 ("[RFC PATCH bitmap-for-next 2/4] lib/cpumask: Fix cpumask_check() warning in cpumask_next_wrap*()")
url: https://github.com/intel-lab-lkp/linux/commits/Valentin-Schneider/lib-cpumask-blk_mq-Fix-blk_mq_hctx_next_cpu-vs-cpumask_check/20221006-202402
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next
patch link: https://lore.kernel.org/linux-block/20221006122112.663119-3-vschneid@redhat.com

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):


+----------------------------------------------+------------+------------+
|                                              | d8e0ef5a1d | e5ad41dae2 |
+----------------------------------------------+------------+------------+
| boot_successes                               | 10         | 0          |
| boot_failures                                | 0          | 10         |
| BUG:workqueue_lockup-pool                    | 0          | 10         |
| INFO:rcu_sched_detected_stalls_on_CPUs/tasks | 0          | 10         |
| BUG:kernel_hang_in_boot_stage                | 0          | 10         |
+----------------------------------------------+------------+------------+


If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Link: https://lore.kernel.org/r/202210072241.2b3cc734-oliver.sang@intel.com


[   60.568059][    C0] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 58s!
[   60.569057][    C0] Showing busy workqueues and worker pools:
[   60.569663][    C0] workqueue events: flags=0x0
[   60.570057][    C0]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[   60.570064][    C0]     pending: vmstat_shepherd
[   90.776058][    C0] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 88s!
[   90.777057][    C0] Showing busy workqueues and worker pools:
[   90.777819][    C0] workqueue events: flags=0x0
[   90.778056][    C0]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
[   90.778065][    C0]     pending: vmstat_shepherd
[  105.234045][    C0] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[  105.234045][    C0]  (detected by 0, t=105002 jiffies, g=-1195, q=1 ncpus=2)
[  105.234045][    C0] rcu: All QSes seen, last rcu_sched kthread activity 105002 (-194950--299952), jiffies_till_next_fqs=3, root ->qsmask 0x0
[  105.234045][    C0] rcu: rcu_sched kthread starved for 105002 jiffies! g-1195 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0
[  105.234045][    C0] rcu:     Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[  105.234045][    C0] rcu: RCU grace-period kthread stack dump:
[  105.234045][    C0] task:rcu_sched       state:R  running task     stack: 7484 pid:   11 ppid:     2 flags:0x00004000
[  105.234045][    C0] Call Trace:
[  105.234045][    C0]  ? __schedule+0x58a/0x5b8
[  105.234045][    C0]  ? schedule+0x83/0xba
[  105.234045][    C0]  ? schedule_timeout+0x88/0xa5
[  105.234045][    C0]  ? del_timer_sync+0x7d/0x7d
[  105.234045][    C0]  ? rcu_gp_fqs_loop+0xef/0x294
[  105.234045][    C0]  ? rcu_gp_kthread+0xd4/0xf0
[  105.234045][    C0]  ? kthread+0xc0/0xc5
[  105.234045][    C0]  ? rcu_gp_init+0x4c4/0x4c4
[  105.234045][    C0]  ? kthread_complete_and_exit+0x1b/0x1b
[  105.234045][    C0]  ? ret_from_fork+0x19/0x24
[  105.234045][    C0] rcu: Stack dump where RCU GP kthread last ran:
[  105.234045][    C0] NMI backtrace for cpu 0
[  105.234045][    C0] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.0.0-rc7-00395-ge5ad41dae251 #1
[  105.234045][    C0] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[  105.234045][    C0] Call Trace:
[  105.234045][    C0]  ? dump_stack_lvl+0x42/0x54
[  105.234045][    C0]  ? dump_stack+0xd/0x10
[  105.234045][    C0]  ? nmi_cpu_backtrace+0x96/0xb8
[  105.234045][    C0]  ? lapic_can_unplug_cpu+0x87/0x87
[  105.234045][    C0]  ? nmi_trigger_cpumask_backtrace+0x49/0xac
[  105.234045][    C0]  ? arch_trigger_cpumask_backtrace+0x15/0x17
[  105.234045][    C0]  ? rcu_check_gp_kthread_starvation+0x122/0x131
[  105.234045][    C0]  ? print_other_cpu_stall+0x264/0x2a9
[  105.234045][    C0]  ? print_other_cpu_stall+0x297/0x2a9
[  105.234045][    C0]  ? check_cpu_stall+0x174/0x1bd
[  105.234045][    C0]  ? rcu_sched_clock_irq+0xd7/0x186
[  105.234045][    C0]  ? update_process_times+0x45/0x60
[  105.234045][    C0]  ? tick_periodic+0xc0/0xcc
[  105.234045][    C0]  ? tick_handle_periodic+0x22/0x66
[  105.234045][    C0]  ? sysvec_call_function_single+0x2c/0x2c
[  105.234045][    C0]  ? __sysvec_apic_timer_interrupt+0xe4/0x182
[  105.234045][    C0]  ? sysvec_apic_timer_interrupt+0x1b/0x2e
[  105.234045][    C0]  ? handle_exception+0x133/0x133
[  105.234045][    C0]  ? rmi_firmware_update+0x3ab/0x3f7
[  105.234045][    C0]  ? sysvec_call_function_single+0x2c/0x2c
[  105.234045][    C0]  ? build_sched_domains+0x1e5/0x71c
[  105.234045][    C0]  ? sysvec_call_function_single+0x2c/0x2c
[  105.234045][    C0]  ? build_sched_domains+0x1e5/0x71c
[  105.234045][    C0]  ? sched_init_domains+0x73/0x77
[  105.234045][    C0]  ? sched_init_smp+0x26/0x6c
[  105.234045][    C0]  ? kernel_init_freeable+0x143/0x195
[  105.234045][    C0]  ? rest_init+0x13a/0x13a
[  105.234045][    C0]  ? kernel_init+0x17/0xf3
[  105.234045][    C0]  ? ret_from_fork+0x19/0x24
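
For anyone triaging from the attached dmesg, the headline lockup and stall lines can be filtered out with a small shell helper. This is only a sketch: summarize_stalls is a hypothetical name (not part of lkp-tests), and the usage comment assumes the attachment was saved locally as dmesg.xz.

```shell
# Sketch: keep only the workqueue lockup and RCU stall headlines from a dmesg log.
# summarize_stalls is a hypothetical helper; it reads dmesg text on stdin.
summarize_stalls() {
    grep -E 'BUG: workqueue lockup|rcu_sched detected stalls|rcu_sched kthread starved'
}

# Assumed usage, with the attachment saved locally as dmesg.xz:
#   xz -dc dmesg.xz | summarize_stalls
```

The patterns are taken from the messages quoted above; a different kernel configuration may word them differently.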



To reproduce:

        # build kernel
        cd linux
        cp config-6.0.0-rc7-00395-ge5ad41dae251 .config
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install
        cd <mod-install-dir>
        find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.



-- 
0-DAY CI Kernel Test Service
https://01.org/lkp



View attachment "config-6.0.0-rc7-00395-ge5ad41dae251" of type "text/plain" (165111 bytes)

View attachment "job-script" of type "text/plain" (5089 bytes)

Download attachment "dmesg.xz" of type "application/x-xz" (8640 bytes)