Date:   Wed, 25 May 2022 15:14:33 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Muchun Song <songmuchun@...edance.com>
Cc:     0day robot <lkp@...el.com>, LKML <linux-kernel@...r.kernel.org>,
        linux-mm@...ck.org, cgroups@...r.kernel.org, lkp@...ts.01.org,
        hannes@...xchg.org, mhocko@...nel.org, roman.gushchin@...ux.dev,
        shakeelb@...gle.com, duanxiongchun@...edance.com,
        longman@...hat.com, Muchun Song <songmuchun@...edance.com>
Subject: [mm]  bec0ae1210: WARNING:possible_recursive_locking_detected



Greetings,

FYI, we noticed the following commit (built with gcc-11):

commit: bec0ae12106e0cf12dd4e0e21eb0754b99be0ba2 ("[PATCH v4 09/11] mm: memcontrol: use obj_cgroup APIs to charge the LRU pages")
url: https://github.com/intel-lab-lkp/linux/commits/Muchun-Song/Use-obj_cgroup-APIs-to-charge-the-LRU-pages/20220524-143056
patch link: https://lore.kernel.org/linux-mm/20220524060551.80037-10-songmuchun@bytedance.com

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):



If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot <oliver.sang@...el.com>


[   41.024908][  T135] WARNING: possible recursive locking detected
[   41.025923][  T135] 5.18.0-00009-gbec0ae12106e #1 Not tainted
[   41.026805][  T135] --------------------------------------------
[   41.027780][  T135] kworker/1:2/135 is trying to acquire lock:
[ 41.028743][ T135] ffff88815b545068 (&lruvec->lru_lock){....}-{2:2}, at: lruvec_reparent_lock (include/linux/nodemask.h:271 mm/memcontrol.c:376) 
[   41.030324][  T135]
[   41.030324][  T135] but task is already holding lock:
[ 41.031629][ T135] ffff8881a1c43068 (&lruvec->lru_lock){....}-{2:2}, at: lruvec_reparent_lock (mm/memcontrol.c:378) 
[   41.033231][  T135]
[   41.033231][  T135] other info that might help us debug this:
[   41.034551][  T135]  Possible unsafe locking scenario:
[   41.034551][  T135]
[   41.035818][  T135]        CPU0
[   41.036409][  T135]        ----
[   41.037045][  T135]   lock(&lruvec->lru_lock);
[   41.037866][  T135]   lock(&lruvec->lru_lock);
[   41.039123][  T135]
[   41.039123][  T135]  *** DEADLOCK ***
[   41.039123][  T135]
[   41.040984][  T135]  May be due to missing lock nesting notation
[   41.040984][  T135]
[   41.042567][  T135] 5 locks held by kworker/1:2/135:
[ 41.043472][ T135] #0: ffff88839d54b538 ((wq_completion)cgroup_destroy){+.+.}-{0:0}, at: process_one_work (arch/x86/include/asm/atomic64_64.h:34 include/linux/atomic/atomic-long.h:41 include/linux/atomic/atomic-instrumented.h:1280 kernel/workqueue.c:636 kernel/workqueue.c:663 kernel/workqueue.c:2260) 
[ 41.045556][ T135] #1: ffffc90000e9fdb8 ((work_completion)(&css->destroy_work)){+.+.}-{0:0}, at: process_one_work (kernel/workqueue.c:2264) 
[ 41.047649][ T135] #2: ffffffffa46931c8 (cgroup_mutex){+.+.}-{3:3}, at: css_killed_work_fn (kernel/cgroup/cgroup.c:5271 kernel/cgroup/cgroup.c:5554) 
[ 41.049171][ T135] #3: ffffffffa47fe2d8 (objcg_lock){....}-{2:2}, at: mem_cgroup_css_offline (mm/memcontrol.c:453 mm/memcontrol.c:463 mm/memcontrol.c:5382) 
[ 41.050617][ T135] #4: ffff8881a1c43068 (&lruvec->lru_lock){....}-{2:2}, at: lruvec_reparent_lock (mm/memcontrol.c:378) 
[   41.052031][  T135]
[   41.052031][  T135] stack backtrace:
[   41.052926][  T135] CPU: 1 PID: 135 Comm: kworker/1:2 Not tainted 5.18.0-00009-gbec0ae12106e #1
[   41.054190][  T135] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
[   41.055742][  T135] Workqueue: cgroup_destroy css_killed_work_fn
[   41.056645][  T135] Call Trace:
[   41.057138][  T135]  <TASK>
[ 41.057628][ T135] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 4)) 
[ 41.058392][ T135] validate_chain.cold (kernel/locking/lockdep.c:2958 kernel/locking/lockdep.c:3001 kernel/locking/lockdep.c:3790) 
[ 41.059117][ T135] ? check_prev_add (kernel/locking/lockdep.c:3759) 
[ 41.059888][ T135] __lock_acquire (kernel/locking/lockdep.c:5029) 
[ 41.060579][ T135] lock_acquire (kernel/locking/lockdep.c:436 kernel/locking/lockdep.c:5643 kernel/locking/lockdep.c:5606) 
[ 41.061280][ T135] ? lruvec_reparent_lock (include/linux/nodemask.h:271 mm/memcontrol.c:376) 
[ 41.062081][ T135] ? rcu_read_unlock (include/linux/rcupdate.h:723 (discriminator 5)) 
[ 41.062915][ T135] ? lock_acquire (kernel/locking/lockdep.c:436 kernel/locking/lockdep.c:5643 kernel/locking/lockdep.c:5606) 
[ 41.063653][ T135] ? mem_cgroup_css_offline (mm/memcontrol.c:453 mm/memcontrol.c:463 mm/memcontrol.c:5382) 
[ 41.064504][ T135] ? do_raw_spin_lock (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 kernel/locking/spinlock_debug.c:115) 
[ 41.065190][ T135] ? rwlock_bug+0xc0/0xc0 
[ 41.065923][ T135] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154) 
[ 41.066676][ T135] ? lruvec_reparent_lock (include/linux/nodemask.h:271 mm/memcontrol.c:376) 
[ 41.067455][ T135] lruvec_reparent_lock (include/linux/nodemask.h:271 mm/memcontrol.c:376) 
[ 41.068227][ T135] mem_cgroup_css_offline (mm/memcontrol.c:453 mm/memcontrol.c:463 mm/memcontrol.c:5382) 
[ 41.069103][ T135] ? lock_is_held_type (kernel/locking/lockdep.c:5382 kernel/locking/lockdep.c:5684) 
[ 41.069858][ T135] css_killed_work_fn (kernel/cgroup/cgroup.c:5279 kernel/cgroup/cgroup.c:5554) 
[ 41.070637][ T135] process_one_work (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/workqueue.h:108 kernel/workqueue.c:2294) 
[ 41.071459][ T135] ? rcu_read_unlock (include/linux/rcupdate.h:723 (discriminator 5)) 
[ 41.072308][ T135] ? pwq_dec_nr_in_flight (kernel/workqueue.c:2184) 
[ 41.073231][ T135] ? rwlock_bug+0xc0/0xc0 
[ 41.073922][ T135] worker_thread (include/linux/list.h:292 kernel/workqueue.c:2437) 
[ 41.074572][ T135] ? __kthread_parkme (arch/x86/include/asm/bitops.h:207 (discriminator 4) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 4) kernel/kthread.c:270 (discriminator 4)) 
[ 41.075220][ T135] ? schedule (arch/x86/include/asm/bitops.h:207 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:135 (discriminator 1) include/linux/thread_info.h:118 (discriminator 1) include/linux/sched.h:2154 (discriminator 1) kernel/sched/core.c:6462 (discriminator 1)) 
[ 41.075942][ T135] ? process_one_work (kernel/workqueue.c:2379) 
[ 41.076755][ T135] ? process_one_work (kernel/workqueue.c:2379) 
[ 41.077600][ T135] kthread (kernel/kthread.c:376) 
[ 41.078174][ T135] ? kthread_complete_and_exit (kernel/kthread.c:331) 
[ 41.078951][ T135] ret_from_fork (arch/x86/entry/entry_64.S:304) 
[   41.079668][  T135]  </TASK>
[  OK  ] Started Load Kernel Modules.
[  OK  ] Mounted RPC Pipe File System.
[  OK  ] Started Remount Root and Kernel File Systems.
[  OK  ] Mounted Kernel Debug File System.
[  OK  ] Mounted Huge Pages File System.
Starting Load/Save Random Seed...
Starting Create System Users...
Starting Apply Kernel Variables...
Mounting Kernel Configuration File System...
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Create System Users.
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Mounted Kernel Configuration File System.
Starting Create Static Device Nodes in /dev...
[  OK  ] Started Create Static Device Nodes in /dev.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
Starting Preprocess NFS configuration...
Starting udev Kernel Device Manager...
[  OK  ] Started Journal Service.
[  OK  ] Started Preprocess NFS configuration.
[  OK  ] Reached target NFS client services.
Starting Flush Journal to Persistent Storage...
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Started Flush Journal to Persistent Storage.
Starting Create Volatile Files and Directories...
[  OK  ] Started Create Volatile Files and Directories.
Starting Network Time Synchronization...
Starting RPC bind portmap service...
Starting Update UTMP about System Boot/Shutdown...
[  OK  ] Started RPC bind portmap service.
[  OK  ] Reached target RPC Port Mapper.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[  OK  ] Started Network Time Synchronization.
[  OK  ] Reached target System Time Synchronized.
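
A note on reading the lockdep report above: the lock being acquired and the lock already held are both of class &lruvec->lru_lock, but they sit at different addresses (ffff88815b545068 and ffff8881a1c43068). That pattern, two distinct instances of one lock class taken in a nested fashion, is what lockdep flags with "May be due to missing lock nesting notation". The following is only a minimal sketch for checking this in a report; the helper names (parse_locks, same_class_distinct_instances) and the regular expression are illustrative assumptions, not part of lkp-tests or the kernel tree, and the sample text is trimmed from the report above.

```python
import re

# Lines trimmed from the report above (timestamps and source locations removed).
REPORT = """\
kworker/1:2/135 is trying to acquire lock:
ffff88815b545068 (&lruvec->lru_lock){....}-{2:2}, at: lruvec_reparent_lock
but task is already holding lock:
ffff8881a1c43068 (&lruvec->lru_lock){....}-{2:2}, at: lruvec_reparent_lock
"""

# Assumed pattern: "<16 hex digits> (<lock class>){<usage bits>}" as printed above.
LOCK_RE = re.compile(r"\b([0-9a-f]{16}) \((.+?)\)\{")


def parse_locks(text):
    """Return (address, lock_class) pairs in the order they appear in text."""
    return LOCK_RE.findall(text)


def same_class_distinct_instances(text):
    """True if the first two locks share a class but have different addresses."""
    locks = parse_locks(text)
    if len(locks) < 2:
        return False
    (addr_a, class_a), (addr_b, class_b) = locks[0], locks[1]
    return class_a == class_b and addr_a != addr_b


if __name__ == "__main__":
    for addr, lock_class in parse_locks(REPORT):
        print(addr, lock_class)
    print(same_class_distinct_instances(REPORT))
```

Whether the nesting is actually safe (and so only needs an annotation) or is a real deadlock risk still has to be decided from the locking order in lruvec_reparent_lock itself; the check above only confirms which of the two cases the report is describing.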


To reproduce:

        # build kernel
        cd linux
        cp config-5.18.0-00009-gbec0ae12106e .config
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
        cd <mod-install-dir>
        find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.



-- 
0-DAY CI Kernel Test Service
https://01.org/lkp



View attachment "config-5.18.0-00009-gbec0ae12106e" of type "text/plain" (167146 bytes)

View attachment "job-script" of type "text/plain" (4950 bytes)

Download attachment "dmesg.xz" of type "application/x-xz" (15336 bytes)
