Message-ID: <202310302207.a25f1a30-oliver.sang@intel.com>
Date:   Mon, 30 Oct 2023 23:40:01 +0800
From:   kernel test robot <oliver.sang@...el.com>
To:     Waiman Long <longman@...hat.com>
CC:     <oe-lkp@...ts.linux.dev>, <lkp@...el.com>,
        <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        <aubrey.li@...ux.intel.com>, <yu.c.chen@...el.com>,
        <oliver.sang@...el.com>
Subject: [linus:master] [sched]  851a723e45:
 WARNING:possible_circular_locking_dependency_detected



Hello,

kernel test robot noticed "WARNING:possible_circular_locking_dependency_detected" on:

commit: 851a723e45d1c4c8f6f7b0d2cfbc5f53690bb4e9 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

[test failed on linus/master 56567a20b22bdbf85c3e55eee3bf2bd23fa2f108]
[test failed on linux-next/master 66f1e1ea3548378ff6387b1ce0b40955d54e86aa]

in testcase: rcutorture
version: 
with following parameters:

	runtime: 300s
	test: cpuhotplug
	torture_type: trivial



compiler: gcc-12
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

(please refer to attached dmesg/kmsg for entire log/backtrace)


Please be aware that the issue does not always reproduce; the rate is about 50%:

da019032819a1f09 851a723e45d1c4c8f6f7b0d2cfb
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :44          45%          20:44    dmesg.WARNING:possible_circular_locking_dependency_detected
           :44          45%          20:44    dmesg.WARNING:possible_circular_locking_dependency_detected_migration_is_trying_to_acquire_lock:at:rcu_nocb_try_bypass_but_task_is_already_holding_lock:at:__balance_push_cpu_stop/0x



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@...el.com>
| Closes: https://lore.kernel.org/oe-lkp/202310302207.a25f1a30-oliver.sang@intel.com



[  708.890347][   T22] 
[  708.890648][   T22] ======================================================
[  708.891276][   T22] WARNING: possible circular locking dependency detected
[  708.891882][   T22] 6.1.0-rc2-00015-g851a723e45d1 #1 Tainted: G        W         
[  708.892549][   T22] ------------------------------------------------------
[  708.893234][   T22] migration/1/22 is trying to acquire lock:
[  708.893792][   T22] ffff88839d3f2478 (&rdp->nocb_lock){-.-.}-{2:2}, at: rcu_nocb_try_bypass+0x71b/0x990
[  708.894716][   T22] 
[  708.894716][   T22] but task is already holding lock:
[  708.895414][   T22] ffff88839d3f1518 (&rq->__lock){-.-.}-{2:2}, at: __balance_push_cpu_stop+0xa0/0x470
[  708.896311][   T22] 
[  708.896311][   T22] which lock already depends on the new lock.
[  708.896311][   T22] 
[  708.897274][   T22] 
[  708.897274][   T22] the existing dependency chain (in reverse order) is:
[  708.898122][   T22] 
[  708.898122][   T22] -> #2 (&rq->__lock){-.-.}-{2:2}:
[  708.898816][   T22]        __lock_acquire+0x9d7/0x1330
[  708.899332][   T22]        lock_acquire+0x193/0x500
[  708.899802][   T22]        _raw_spin_lock_nested+0x2e/0x70
[  708.900317][   T22]        resched_cpu+0x77/0x150
[  708.900802][   T22]        rcu_implicit_dynticks_qs+0x5bc/0x850
[  708.901391][   T22]        force_qs_rnp+0x40b/0x790
[  708.901873][   T22]        rcu_gp_fqs_loop+0x63a/0xa40
[  708.902393][   T22]        rcu_gp_kthread+0x23b/0x380
[  708.902886][   T22]        kthread+0x246/0x2d0
[  708.903320][   T22]        ret_from_fork+0x1f/0x30
[  708.903777][   T22] 
[  708.903777][   T22] -> #1 (rcu_node_0){-.-.}-{2:2}:
[  708.904456][   T22]        __lock_acquire+0x9d7/0x1330
[  708.904971][   T22]        lock_acquire+0x193/0x500
[  708.905470][   T22]        _raw_spin_lock+0x30/0x70
[  708.905961][   T22]        nocb_gp_wait+0x349/0x1d50
[  708.906460][   T22]        rcu_nocb_gp_kthread+0xa3/0x230
[  708.907000][   T22]        kthread+0x246/0x2d0
[  708.907448][   T22]        ret_from_fork+0x1f/0x30
[  708.907923][   T22] 
[  708.907923][   T22] -> #0 (&rdp->nocb_lock){-.-.}-{2:2}:
[  708.908664][   T22]        check_prev_add+0x94/0xbc0
[  708.909167][   T22]        validate_chain+0x701/0x950
[  708.909678][   T22]        __lock_acquire+0x9d7/0x1330
[  708.910197][   T22]        lock_acquire+0x193/0x500
[  708.910682][   T22]        _raw_spin_lock+0x30/0x70
[  708.911174][   T22]        rcu_nocb_try_bypass+0x71b/0x990
[  708.911715][   T22]        call_rcu+0x22d/0x840
[  708.912172][   T22]        __kmem_cache_free+0x1c9/0x460
[  708.912691][   T22]        select_fallback_rq+0x546/0x690
[  708.913226][   T22]        __balance_push_cpu_stop+0x21d/0x470
[  708.913791][   T22]        cpu_stopper_thread+0x206/0x450
[  708.914328][   T22]        smpboot_thread_fn+0x38a/0x870
[  708.914852][   T22]        kthread+0x246/0x2d0
[  708.915297][   T22]        ret_from_fork+0x1f/0x30
[  708.915769][   T22] 
[  708.915769][   T22] other info that might help us debug this:
[  708.915769][   T22] 
[  708.916718][   T22] Chain exists of:
[  708.916718][   T22]   &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
[  708.916718][   T22] 
[  708.917838][   T22]  Possible unsafe locking scenario:
[  708.917838][   T22] 
[  708.918541][   T22]        CPU0                    CPU1
[  708.919055][   T22]        ----                    ----
[  708.919543][   T22]   lock(&rq->__lock);
[  708.919927][   T22]                                lock(rcu_node_0);
[  708.920535][   T22]                                lock(&rq->__lock);
[  708.921146][   T22]   lock(&rdp->nocb_lock);
[  708.921576][   T22] 
[  708.921576][   T22]  *** DEADLOCK ***
[  708.921576][   T22] 
[  708.922329][   T22] 2 locks held by migration/1/22:
[  708.922796][   T22]  #0: ffff88812e1ec000 (&p->pi_lock){-.-.}-{2:2}, at: __balance_push_cpu_stop+0x8b/0x470
[  708.923764][   T22]  #1: ffff88839d3f1518 (&rq->__lock){-.-.}-{2:2}, at: __balance_push_cpu_stop+0xa0/0x470
[  708.924711][   T22] 
[  708.924711][   T22] stack backtrace:
[  708.925280][   T22] CPU: 1 PID: 22 Comm: migration/1 Tainted: G        W          6.1.0-rc2-00015-g851a723e45d1 #1
[  708.926248][   T22] Stopper: __balance_push_cpu_stop+0x0/0x470 <- balance_push+0x218/0x360
[  708.927040][   T22] Call Trace:
[  708.927351][   T22]  <TASK>
[  708.927632][   T22]  dump_stack_lvl+0x9c/0xea
[  708.928053][   T22]  check_noncircular+0x28e/0x330
[  708.928517][   T22]  ? kvm_sched_clock_read+0x14/0x30
[  708.929024][   T22]  ? print_circular_bug+0x470/0x470
[  708.929546][   T22]  ? housekeeping_affine+0xa0/0xa0
[  708.930071][   T22]  ? alloc_chain_hlocks+0x21f/0x6f0
[  708.930572][   T22]  check_prev_add+0x94/0xbc0
[  708.931053][   T22]  ? lockdep_lock+0xa8/0x190
[  708.931491][   T22]  ? usage_skip+0x170/0x170
[  708.931936][   T22]  validate_chain+0x701/0x950
[  708.932378][   T22]  ? check_prev_add+0xbc0/0xbc0
[  708.932842][   T22]  ? mark_lock+0x36/0x3c0
[  708.933282][   T22]  __lock_acquire+0x9d7/0x1330
[  708.933749][   T22]  lock_acquire+0x193/0x500
[  708.934198][   T22]  ? rcu_nocb_try_bypass+0x71b/0x990
[  708.934717][   T22]  ? lock_downgrade+0x120/0x120
[  708.935188][   T22]  _raw_spin_lock+0x30/0x70
[  708.935631][   T22]  ? rcu_nocb_try_bypass+0x71b/0x990
[  708.936143][   T22]  rcu_nocb_try_bypass+0x71b/0x990
[  708.936641][   T22]  ? rcu_implicit_dynticks_qs+0x850/0x850
[  708.937195][   T22]  ? find_and_remove_object+0x1b0/0x1b0
[  708.937700][   T22]  call_rcu+0x22d/0x840
[  708.938095][   T22]  ? rcu_nocb_try_bypass+0x990/0x990
[  708.938591][   T22]  ? select_fallback_rq+0x546/0x690
[  708.939097][   T22]  __kmem_cache_free+0x1c9/0x460
[  708.939580][   T22]  select_fallback_rq+0x546/0x690
[  708.940067][   T22]  ? __do_set_cpus_allowed+0x730/0x730
[  708.940593][   T22]  ? update_rq_clock+0xd8/0x890
[  708.941060][   T22]  __balance_push_cpu_stop+0x21d/0x470
[  708.941605][   T22]  ? tracer_hardirqs_on+0x3b/0x3f0
[  708.942101][   T22]  ? migration_cpu_stop+0xb60/0xb60
[  708.942582][   T22]  ? _raw_spin_unlock_irq+0x24/0x40
[  708.943068][   T22]  ? migration_cpu_stop+0xb60/0xb60
[  708.943566][   T22]  cpu_stopper_thread+0x206/0x450
[  708.944084][   T22]  ? stop_core_cpuslocked+0x1c0/0x1c0
[  708.944611][   T22]  ? smpboot_thread_fn+0x6f/0x870
[  708.945096][   T22]  smpboot_thread_fn+0x38a/0x870
[  708.945604][   T22]  ? find_next_bit+0x30/0x30
[  708.946058][   T22]  kthread+0x246/0x2d0
[  708.946467][   T22]  ? kthread_complete_and_exit+0x40/0x40
[  708.947015][   T22]  ret_from_fork+0x1f/0x30
[  708.947449][   T22]  </TASK>
[  708.974879][ T3910] smpboot: CPU 1 is now offline
[  709.052971][ T3910] x86: Booting SMP configuration:
[  709.053504][ T3910] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  709.065107][    T0] masked ExtINT on CPU#1
[  709.177297][ T3910] smpboot: CPU 1 is now offline
[  709.236384][ T3910] x86: Booting SMP configuration:
[  709.236914][ T3910] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  709.248562][    T0] masked ExtINT on CPU#1
[  709.357577][ T3910] smpboot: CPU 1 is now offline
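For readers unfamiliar with lockdep output: the splat above says lockdep had previously recorded the ordering &rdp->nocb_lock --> rcu_node_0 --> &rq->__lock (the "-> #1" and "-> #2" chains), and now sees migration/1 take &rdp->nocb_lock while already holding &rq->__lock, which closes a cycle in the lock-order graph. The sketch below is NOT kernel code; it is a minimal, hypothetical model (invented names `LockGraph`, `acquire`) of how recording lock-acquisition edges and checking reachability detects exactly this circular dependency:

```python
# Hypothetical sketch of lockdep-style cycle detection: record which locks
# are taken while another lock is held, and flag any acquisition that would
# create a cycle in that dependency graph.
from collections import defaultdict


class LockGraph:
    def __init__(self):
        # edge held -> new means: `new` was once acquired while `held` was held
        self.edges = defaultdict(set)

    def acquire(self, held, new):
        """Record taking `new` while holding `held`.

        Returns True if the edge held->new would close a cycle,
        i.e. a possible circular locking dependency (deadlock)."""
        if self._reachable(new, held):
            return True  # `held` is already ordered after `new`: cycle
        self.edges[held].add(new)
        return False

    def _reachable(self, src, dst):
        # Iterative DFS over recorded dependency edges.
        seen, stack = set(), [src]
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False


g = LockGraph()
# Orderings lockdep had already recorded (see "-> #1" and "-> #2" above):
g.acquire("nocb_lock", "rcu_node_0")   # nocb_gp_wait() path
g.acquire("rcu_node_0", "rq_lock")     # resched_cpu() path
# The offending acquisition: call_rcu() -> rcu_nocb_try_bypass() takes
# nocb_lock while __balance_push_cpu_stop() still holds rq->__lock.
print(g.acquire("rq_lock", "nocb_lock"))  # True: circular dependency
```

This mirrors the "Possible unsafe locking scenario" section: once both orderings exist in the graph, two CPUs interleaving them can each hold the lock the other needs.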



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231030/202310302207.a25f1a30-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
