Message-ID: <3301352.1653645472@warthog.procyon.org.uk>
Date:   Fri, 27 May 2022 10:57:52 +0100
From:   David Howells <dhowells@...hat.com>
To:     Zhu Yanjun <zyjzyj2000@...il.com>,
        Bob Pearson <rpearsonhpe@...il.com>,
        Steve French <smfrench@...il.com>
cc:     dhowells@...hat.com, willy@...radead.org,
        Tom Talpey <tom@...pey.com>,
        Namjae Jeon <linkinjeon@...nel.org>,
        linux-rdma@...r.kernel.org, linux-cifs@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Lockdep splat in RXE (softRoCE) driver in xarray accesses

Hi Zhu, Bob, Steve,

There seems to be a locking bug in the softRoCE driver, triggered when
mounting a cifs share; see the trace below.  I'm guessing the problem is
that a softirq handler is accessing the pool xarray, but other accesses to
the same xarray aren't guarded by the _bh or _irq variants of the lock
primitives.
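
Roughly, I think the clash looks like this (schematic only, untested; the
struct layout here is guessed from the rxe_pool code, not copied from it):

   #include <linux/kref.h>
   #include <linux/xarray.h>

   /* Guessed-at layout, just enough for the illustration. */
   struct rxe_pool_elem {
           void *obj;
           struct kref ref_cnt;
           u32 index;
   };

   struct rxe_pool {
           struct xarray xa;
           struct xa_limit limit;
           u32 next;
   };

   static int pool_add(struct rxe_pool *pool, struct rxe_pool_elem *elem)
   {
           /* Process context: xa_alloc_cyclic() takes xa_lock with a
            * plain spin_lock(), leaving softirqs enabled, so lockdep
            * records {SOFTIRQ-ON-W}. */
           return xa_alloc_cyclic(&pool->xa, &elem->index, elem,
                                  pool->limit, &pool->next, GFP_KERNEL);
   }

   static void *pool_lookup(struct rxe_pool *pool, u32 index)
   {
           struct rxe_pool_elem *elem;
           unsigned long flags;
           void *obj = NULL;

           /* Softirq context (the rxe_requester tasklet): the same
            * lock again, hence {IN-SOFTIRQ-W}.  If the tasklet fires
            * on a CPU that holds the plain lock in pool_add(), it
            * spins on a lock that can never be released. */
           xa_lock_irqsave(&pool->xa, flags);
           elem = xa_load(&pool->xa, index);
           if (elem && kref_get_unless_zero(&elem->ref_cnt))
                   obj = elem->obj;
           xa_unlock_irqrestore(&pool->xa, flags);

           return obj;
   }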

I wonder if rxe_pool_get_index() should just rely on the RCU read lock and not
take the spinlock.
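
Something like this, say (a sketch only, untested; it is only safe if pool
elements are freed after an RCU grace period, e.g. via kfree_rcu(), which
the pool code would need to guarantee):

   void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
   {
           struct rxe_pool_elem *elem;
           void *obj = NULL;

           /* xa_load() may be called under rcu_read_lock(), so no
            * spinlock is needed and the softirq/process-context
            * inconsistency goes away. */
           rcu_read_lock();
           elem = xa_load(&pool->xa, index);
           /* Only hand the object out if it is still live. */
           if (elem && kref_get_unless_zero(&elem->ref_cnt))
                   obj = elem->obj;
           rcu_read_unlock();

           return obj;
   }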

Alternatively, __rxe_add_to_pool() should be using xa_alloc_cyclic_bh() or
xa_alloc_cyclic_irq().
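
That would keep the spinlock scheme but exclude softirqs on the allocation
side, e.g. (sketch only; the argument list follows the existing
xa_alloc_cyclic() call as far as I can see):

   err = xa_alloc_cyclic_bh(&pool->xa, &elem->index, elem,
                            pool->limit, &pool->next, GFP_KERNEL);

Any remaining plain xa_lock() users of the same xarray would need converting
to the _bh variant as well, or lockdep will still complain.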

I used the following commands:

   rdma link add rxe0 type rxe netdev enp6s0 # andromeda, softRoCE
   mount //192.168.6.1/scratch /xfstest.scratch -o user=shares,rdma,pass=...

talking to ksmbd on the other side.

Kernel is v5.18-rc6.

David
---
infiniband rxe0: set active
infiniband rxe0: added enp6s0
RDS/IB: rxe0: added
CIFS: No dialect specified on mount. Default has changed to a more secure dialect, SMB2.1 or later (e.g. SMB3.1.1), from CIFS (SMB1). To use the less secure SMB1 dialect to access old servers which do not support SMB3.1.1 (or even SMB3 or SMB2.1) specify vers=1.0 on mount.
CIFS: Attempting to mount \\192.168.6.1\scratch

================================
WARNING: inconsistent lock state
5.18.0-rc6-build2+ #465 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/1/20 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff888134d11310 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x19/0x69
{SOFTIRQ-ON-W} state was registered at:
  mark_usage+0x169/0x17b
  __lock_acquire+0x50c/0x96a
  lock_acquire+0x2f4/0x37b
  _raw_spin_lock+0x2f/0x39
  xa_alloc_cyclic.constprop.0+0x20/0x55
  __rxe_add_to_pool+0xe3/0xf2
  __ib_alloc_pd+0xa2/0x26b
  ib_mad_port_open+0x1ac/0x4a1
  ib_mad_init_device+0x9b/0x1b9
  add_client_context+0x133/0x1b3
  enable_device_and_get+0x129/0x248
  ib_register_device+0x256/0x2fd
  rxe_register_device+0x18e/0x1b7
  rxe_net_add+0x57/0x71
  rxe_newlink+0x71/0x8e
  nldev_newlink+0x200/0x26a
  rdma_nl_rcv_msg+0x260/0x2ab
  rdma_nl_rcv+0x108/0x1a7
  netlink_unicast+0x1fc/0x2b3
  netlink_sendmsg+0x4ce/0x51b
  sock_sendmsg_nosec+0x41/0x4f
  __sys_sendto+0x157/0x1cc
  __x64_sys_sendto+0x76/0x82
  do_syscall_64+0x39/0x46
  entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 194111
hardirqs last  enabled at (194110): [<ffffffff81094eb2>] __local_bh_enable_ip+0xb8/0xcc
hardirqs last disabled at (194111): [<ffffffff82040077>] _raw_spin_lock_irqsave+0x1b/0x51
softirqs last  enabled at (194100): [<ffffffff8240043a>] __do_softirq+0x43a/0x489
softirqs last disabled at (194105): [<ffffffff81094d30>] run_ksoftirqd+0x31/0x56

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xa->xa_lock#12);
  <Interrupt>
    lock(&xa->xa_lock#12);

 *** DEADLOCK ***

no locks held by ksoftirqd/1/20.

stack backtrace:
CPU: 1 PID: 20 Comm: ksoftirqd/1 Not tainted 5.18.0-rc6-build2+ #465
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x45/0x59
 valid_state+0x56/0x61
 mark_lock_irq+0x9b/0x2ec
 ? ret_from_fork+0x1f/0x30
 ? valid_state+0x61/0x61
 ? stack_trace_save+0x8f/0xbe
 ? filter_irq_stacks+0x58/0x58
 ? jhash.constprop.0+0x1ad/0x202
 ? save_trace+0x17c/0x196
 mark_lock.part.0+0x10c/0x164
 mark_usage+0xe6/0x17b
 __lock_acquire+0x50c/0x96a
 lock_acquire+0x2f4/0x37b
 ? rxe_pool_get_index+0x19/0x69
 ? rcu_read_unlock+0x52/0x52
 ? jhash.constprop.0+0x1ad/0x202
 ? lockdep_unlock+0xde/0xe6
 ? validate_chain+0x44a/0x4a8
 ? req_next_wqe+0x312/0x363
 _raw_spin_lock_irqsave+0x41/0x51
 ? rxe_pool_get_index+0x19/0x69
 rxe_pool_get_index+0x19/0x69
 rxe_get_av+0xbe/0x14b
 rxe_requester+0x6b5/0xbb0
 ? rnr_nak_timer+0x16/0x16
 ? lock_downgrade+0xad/0xad
 ? rcu_read_lock_bh_held+0xab/0xab
 ? __wake_up+0xf/0xf
 ? mark_held_locks+0x1f/0x78
 ? __local_bh_enable_ip+0xb8/0xcc
 ? rnr_nak_timer+0x16/0x16
 rxe_do_task+0xb5/0x13d
 ? rxe_detach_mcast+0x1d6/0x1d6
 tasklet_action_common.constprop.0+0xda/0x145
 __do_softirq+0x202/0x489
 ? __irq_exit_rcu+0x108/0x108
 ? _local_bh_enable+0x1c/0x1c
 run_ksoftirqd+0x31/0x56
 smpboot_thread_fn+0x35c/0x376
 ? sort_range+0x1c/0x1c
 kthread+0x164/0x173
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
CIFS: VFS: RDMA transport established
