Message-ID: <3301352.1653645472@warthog.procyon.org.uk>
Date: Fri, 27 May 2022 10:57:52 +0100
From: David Howells <dhowells@...hat.com>
To: Zhu Yanjun <zyjzyj2000@...il.com>,
Bob Pearson <rpearsonhpe@...il.com>,
Steve French <smfrench@...il.com>
cc: dhowells@...hat.com, willy@...radead.org,
Tom Talpey <tom@...pey.com>,
Namjae Jeon <linkinjeon@...nel.org>,
linux-rdma@...r.kernel.org, linux-cifs@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Lockdep splat in RXE (softRoCE) driver in xarray accesses
Hi Zhu, Bob, Steve,
There seems to be a locking bug in the softRoCE driver when mounting a cifs
share; see the trace below. I'm guessing the problem is that a softirq
handler is accessing the xarray, but other accesses to the xarray aren't
guarded by the _bh or _irq variants of the locking primitives.
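If I'm reading it right, the sequence lockdep is warning about is something
like this (a simplified sketch, not the actual code; pool->xa is the real
field):

	/* Process context: __rxe_add_to_pool() -> xa_alloc_cyclic() */
	xa_lock(&pool->xa);	/* plain spin_lock; softirqs stay enabled */

		/* ...tasklet fires on the same CPU... */

		/* Softirq context: rxe_pool_get_index() */
		xa_lock_irqsave(&pool->xa, flags);	/* spins on the lock
							 * held above: deadlock
							 */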
I wonder if rxe_pool_get_index() should just rely on the RCU read lock and not
take the spinlock.
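Something like the following, perhaps - an untested sketch; it assumes pool
elements hold a kref (they do, ref_cnt) and that they're only freed after an
RCU grace period, which would need checking:

	void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
	{
		struct rxe_pool_elem *elem;
		void *obj = NULL;

		rcu_read_lock();
		/* xa_load() is safe under the RCU read lock */
		elem = xa_load(&pool->xa, index);
		if (elem && kref_get_unless_zero(&elem->ref_cnt))
			obj = elem->obj;
		rcu_read_unlock();

		return obj;
	}

That would take the xa_lock out of the softirq path entirely, so the
allocation side wouldn't need to disable softirqs at all.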
Alternatively, __rxe_add_to_pool() should be using xa_alloc_cyclic_bh() or
xa_alloc_cyclic_irq().
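That would be a one-line change in __rxe_add_to_pool(), e.g. (untested, and
I'm quoting the surrounding arguments from memory; since the offending access
comes from a tasklet, the _bh variant ought to suffice):

	-	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, limit,
	-			      &pool->next, GFP_KERNEL);
	+	err = xa_alloc_cyclic_bh(&pool->xa, &elem->index, elem, limit,
	+				 &pool->next, GFP_KERNEL);

The _irq variant would only be needed if the xarray is also touched from
hardirq context, which the trace doesn't show.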
I used the following commands:
rdma link add rxe0 type rxe netdev enp6s0 # andromeda, softRoCE
mount //192.168.6.1/scratch /xfstest.scratch -o user=shares,rdma,pass=...
talking to ksmbd on the other side.
Kernel is v5.18-rc6.
David
---
infiniband rxe0: set active
infiniband rxe0: added enp6s0
RDS/IB: rxe0: added
CIFS: No dialect specified on mount. Default has changed to a more secure dialect, SMB2.1 or later (e.g. SMB3.1.1), from CIFS (SMB1). To use the less secure SMB1 dialect to access old servers which do not support SMB3.1.1 (or even SMB3 or SMB2.1) specify vers=1.0 on mount.
CIFS: Attempting to mount \\192.168.6.1\scratch
================================
WARNING: inconsistent lock state
5.18.0-rc6-build2+ #465 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/1/20 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff888134d11310 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x19/0x69
{SOFTIRQ-ON-W} state was registered at:
mark_usage+0x169/0x17b
__lock_acquire+0x50c/0x96a
lock_acquire+0x2f4/0x37b
_raw_spin_lock+0x2f/0x39
xa_alloc_cyclic.constprop.0+0x20/0x55
__rxe_add_to_pool+0xe3/0xf2
__ib_alloc_pd+0xa2/0x26b
ib_mad_port_open+0x1ac/0x4a1
ib_mad_init_device+0x9b/0x1b9
add_client_context+0x133/0x1b3
enable_device_and_get+0x129/0x248
ib_register_device+0x256/0x2fd
rxe_register_device+0x18e/0x1b7
rxe_net_add+0x57/0x71
rxe_newlink+0x71/0x8e
nldev_newlink+0x200/0x26a
rdma_nl_rcv_msg+0x260/0x2ab
rdma_nl_rcv+0x108/0x1a7
netlink_unicast+0x1fc/0x2b3
netlink_sendmsg+0x4ce/0x51b
sock_sendmsg_nosec+0x41/0x4f
__sys_sendto+0x157/0x1cc
__x64_sys_sendto+0x76/0x82
do_syscall_64+0x39/0x46
entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 194111
hardirqs last enabled at (194110): [<ffffffff81094eb2>] __local_bh_enable_ip+0xb8/0xcc
hardirqs last disabled at (194111): [<ffffffff82040077>] _raw_spin_lock_irqsave+0x1b/0x51
softirqs last enabled at (194100): [<ffffffff8240043a>] __do_softirq+0x43a/0x489
softirqs last disabled at (194105): [<ffffffff81094d30>] run_ksoftirqd+0x31/0x56
other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xa->xa_lock#12);
  <Interrupt>
    lock(&xa->xa_lock#12);

 *** DEADLOCK ***
no locks held by ksoftirqd/1/20.
stack backtrace:
CPU: 1 PID: 20 Comm: ksoftirqd/1 Not tainted 5.18.0-rc6-build2+ #465
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
<TASK>
dump_stack_lvl+0x45/0x59
valid_state+0x56/0x61
mark_lock_irq+0x9b/0x2ec
? ret_from_fork+0x1f/0x30
? valid_state+0x61/0x61
? stack_trace_save+0x8f/0xbe
? filter_irq_stacks+0x58/0x58
? jhash.constprop.0+0x1ad/0x202
? save_trace+0x17c/0x196
mark_lock.part.0+0x10c/0x164
mark_usage+0xe6/0x17b
__lock_acquire+0x50c/0x96a
lock_acquire+0x2f4/0x37b
? rxe_pool_get_index+0x19/0x69
? rcu_read_unlock+0x52/0x52
? jhash.constprop.0+0x1ad/0x202
? lockdep_unlock+0xde/0xe6
? validate_chain+0x44a/0x4a8
? req_next_wqe+0x312/0x363
_raw_spin_lock_irqsave+0x41/0x51
? rxe_pool_get_index+0x19/0x69
rxe_pool_get_index+0x19/0x69
rxe_get_av+0xbe/0x14b
rxe_requester+0x6b5/0xbb0
? rnr_nak_timer+0x16/0x16
? lock_downgrade+0xad/0xad
? rcu_read_lock_bh_held+0xab/0xab
? __wake_up+0xf/0xf
? mark_held_locks+0x1f/0x78
? __local_bh_enable_ip+0xb8/0xcc
? rnr_nak_timer+0x16/0x16
rxe_do_task+0xb5/0x13d
? rxe_detach_mcast+0x1d6/0x1d6
tasklet_action_common.constprop.0+0xda/0x145
__do_softirq+0x202/0x489
? __irq_exit_rcu+0x108/0x108
? _local_bh_enable+0x1c/0x1c
run_ksoftirqd+0x31/0x56
smpboot_thread_fn+0x35c/0x376
? sort_range+0x1c/0x1c
kthread+0x164/0x173
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>
CIFS: VFS: RDMA transport established