Date:   Sat, 28 May 2022 08:23:45 +0800
From:   Yanjun Zhu <yanjun.zhu@...ux.dev>
To:     David Howells <dhowells@...hat.com>,
        Zhu Yanjun <zyjzyj2000@...il.com>,
        Bob Pearson <rpearsonhpe@...il.com>,
        Steve French <smfrench@...il.com>
Cc:     willy@...radead.org, Tom Talpey <tom@...pey.com>,
        Namjae Jeon <linkinjeon@...nel.org>,
        linux-rdma@...r.kernel.org, linux-cifs@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: Lockdep splat in RXE (softRoCE) driver in xarray accesses

On 2022/5/27 17:57, David Howells wrote:
> Hi Zhu, Bob, Steve,
> 
> There seems to be a locking bug in the softRoCE driver when mounting a cifs
> share.  See attached trace.  I'm guessing the problem is that a softirq
> handler is accessing the xarray, but other accesses to the xarray aren't
> guarded by _bh or _irq markers on the lock primitives.
> 
> I wonder if rxe_pool_get_index() should just rely on the RCU read lock and not
> take the spinlock.
> 
> Alternatively, __rxe_add_to_pool() should be using xa_alloc_cyclic_bh() or
> xa_alloc_cyclic_irq().
> 
> I used the following commands:
> 
>     rdma link add rxe0 type rxe netdev enp6s0 # andromeda, softRoCE
>     mount //192.168.6.1/scratch /xfstest.scratch -o user=shares,rdma,pass=...
> 
> talking to ksmbd on the other side.

This looks like a known bug. Please test with the patches in the
attachment. If they do not work, please let me know.

Thanks a lot.
Zhu Yanjun
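
For reference, the two approaches David suggests can be sketched in kernel-style C. Both sketches are illustrative only, not taken from the attached patch; field names such as `pool->xa`, `elem->ref_cnt`, `pool->limit`, and `pool->next` follow the general shape of the rxe pool code but are assumptions here:

```c
/* Sketch 1: the read path relies on the RCU read lock instead of
 * xa_lock. xa_load() is safe under rcu_read_lock(), so the softirq
 * path no longer takes the spinlock at all and the SOFTIRQ-ON-W vs
 * IN-SOFTIRQ-W inconsistency disappears.
 */
void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
{
	struct rxe_pool_elem *elem;
	void *obj = NULL;

	rcu_read_lock();
	elem = xa_load(&pool->xa, index);
	/* Only hand out the object if it is not already being freed. */
	if (elem && kref_get_unless_zero(&elem->ref_cnt))
		obj = elem->obj;
	rcu_read_unlock();

	return obj;
}

/* Sketch 2: alternatively, the write path disables softirqs while it
 * holds xa_lock, so a tasklet on the same CPU cannot interrupt it and
 * deadlock on the same lock.
 */
static int __rxe_add_to_pool(struct rxe_pool *pool,
			     struct rxe_pool_elem *elem)
{
	return xa_alloc_cyclic_bh(&pool->xa, &elem->index, elem,
				  pool->limit, &pool->next, GFP_KERNEL);
}
```

Note that the RCU variant (sketch 1) also requires the free path to defer actual freeing past an RCU grace period (e.g. kfree_rcu()), so a reader that has already loaded the element pointer never dereferences freed memory.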

> 
> Kernel is v5.18-rc6.
> 
> David
> ---
> infiniband rxe0: set active
> infiniband rxe0: added enp6s0
> RDS/IB: rxe0: added
> CIFS: No dialect specified on mount. Default has changed to a more secure dialect, SMB2.1 or later (e.g. SMB3.1.1), from CIFS (SMB1). To use the less secure SMB1 dialect to access old servers which do not support SMB3.1.1 (or even SMB3 or SMB2.1) specify vers=1.0 on mount.
> CIFS: Attempting to mount \\192.168.6.1\scratch
> 
> ================================
> WARNING: inconsistent lock state
> 5.18.0-rc6-build2+ #465 Not tainted
> --------------------------------
> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> ksoftirqd/1/20 [HC0[0]:SC1[1]:HE0:SE0] takes:
> ffff888134d11310 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x19/0x69
> {SOFTIRQ-ON-W} state was registered at:
>    mark_usage+0x169/0x17b
>    __lock_acquire+0x50c/0x96a
>    lock_acquire+0x2f4/0x37b
>    _raw_spin_lock+0x2f/0x39
>    xa_alloc_cyclic.constprop.0+0x20/0x55
>    __rxe_add_to_pool+0xe3/0xf2
>    __ib_alloc_pd+0xa2/0x26b
>    ib_mad_port_open+0x1ac/0x4a1
>    ib_mad_init_device+0x9b/0x1b9
>    add_client_context+0x133/0x1b3
>    enable_device_and_get+0x129/0x248
>    ib_register_device+0x256/0x2fd
>    rxe_register_device+0x18e/0x1b7
>    rxe_net_add+0x57/0x71
>    rxe_newlink+0x71/0x8e
>    nldev_newlink+0x200/0x26a
>    rdma_nl_rcv_msg+0x260/0x2ab
>    rdma_nl_rcv+0x108/0x1a7
>    netlink_unicast+0x1fc/0x2b3
>    netlink_sendmsg+0x4ce/0x51b
>    sock_sendmsg_nosec+0x41/0x4f
>    __sys_sendto+0x157/0x1cc
>    __x64_sys_sendto+0x76/0x82
>    do_syscall_64+0x39/0x46
>    entry_SYSCALL_64_after_hwframe+0x44/0xae
> irq event stamp: 194111
> hardirqs last  enabled at (194110): [<ffffffff81094eb2>] __local_bh_enable_ip+0xb8/0xcc
> hardirqs last disabled at (194111): [<ffffffff82040077>] _raw_spin_lock_irqsave+0x1b/0x51
> softirqs last  enabled at (194100): [<ffffffff8240043a>] __do_softirq+0x43a/0x489
> softirqs last disabled at (194105): [<ffffffff81094d30>] run_ksoftirqd+0x31/0x56
> 
> other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(&xa->xa_lock#12);
>    <Interrupt>
>      lock(&xa->xa_lock#12);
> 
>   *** DEADLOCK ***
> 
> no locks held by ksoftirqd/1/20.
> 
> stack backtrace:
> CPU: 1 PID: 20 Comm: ksoftirqd/1 Not tainted 5.18.0-rc6-build2+ #465
> Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
> Call Trace:
>   <TASK>
>   dump_stack_lvl+0x45/0x59
>   valid_state+0x56/0x61
>   mark_lock_irq+0x9b/0x2ec
>   ? ret_from_fork+0x1f/0x30
>   ? valid_state+0x61/0x61
>   ? stack_trace_save+0x8f/0xbe
>   ? filter_irq_stacks+0x58/0x58
>   ? jhash.constprop.0+0x1ad/0x202
>   ? save_trace+0x17c/0x196
>   mark_lock.part.0+0x10c/0x164
>   mark_usage+0xe6/0x17b
>   __lock_acquire+0x50c/0x96a
>   lock_acquire+0x2f4/0x37b
>   ? rxe_pool_get_index+0x19/0x69
>   ? rcu_read_unlock+0x52/0x52
>   ? jhash.constprop.0+0x1ad/0x202
>   ? lockdep_unlock+0xde/0xe6
>   ? validate_chain+0x44a/0x4a8
>   ? req_next_wqe+0x312/0x363
>   _raw_spin_lock_irqsave+0x41/0x51
>   ? rxe_pool_get_index+0x19/0x69
>   rxe_pool_get_index+0x19/0x69
>   rxe_get_av+0xbe/0x14b
>   rxe_requester+0x6b5/0xbb0
>   ? rnr_nak_timer+0x16/0x16
>   ? lock_downgrade+0xad/0xad
>   ? rcu_read_lock_bh_held+0xab/0xab
>   ? __wake_up+0xf/0xf
>   ? mark_held_locks+0x1f/0x78
>   ? __local_bh_enable_ip+0xb8/0xcc
>   ? rnr_nak_timer+0x16/0x16
>   rxe_do_task+0xb5/0x13d
>   ? rxe_detach_mcast+0x1d6/0x1d6
>   tasklet_action_common.constprop.0+0xda/0x145
>   __do_softirq+0x202/0x489
>   ? __irq_exit_rcu+0x108/0x108
>   ? _local_bh_enable+0x1c/0x1c
>   run_ksoftirqd+0x31/0x56
>   smpboot_thread_fn+0x35c/0x376
>   ? sort_range+0x1c/0x1c
>   kthread+0x164/0x173
>   ? kthread_complete_and_exit+0x20/0x20
>   ret_from_fork+0x1f/0x30
>   </TASK>
> CIFS: VFS: RDMA transport established
> 

View attachment "PATCHv6-1-4-RDMA-rxe-Fix-dead-lock-caused-by-__rxe_add_to_pool-interrupted-by-rxe_pool_get_index.patch" of type "text/plain" (21178 bytes)
