Message-ID: <tencent_713807A8D67394A5D8339F8AD33FCCBFCE07@qq.com>
Date: Tue, 16 Dec 2025 17:59:52 +0800
From: wujing <realwujing@...com>
To: jgg@...pe.ca
Cc: leon@...nel.org,
	linux-kernel@...r.kernel.org,
	linux-rdma@...r.kernel.org,
	realwujing@...com,
	yuanql9@...natelecom.cn
Subject: Re: [PATCH] IB/core: Fix ABBA deadlock in rdma_dev_exit_net

Hi Jason,

You're right that the locks aren't nested in rdma_dev_exit_net() - it does release 
rdma_nets_rwsem before acquiring devices_rwsem. However, this is still an ABBA deadlock,
just not the trivial nested kind. The issue is caused by **rwsem writer priority**
and lock ordering inconsistency.

Here's the actual deadlock scenario:

**Thread A (rdma_dev_exit_net - cleanup_net workqueue):**
```
down_write(&rdma_nets_rwsem);    // Acquired
xa_store(&rdma_nets, ...);
up_write(&rdma_nets_rwsem);      // Released
down_read(&devices_rwsem);       // Waiting here <-- BLOCKED
```

**Thread B (rdma_dev_init_net - stress-ng-clone):**
```
down_read(&devices_rwsem);       // Acquired
down_read(&rdma_nets_rwsem);     // Waiting here <-- BLOCKED
```

The deadlock happens because:

1. Thread A releases rdma_nets_rwsem as a **writer**
2. Thread B (and many others) are waiting to acquire rdma_nets_rwsem as **readers**
3. Thread A then tries to acquire devices_rwsem as a reader
4. BUT: rwsem gives priority to pending writers over new readers
5. Since Thread A was a pending writer on rdma_nets_rwsem, Thread B's read request is blocked
6. Thread B holds devices_rwsem, which Thread A needs
7. Thread A holds the "writer priority slot" on rdma_nets_rwsem, which Thread B needs

This is a **priority inversion deadlock**, not a simple nested lock deadlock.

The production crash log shows exactly this:
- Thread A: `rdma_dev_exit_net+0x60` stuck in `rwsem_down_write_slowpath` trying to get devices_rwsem
- Thread B: `rdma_dev_init_net+0x120` stuck in `rwsem_down_read_slowpath` trying to get rdma_nets_rwsem

Lockdep doesn't catch this because:
1. The locks aren't held simultaneously (no nested locking)
2. It's a reader-writer priority issue, not a simple lock ordering issue
3. It requires specific timing: writer releases lock, then tries to acquire another
   lock that readers (waiting for the first lock) already hold

The fix ensures both paths acquire locks in the same order:
- rdma_dev_init_net: devices_rwsem → rdma_nets_rwsem
- rdma_dev_exit_net: devices_rwsem → rdma_nets_rwsem (was reversed)

This eliminates the priority inversion scenario.

Best regards

