[<prev] [next>] [day] [month] [year] [list]
Message-ID: <2c5257a7-e330-4983-8447-3e217b616b2e@quicinc.com>
Date: Tue, 17 Jun 2025 01:44:56 +0530
From: Sharath Chandra Vurukala <quic_sharathv@...cinc.com>
To: "Jason A. Donenfeld\"" <Jason@...c4.com>, <wireguard@...ts.zx2c4.com>,
<netdev@...r.kernel.org>
Subject: Issue with rtnl_lock in wg_pm_notification
Hi All,
I work on rmnet driver(more details at https://docs.kernel.org/networking/device_drivers/cellular/qualcomm/rmnet.html), This driver registers onto any physical network device in IP mode.
This driver registers/deregisters for PM notifications based on the NETDEV_REGISTER/NETDEV_UNREGISTER notifications received for the physical netdevice.
In one of the tests, involving a frequent system suspend/resume and physical netdevice registration/de-registration, a deadlock has been observed involving the rtnl_lock and the pm_chain_head->rwsem.
The sequence involves
Thread1: unregister_netdev(holds rtnl_lock)-> notification sent to rmnet driver( netdev_chain) -> rmnet driver attempts to unregister for pm_notification( involves acquiring pm_chain_head->rwsem).
Thread2: pm_suspend -> pm_notofier_call_chain(holds pm_chain_head->rwsem)-> wg_pm_notifcation(trying to acquire rtnl_lock)
I do not understand fully what wireguard functionality is, but considering that rtnl_lock is a global one, it does not seem to be a good design to have notification callback acquire this lock.
Can we check if rthl_lock be avoided here OR
Use rtnl_trylock instead of using rtnl_lock and consider deferring the work to a context where it is safe to acquire the lock( may be a workqueue), something like below to avoid the deadlock?
static int wg_pm_notification(struct notifier_block *nb, unsigned long action, void *data)
{
struct wg_device *wg;
struct wg_peer *peer;
/* If the machine is constantly suspending and resuming, as part of
* its normal operation rather than as a somewhat rare event, then we
* don't actually want to clear keys.
*/
if (IS_ENABLED(CONFIG_PM_AUTOSLEEP) ||
IS_ENABLED(CONFIG_PM_USERSPACE_AUTOSLEEP))
return 0;
if (action != PM_HIBERNATION_PREPARE && action != PM_SUSPEND_PREPARE)
return 0;
if (rtnl_trylock()) {
list_for_each_entry(wg, &device_list, device_list) {
mutex_lock(&wg->device_update_lock);
list_for_each_entry(peer, &wg->peer_list, peer_list) {
del_timer(&peer->timer_zero_key_material);
wg_noise_handshake_clear(&peer->handshake);
wg_noise_keypairs_clear(&peer->keypairs);
}
mutex_unlock(&wg->device_update_lock);
}
} else {
/* schedule a workqueue to delete timers clear up the keypairs of the peers? */
}
rtnl_unlock();
rcu_barrier();
}
Thanks,
Sharath
Powered by blists - more mailing lists