netdev - Issue with rtnl_lock in wg_pm

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <2c5257a7-e330-4983-8447-3e217b616b2e@quicinc.com>
Date: Tue, 17 Jun 2025 01:44:56 +0530
From: Sharath Chandra Vurukala <quic_sharathv@...cinc.com>
To: "Jason A. Donenfeld\"" <Jason@...c4.com>, <wireguard@...ts.zx2c4.com>,
        <netdev@...r.kernel.org>
Subject: Issue with rtnl_lock in wg_pm_notification

Hi All,
I work on rmnet driver(more details at https://docs.kernel.org/networking/device_drivers/cellular/qualcomm/rmnet.html), This driver registers onto any physical network device in IP mode.
This driver registers/deregisters for PM notifications based on the NETDEV_REGISTER/NETDEV_UNREGISTER notifications received for the physical netdevice.

In one of the tests, involving a frequent system suspend/resume and physical netdevice registration/de-registration, a deadlock has been observed involving the rtnl_lock and the pm_chain_head->rwsem.
The sequence involves
Thread1: unregister_netdev(holds rtnl_lock)-> notification sent to rmnet driver( netdev_chain) -> rmnet driver attempts to unregister for pm_notification( involves acquiring pm_chain_head->rwsem).
Thread2: pm_suspend -> pm_notofier_call_chain(holds pm_chain_head->rwsem)-> wg_pm_notifcation(trying to acquire rtnl_lock)

I do not understand fully what wireguard functionality is, but considering that rtnl_lock is a global one, it does not seem to be a good design to have notification callback acquire this lock.
Can we check if rthl_lock be avoided here OR 
Use rtnl_trylock instead of using rtnl_lock and consider deferring the work to a context where it is safe to acquire the lock( may be a workqueue), something like below to avoid the deadlock?

static int wg_pm_notification(struct notifier_block *nb, unsigned long action, void *data)
{
          struct wg_device *wg;
          struct wg_peer *peer;

          /* If the machine is constantly suspending and resuming, as part of
           * its normal operation rather than as a somewhat rare event, then we
           * don't actually want to clear keys.
           */
          if (IS_ENABLED(CONFIG_PM_AUTOSLEEP) ||
              IS_ENABLED(CONFIG_PM_USERSPACE_AUTOSLEEP))
                          return 0;

          if (action != PM_HIBERNATION_PREPARE && action != PM_SUSPEND_PREPARE)
                          return 0;

          if (rtnl_trylock()) {
              list_for_each_entry(wg, &device_list, device_list) {
                              mutex_lock(&wg->device_update_lock);
                              list_for_each_entry(peer, &wg->peer_list, peer_list) {
                                              del_timer(&peer->timer_zero_key_material);
                                              wg_noise_handshake_clear(&peer->handshake);
                                              wg_noise_keypairs_clear(&peer->keypairs);
                              }
                              mutex_unlock(&wg->device_update_lock);
              }

          } else {

                /* schedule a workqueue to delete timers clear up the keypairs of the peers?  */

          }

          rtnl_unlock();
          rcu_barrier();
}

Thanks,
Sharath