netdev - [not-yet-signed PATCH] RDMA/core: flush gid_cache_wq WQ from disable

Message-ID: <1722eff3-14c1-408b-999b-1be3e8fbfe5a@I-love.SAKURA.ne.jp>
Date: Thu, 11 Dec 2025 22:24:59 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Jason Gunthorpe <jgg@...pe.ca>, Leon Romanovsky <leon@...nel.org>,
        Majd Dibbiny <majd@...lanox.com>, Doug Ledford <dledford@...hat.com>,
        Yuval Shaia <yuval.shaia@...cle.com>
Cc: Bernard Metzler <bernard.metzler@...ux.dev>,
        OFED mailing list <linux-rdma@...r.kernel.org>,
        Network Development <netdev@...r.kernel.org>
Subject: [not-yet-signed PATCH] RDMA/core: flush gid_cache_wq WQ from
 disable_device()

syzbot is reporting a net_device refcount leak in RDMA code.
A debug printk() patch in next-20251204 showed that the refcount is leaked
in ib_gid_table_entry handling. Another debug printk() patch in
next-20251210 showed that netdevice_event_work_handler() is called when a
GID entry is allocated but is not called when that GID entry is released.

  unregister_netdevice: waiting for ipvlan0 to become free. Usage count = 5
  Call trace for ipvlan0@...f888076d9da00 +1 at
       alloc_gid_entry drivers/infiniband/core/cache.c:410 [inline]
       add_modify_gid+0x317/0xcc0 drivers/infiniband/core/cache.c:550
       __ib_cache_gid_add+0x230/0x370 drivers/infiniband/core/cache.c:681
       ib_cache_gid_set_default_gid+0x5f9/0x710 drivers/infiniband/core/cache.c:960
       ib_enum_roce_netdev+0x1ab/0x2e0 drivers/infiniband/core/device.c:2451
       ib_enum_all_roce_netdevs+0xcc/0x160 drivers/infiniband/core/device.c:2477
       netdevice_event_work_handler+0xef/0x260 drivers/infiniband/core/roce_gid_mgmt.c:660
       process_one_work+0x93a/0x15a0 kernel/workqueue.c:3279
  Call trace for ipvlan0@...f888076d9de00 +1 at
       alloc_gid_entry drivers/infiniband/core/cache.c:410 [inline]
       add_modify_gid+0x317/0xcc0 drivers/infiniband/core/cache.c:550
       __ib_cache_gid_add+0x230/0x370 drivers/infiniband/core/cache.c:681
       update_gid drivers/infiniband/core/roce_gid_mgmt.c:110 [inline]
       update_gid_ip drivers/infiniband/core/roce_gid_mgmt.c:294 [inline]
       enum_netdev_ipv4_ips drivers/infiniband/core/roce_gid_mgmt.c:368 [inline]
       _add_netdev_ips+0x98c/0x1560 drivers/infiniband/core/roce_gid_mgmt.c:424
       ib_enum_roce_netdev+0x1ab/0x2e0 drivers/infiniband/core/device.c:2451
       ib_enum_all_roce_netdevs+0xcc/0x160 drivers/infiniband/core/device.c:2477
       netdevice_event_work_handler+0xef/0x260 drivers/infiniband/core/roce_gid_mgmt.c:660
       process_one_work+0x93a/0x15a0 kernel/workqueue.c:3279
  Call trace for ipvlan0@...f888031e4eb00 +1 at
       alloc_gid_entry drivers/infiniband/core/cache.c:410 [inline]
       add_modify_gid+0x317/0xcc0 drivers/infiniband/core/cache.c:550
       __ib_cache_gid_add+0x230/0x370 drivers/infiniband/core/cache.c:681
       update_gid drivers/infiniband/core/roce_gid_mgmt.c:110 [inline]
       enum_netdev_ipv6_ips drivers/infiniband/core/roce_gid_mgmt.c:415 [inline]
       _add_netdev_ips+0x12d9/0x1560 drivers/infiniband/core/roce_gid_mgmt.c:426
       ib_enum_roce_netdev+0x1ab/0x2e0 drivers/infiniband/core/device.c:2451
       ib_enum_all_roce_netdevs+0xcc/0x160 drivers/infiniband/core/device.c:2477
       netdevice_event_work_handler+0xef/0x260 drivers/infiniband/core/roce_gid_mgmt.c:660
       process_one_work+0x93a/0x15a0 kernel/workqueue.c:3279
  Call trace for ipvlan0@...f888076d9da00 +1 at
       get_gid_entry drivers/infiniband/core/cache.c:435 [inline]
       rdma_get_gid_attr+0x2ee/0x3f0 drivers/infiniband/core/cache.c:1300
       smc_ib_fill_mac net/smc/smc_ib.c:160 [inline]
       smc_ib_remember_port_attr net/smc/smc_ib.c:369 [inline]
       smc_ib_port_event_work+0x196/0x940 net/smc/smc_ib.c:388
       process_one_work+0x93a/0x15a0 kernel/workqueue.c:3279
  Call trace for ipvlan0@...f888076d9da00 -1 at
       put_gid_entry drivers/infiniband/core/cache.c:441 [inline]
       rdma_put_gid_attr+0x7c/0x130 drivers/infiniband/core/cache.c:1381
       smc_ib_fill_mac net/smc/smc_ib.c:165 [inline]
       smc_ib_remember_port_attr net/smc/smc_ib.c:369 [inline]
       smc_ib_port_event_work+0x1d4/0x940 net/smc/smc_ib.c:388
       process_one_work+0x93a/0x15a0 kernel/workqueue.c:3279
  balance for ipvlan0@...gid_table_entry is 3
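
The balance on the last line can be checked by summing the +1/-1 lines per reference holder. The sketch below is a hypothetical C tally written for illustration only; it is not part of the debug patch. The holder labels are just the visible suffixes of the archive-truncated addresses printed after "ipvlan0@".

```c
#include <assert.h>
#include <string.h>

/* One "+1"/"-1" line from the trace above (hypothetical representation). */
struct ref_event {
	const char *holder;  /* visible suffix of the truncated address */
	const char *site;    /* function that took or dropped the reference */
	int delta;           /* +1 for a get, -1 for a put */
};

static const struct ref_event trace[] = {
	{ "...d9da00", "alloc_gid_entry", +1 },
	{ "...d9de00", "alloc_gid_entry", +1 },
	{ "...e4eb00", "alloc_gid_entry", +1 },
	{ "...d9da00", "get_gid_entry",   +1 },
	{ "...d9da00", "put_gid_entry",   -1 },
};

#define NR_EVENTS (sizeof(trace) / sizeof(trace[0]))

/* Net references still held across all holders. */
static int total_balance(void)
{
	int sum = 0;

	for (size_t i = 0; i < NR_EVENTS; i++)
		sum += trace[i].delta;
	return sum;
}

/* Net references still held by one holder. */
static int holder_balance(const char *holder)
{
	int sum = 0;

	for (size_t i = 0; i < NR_EVENTS; i++)
		if (strcmp(trace[i].holder, holder) == 0)
			sum += trace[i].delta;
	return sum;
}
```

Each of the three alloc_gid_entry references is left unmatched, while the get/put pair from smc_ib_fill_mac cancels out, which accounts for the balance of 3.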

If netdevice_event_work_handler() is supposed to be called to release GID
entries upon a NETDEV_UNREGISTER event, then something must be preventing
ib_enum_all_roce_netdevs() from reaching the RDMA device. I found a
possible race window, explained below.

Since ib_enum_all_roce_netdevs() uses xa_for_each_marked(DEVICE_REGISTERED)
with devices_rwsem held for read, we need to ensure that all work items
queued by netdevice_event(NETDEV_UNREGISTER) complete before
disable_device() calls xa_clear_mark(DEVICE_REGISTERED) with devices_rwsem
held for write. Otherwise, ib_enum_all_roce_netdevs() will fail to find the
device while handling the NETDEV_UNREGISTER event, and handling that event
is what drops the refcount on the ib_gid_table_entry, which in turn holds a
refcount on the net_device.
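
The ordering requirement can be illustrated with a deterministic userspace model. Everything below is hypothetical and is not kernel code: struct model_dev stands in for an ib_device, its registered flag for the DEVICE_REGISTERED mark, netdev_refs for the references GID entries hold on the net_device, and a FIFO array for the ordered gid_cache_wq. It is a sketch of the race under those assumptions only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an ib_device. */
struct model_dev {
	bool registered;  /* plays the role of the DEVICE_REGISTERED mark */
	int netdev_refs;  /* references GID entries hold on the net_device */
};

enum model_event { EV_ADD_GID, EV_UNREGISTER };

#define QUEUE_MAX 16

/* FIFO of pending events, standing in for the ordered gid_cache_wq. */
struct model_queue {
	enum model_event ev[QUEUE_MAX];
	int head, tail;
};

static void queue_event(struct model_queue *q, enum model_event ev)
{
	assert(q->tail < QUEUE_MAX);
	q->ev[q->tail++] = ev;
}

/* Like the enumeration over marked devices: unmarked devices are skipped. */
static void handle_event(struct model_dev *dev, enum model_event ev)
{
	if (!dev->registered)
		return;                /* device not found: nothing is released */
	if (ev == EV_ADD_GID)
		dev->netdev_refs++;
	else
		dev->netdev_refs = 0;  /* drop every GID entry's reference */
}

/* Run every pending event in FIFO order. */
static void drain_queue(struct model_queue *q, struct model_dev *dev)
{
	while (q->head < q->tail)
		handle_event(dev, q->ev[q->head++]);
}

/* Returns the references left after teardown; flush_first selects whether
 * pending events are drained before the "registered" mark is cleared. */
static int teardown(bool flush_first)
{
	struct model_dev dev = { .registered = true, .netdev_refs = 0 };
	struct model_queue q = { .head = 0, .tail = 0 };

	queue_event(&q, EV_ADD_GID);
	drain_queue(&q, &dev);           /* GID entry added: one reference */
	queue_event(&q, EV_UNREGISTER);  /* still pending at teardown time */

	if (flush_first)
		drain_queue(&q, &dev);
	dev.registered = false;          /* like xa_clear_mark(DEVICE_REGISTERED) */
	drain_queue(&q, &dev);           /* leftover work now misses the device */
	return dev.netdev_refs;
}
```

In this model teardown(false) leaves one reference behind because the pending unregister event runs after the mark is cleared, while teardown(true) drains the queue first and leaves none.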

Since flush_workqueue(gid_cache_wq) is not called before disable_device()
calls xa_clear_mark(), and since commit 8fe8bacb92f2 ("IB/core: Add ordered
workqueue for RoCE GID management") made gid_cache_wq an ordered workqueue,
which executes at most one work item at a time, the possibility that some
queued work items have not yet completed when xa_clear_mark() is called
might not be negligible. Therefore, flush gid_cache_wq before
disable_device() calls xa_clear_mark().
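
To make the flush step concrete outside the kernel, here is a minimal userspace sketch, using POSIX threads, of an ordered queue with a flush operation. It assumes the documented semantics of an ordered workqueue (at most one work item executes at a time, in queueing order) and of flush_workqueue() (wait until every work item queued before the call has finished). The names ordered_queue, oq_queue(), oq_flush() and record_item() are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define OQ_MAX 64

typedef void (*oq_fn)(void *arg);

/* Hypothetical ordered queue: one worker thread, FIFO ring of items. */
struct ordered_queue {
	pthread_mutex_t lock;
	pthread_cond_t more;      /* an item was queued, or stop was requested */
	pthread_cond_t progress;  /* an item finished */
	oq_fn fn[OQ_MAX];
	void *arg[OQ_MAX];
	unsigned long queued;     /* items ever queued */
	unsigned long done;       /* items ever completed */
	bool stop;
	pthread_t worker;
};

static void *oq_worker(void *p)
{
	struct ordered_queue *q = p;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (q->done == q->queued && !q->stop)
			pthread_cond_wait(&q->more, &q->lock);
		if (q->done == q->queued)
			break;                /* stop requested and queue empty */
		oq_fn fn = q->fn[q->done % OQ_MAX];
		void *arg = q->arg[q->done % OQ_MAX];
		pthread_mutex_unlock(&q->lock);
		fn(arg);                      /* one item at a time, in order */
		pthread_mutex_lock(&q->lock);
		q->done++;
		pthread_cond_broadcast(&q->progress);
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

static void oq_init(struct ordered_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->more, NULL);
	pthread_cond_init(&q->progress, NULL);
	q->queued = q->done = 0;
	q->stop = false;
	int rc = pthread_create(&q->worker, NULL, oq_worker, q);
	assert(rc == 0);
	(void)rc;
}

static void oq_queue(struct ordered_queue *q, oq_fn fn, void *arg)
{
	pthread_mutex_lock(&q->lock);
	while (q->queued - q->done >= OQ_MAX)  /* ring full: wait for room */
		pthread_cond_wait(&q->progress, &q->lock);
	q->fn[q->queued % OQ_MAX] = fn;
	q->arg[q->queued % OQ_MAX] = arg;
	q->queued++;
	pthread_cond_signal(&q->more);
	pthread_mutex_unlock(&q->lock);
}

/* Modeled on flush_workqueue(): wait only for items queued before the call. */
static void oq_flush(struct ordered_queue *q)
{
	pthread_mutex_lock(&q->lock);
	unsigned long target = q->queued;
	while (q->done < target)
		pthread_cond_wait(&q->progress, &q->lock);
	pthread_mutex_unlock(&q->lock);
}

static void oq_destroy(struct ordered_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->stop = true;
	pthread_cond_signal(&q->more);
	pthread_mutex_unlock(&q->lock);
	pthread_join(q->worker, NULL);
	pthread_cond_destroy(&q->progress);
	pthread_cond_destroy(&q->more);
	pthread_mutex_destroy(&q->lock);
}

/* Example work item: record its id (only the worker thread writes these). */
static int run_log[OQ_MAX];
static int run_count;

static void record_item(void *arg)
{
	run_log[run_count++] = *(const int *)arg;
}
```

Calling oq_flush() before clearing a "registered" flag corresponds to calling roce_flush_gid_cache_wq() before xa_clear_mark(): once it returns, every event queued earlier has been handled, and items queued afterwards are not waited for.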

Also, add __GFP_NOFAIL when allocating the work item for a netdev event.
Since that commit is intended to ensure that netdev events are processed in
the order in which netdevice_event() is called, failing to invoke the
corresponding event handler because of a memory allocation failure is as bad
as processing netdev events in parallel.
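
The effect of dropping an event on allocation failure can be sketched the same way. In this hypothetical model, model_alloc() fails once at a chosen call to mimic kmalloc(GFP_KERNEL) returning NULL, and the "nofail" path simply retries. That retry loop is only an illustration: the real __GFP_NOFAIL is handled inside the allocator, not by the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum ev_kind { EV_ADD, EV_DEL };

/* Hypothetical per-event work item. */
struct ev_work {
	enum ev_kind kind;
};

static int alloc_calls;
static int fail_at = -1;  /* index of the allocation that fails; -1 = none */

/* Stand-in for kmalloc(): fails exactly once, at call number fail_at. */
static struct ev_work *model_alloc(void)
{
	if (alloc_calls++ == fail_at)
		return NULL;
	return malloc(sizeof(struct ev_work));
}

/* Allocate a work item for one event and run it; returns false if dropped. */
static bool deliver(enum ev_kind kind, bool nofail, int *refs)
{
	struct ev_work *w = model_alloc();

	while (!w && nofail)   /* model of __GFP_NOFAIL: keep trying */
		w = model_alloc();
	if (!w)
		return false;      /* old path: give up, like return NOTIFY_DONE */
	w->kind = kind;
	*refs += (w->kind == EV_ADD) ? 1 : -1;
	free(w);
	return true;
}

/* Deliver ADD then DEL; the allocation for DEL fails once. */
static int run_sequence(bool nofail)
{
	int refs = 0;

	alloc_calls = 0;
	fail_at = 1;
	deliver(EV_ADD, nofail, &refs);
	deliver(EV_DEL, nofail, &refs);
	return refs;
}
```

run_sequence(false) mirrors the removed NOTIFY_DONE path and leaves the ADD unmatched, whereas run_sequence(true) delivers both events and ends balanced.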
---
Since a reproducer for this bug is not available, I haven't verified
whether this is the bug syzbot is currently reporting at
https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84 .
But I'd like to add Reported-by: syzbot if netdevice_event_work_handler()
is supposed to be called to release GID entries upon a NETDEV_UNREGISTER
event. Please review this change.

 drivers/infiniband/core/core_priv.h     |  1 +
 drivers/infiniband/core/device.c        |  1 +
 drivers/infiniband/core/roce_gid_mgmt.c | 10 ++++++----
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 05102769a918..8355020bb98a 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -142,6 +142,7 @@ int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u32 port,
 
 int roce_gid_mgmt_init(void);
 void roce_gid_mgmt_cleanup(void);
+void roce_flush_gid_cache_wq(void);
 
 unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u32 port);
 
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 13e8a1714bbd..8638583a64f2 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1300,6 +1300,7 @@ static void disable_device(struct ib_device *device)
 
 	WARN_ON(!refcount_read(&device->refcount));
 
+	roce_flush_gid_cache_wq();
 	down_write(&devices_rwsem);
 	xa_clear_mark(&devices, device->index, DEVICE_REGISTERED);
 	up_write(&devices_rwsem);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index a9f2c6b1b29e..79982d448cd2 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -661,10 +661,7 @@ static int netdevice_queue_work(struct netdev_event_work_cmd *cmds,
 {
 	unsigned int i;
 	struct netdev_event_work *ndev_work =
-		kmalloc(sizeof(*ndev_work), GFP_KERNEL);
-
-	if (!ndev_work)
-		return NOTIFY_DONE;
+		kmalloc(sizeof(*ndev_work), GFP_KERNEL | __GFP_NOFAIL);
 
 	memcpy(ndev_work->cmds, cmds, sizeof(ndev_work->cmds));
 	for (i = 0; i < ARRAY_SIZE(ndev_work->cmds) && ndev_work->cmds[i].cb; i++) {
@@ -948,3 +945,8 @@ void __exit roce_gid_mgmt_cleanup(void)
 	 */
 	destroy_workqueue(gid_cache_wq);
 }
+
+void roce_flush_gid_cache_wq(void)
+{
+	flush_workqueue(gid_cache_wq);
+}
-- 
2.47.3