Message-ID: <10caea5b-9ad1-44ce-9eaf-a0f4023f2017@I-love.SAKURA.ne.jp>
Date: Tue, 16 Dec 2025 23:38:37 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Leon Romanovsky <leon@...nel.org>, Majd Dibbiny <majd@...lanox.com>,
        Doug Ledford <dledford@...hat.com>, Yuval Shaia <yshaia@...vell.com>,
        Bernard Metzler <bernard.metzler@...ux.dev>,
        OFED mailing list <linux-rdma@...r.kernel.org>,
        Network Development <netdev@...r.kernel.org>
Subject: [PATCH] RDMA/core: flush gid_cache_wq WQ from disable_device()

syzbot is reporting a net_device refcount leak in RDMA code.
A debug printk() patch showed that ib_enum_roce_netdev() is called when a
GID entry is allocated but is not called when the GID entry is released.
This suggests that something is preventing the call chain
netdevice_event_work_handler() -> ib_enum_all_roce_netdevs() ->
ib_enum_roce_netdev() from running when the GID entry is released.

Commit 03db3a2d81e6 ("IB/core: Add RoCE GID table management") introduced
ib_enum_all_roce_netdevs(), but calling this function asynchronously from
WQ context is racy. Using simple atomic_t counters, I can observe that
netdevice_event() works are sometimes still pending immediately before
disable_device() (called from __ib_unregister_device()) clears the
DEVICE_REGISTERED flag. If those pending works include the
ib_enum_roce_netdev() call that releases a GID entry, this race can result
in a net_device refcount leak.

Therefore, flush pending works immediately before clearing the
DEVICE_REGISTERED flag.

Also, since commit 8fe8bacb92f2 ("IB/core: Add ordered workqueue for RoCE
GID management") was intended to ensure that netdev events are processed
in the order in which netdevice_event() is called, failing to invoke the
corresponding event handler because of a memory allocation failure is as
bad as processing netdev events in parallel.

Therefore, add __GFP_NOFAIL when allocating memory for a netdev event
work.

Reported-by: syzbot+881d65229ca4f9ae8c84@...kaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Fixes: 03db3a2d81e6 ("IB/core: Add RoCE GID table management")
Signed-off-by: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
---
I haven't confirmed that netdevice_event_work_handler() is actually called
to release the GID entry, but I'd like to try this patch in the linux-next
tree via my tree for testing.
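
For illustration only, the suspected race can be modelled in userspace C.
This is a minimal sketch, not kernel code: struct model, model_queue(),
model_flush(), model_unregister() and simulate() are hypothetical names, the
"registered" field merely stands in for the DEVICE_REGISTERED mark, and the
model assumes that a work item which runs after the mark is cleared no longer
finds the device and therefore does nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MAX_WORKS 8

enum work_kind { WORK_ADD_GID, WORK_DEL_GID };

struct model {
	bool registered;		/* stands in for DEVICE_REGISTERED */
	int netdev_refcount;		/* references held via GID entries */
	enum work_kind queue[MAX_WORKS];/* processed strictly in order */
	size_t head, tail;
};

static void model_init(struct model *m)
{
	m->registered = true;
	m->netdev_refcount = 0;
	m->head = 0;
	m->tail = 0;
}

/* Queue one netdev event work; the model never drops an event. */
static void model_queue(struct model *m, enum work_kind kind)
{
	assert(m->tail < MAX_WORKS);
	m->queue[m->tail++] = kind;
}

/* Run every pending work in queueing order. */
static void model_flush(struct model *m)
{
	while (m->head < m->tail) {
		enum work_kind kind = m->queue[m->head++];

		/* Assumption: an unregistered device is not enumerated. */
		if (!m->registered)
			continue;
		if (kind == WORK_ADD_GID)
			m->netdev_refcount++;
		else
			m->netdev_refcount--;
	}
}

/* Returns the number of references still held after unregistering. */
static int model_unregister(struct model *m, bool flush_first)
{
	if (flush_first)
		model_flush(m);	/* the ordering this patch proposes */
	m->registered = false;
	model_flush(m);		/* leftover works run too late to matter */
	return m->netdev_refcount;
}

static int simulate(bool flush_first)
{
	struct model m;

	model_init(&m);
	model_queue(&m, WORK_ADD_GID);
	model_flush(&m);		/* GID entry now holds one reference */
	model_queue(&m, WORK_DEL_GID);	/* release is still pending */
	return model_unregister(&m, flush_first);
}
```

Under these assumptions, simulate(false) returns 1 (the pending release is
skipped and one reference is leaked), while simulate(true) returns 0.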

 drivers/infiniband/core/core_priv.h     |  1 +
 drivers/infiniband/core/device.c        |  1 +
 drivers/infiniband/core/roce_gid_mgmt.c | 10 ++++++----
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 05102769a918..8355020bb98a 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -142,6 +142,7 @@ int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u32 port,

 int roce_gid_mgmt_init(void);
 void roce_gid_mgmt_cleanup(void);
+void roce_flush_gid_cache_wq(void);

 unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u32 port);

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 13e8a1714bbd..8638583a64f2 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1300,6 +1300,7 @@ static void disable_device(struct ib_device *device)

 	WARN_ON(!refcount_read(&device->refcount));

+	roce_flush_gid_cache_wq();
 	down_write(&devices_rwsem);
 	xa_clear_mark(&devices, device->index, DEVICE_REGISTERED);
 	up_write(&devices_rwsem);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index a9f2c6b1b29e..79982d448cd2 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -661,10 +661,7 @@ static int netdevice_queue_work(struct netdev_event_work_cmd *cmds,
 {
 	unsigned int i;
 	struct netdev_event_work *ndev_work =
-		kmalloc(sizeof(*ndev_work), GFP_KERNEL);
-
-	if (!ndev_work)
-		return NOTIFY_DONE;
+		kmalloc(sizeof(*ndev_work), GFP_KERNEL | __GFP_NOFAIL);

 	memcpy(ndev_work->cmds, cmds, sizeof(ndev_work->cmds));
 	for (i = 0; i < ARRAY_SIZE(ndev_work->cmds) && ndev_work->cmds[i].cb; i++) {
@@ -948,3 +945,8 @@ void __exit roce_gid_mgmt_cleanup(void)
 	 */
 	destroy_workqueue(gid_cache_wq);
 }
+
+void roce_flush_gid_cache_wq(void)
+{
+	flush_workqueue(gid_cache_wq);
+}
-- 
2.47.3