Message-ID: <10caea5b-9ad1-44ce-9eaf-a0f4023f2017@I-love.SAKURA.ne.jp>
Date: Tue, 16 Dec 2025 23:38:37 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: Jason Gunthorpe <jgg@...dia.com>
Cc: Leon Romanovsky <leon@...nel.org>, Majd Dibbiny <majd@...lanox.com>,
Doug Ledford <dledford@...hat.com>, Yuval Shaia <yshaia@...vell.com>,
Bernard Metzler <bernard.metzler@...ux.dev>,
OFED mailing list <linux-rdma@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>
Subject: [PATCH] RDMA/core: flush gid_cache_wq WQ from disable_device()
syzbot is reporting a net_device refcount leak in RDMA code.
A debug printk() patch showed that ib_enum_roce_netdev() is called when
allocating a GID entry but is not called when releasing it.
This suggests that something is preventing the call chain
netdevice_event_work_handler() -> ib_enum_all_roce_netdevs() ->
ib_enum_roce_netdev() from running when the GID entry is released.
Commit 03db3a2d81e6 ("IB/core: Add RoCE GID table management") introduced
ib_enum_all_roce_netdevs(), but calling this function asynchronously from
WQ context is racy. Using simple atomic_t counters, I can observe that
netdevice_event() works are sometimes still pending immediately before
disable_device(), called from __ib_unregister_device(), clears the
DEVICE_REGISTERED flag. If the pending works include an
ib_enum_roce_netdev() call that releases a GID entry, this race can result
in a net_device refcount leak.
Therefore, flush pending works immediately before clearing the
DEVICE_REGISTERED flag.
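The suspected race can be modeled in a few lines of single-threaded userspace C. This is only a sketch under stated assumptions: every name in it is hypothetical, the array stands in for the ordered gid_cache_wq, the boolean stands in for the DEVICE_REGISTERED mark, and it assumes that the pending work is the one that releases the GID entry and that enumeration skips devices whose mark has been cleared.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum work_kind { WORK_ADD_GID, WORK_DEL_GID };

struct model {
	bool registered;         /* stands in for the DEVICE_REGISTERED mark */
	int netdev_refs;         /* stands in for the net_device refcount */
	enum work_kind queue[8]; /* stands in for the ordered gid_cache_wq */
	size_t head, tail;
};

/* Append one work item; silently ignored if the model queue is full. */
static void queue_work_model(struct model *m, enum work_kind kind)
{
	if (m->tail < sizeof(m->queue) / sizeof(m->queue[0]))
		m->queue[m->tail++] = kind;
}

/* Run all pending works in order; an unregistered device is skipped. */
static void flush_model(struct model *m)
{
	while (m->head < m->tail) {
		enum work_kind kind = m->queue[m->head++];

		if (!m->registered)
			continue; /* assumed: enumeration no longer sees the device */
		if (kind == WORK_ADD_GID)
			m->netdev_refs++;
		else
			m->netdev_refs--;
	}
}

/* Returns the number of references still held after unregistration. */
static int unregister_model(bool flush_first)
{
	struct model m = { .registered = true };

	queue_work_model(&m, WORK_ADD_GID);
	flush_model(&m);                    /* GID entry added, reference taken */
	queue_work_model(&m, WORK_DEL_GID); /* release work is still pending */

	if (flush_first)
		flush_model(&m);            /* the ordering this patch proposes */
	m.registered = false;               /* clear the mark */
	flush_model(&m);                    /* without the flush, this runs too late */
	return m.netdev_refs;
}
```

In this model, unregister_model(false) leaves one reference behind because the release work only runs after the mark is cleared, while unregister_model(true) ends with none.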
Also, commit 8fe8bacb92f2 ("IB/core: Add ordered workqueue for RoCE
GID management") was intended to ensure that netdev events are processed
in the order in which netdevice_event() is called. Given that intent,
failing to invoke the corresponding event handler because a memory
allocation failed is as bad as processing netdev events in parallel.
Therefore, add __GFP_NOFAIL when allocating memory for a netdev event
work.
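A tiny userspace sketch shows why a dropped event is as harmful as a reordered one. All names here are hypothetical, and the model assumes that an "up" event adds one GID entry and a "down" event removes one; fail_at simulates the old NOTIFY_DONE path where the work was never queued.

```c
#include <stddef.h>

/* Hypothetical event model; these names do not exist in the kernel. */
enum ev { EV_UP, EV_DOWN };

static const enum ev up_then_down[] = { EV_UP, EV_DOWN };

/*
 * Replay events in order. The event at index fail_at is dropped, as if
 * allocating its work item had failed (pass n to drop nothing).
 * Returns the number of GID entries left at the end.
 */
static int replay(const enum ev *evs, size_t n, size_t fail_at)
{
	int entries = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == fail_at)
			continue; /* work was never queued */
		if (evs[i] == EV_UP)
			entries++;
		else if (entries > 0)
			entries--;
	}
	return entries;
}
```

With nothing dropped, replay(up_then_down, 2, 2) ends with no entries; dropping the "down" event, replay(up_then_down, 2, 1), leaves one stale entry behind.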
Reported-by: syzbot+881d65229ca4f9ae8c84@...kaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Fixes: 03db3a2d81e6 ("IB/core: Add RoCE GID table management")
Signed-off-by: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
---
I have not yet confirmed that netdevice_event_work_handler() is called
when releasing a GID entry, but I would like to try this patch in the
linux-next tree via my tree for testing.
drivers/infiniband/core/core_priv.h | 1 +
drivers/infiniband/core/device.c | 1 +
drivers/infiniband/core/roce_gid_mgmt.c | 10 ++++++----
3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 05102769a918..8355020bb98a 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -142,6 +142,7 @@ int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u32 port,
int roce_gid_mgmt_init(void);
void roce_gid_mgmt_cleanup(void);
+void roce_flush_gid_cache_wq(void);
unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u32 port);
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 13e8a1714bbd..8638583a64f2 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1300,6 +1300,7 @@ static void disable_device(struct ib_device *device)
WARN_ON(!refcount_read(&device->refcount));
+ roce_flush_gid_cache_wq();
down_write(&devices_rwsem);
xa_clear_mark(&devices, device->index, DEVICE_REGISTERED);
up_write(&devices_rwsem);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index a9f2c6b1b29e..79982d448cd2 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -661,10 +661,7 @@ static int netdevice_queue_work(struct netdev_event_work_cmd *cmds,
{
unsigned int i;
struct netdev_event_work *ndev_work =
- kmalloc(sizeof(*ndev_work), GFP_KERNEL);
-
- if (!ndev_work)
- return NOTIFY_DONE;
+ kmalloc(sizeof(*ndev_work), GFP_KERNEL | __GFP_NOFAIL);
memcpy(ndev_work->cmds, cmds, sizeof(ndev_work->cmds));
for (i = 0; i < ARRAY_SIZE(ndev_work->cmds) && ndev_work->cmds[i].cb; i++) {
@@ -948,3 +945,8 @@ void __exit roce_gid_mgmt_cleanup(void)
*/
destroy_workqueue(gid_cache_wq);
}
+
+void roce_flush_gid_cache_wq(void)
+{
+ flush_workqueue(gid_cache_wq);
+}
--
2.47.3