Message-ID: <CAJeEPuJHMKo9T3GcAQH2+X3Rke3b4YH3_S6FmnBp4tQqLciYxA@mail.gmail.com>
Date: Mon, 31 Mar 2025 15:00:00 +0200
From: Dylan Wolff <wolffd@...p.nus.edu.sg>
To: Wenjia Zhang <wenjia@...ux.ibm.com>, Jan Karcher <jaka@...ux.ibm.com>,
"D. Wythe" <alibuda@...ux.alibaba.com>, Tony Lu <tonylu@...ux.alibaba.com>,
Wen Gu <guwen@...ux.alibaba.com>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>, linux-rdma@...r.kernel.org, linux-s390@...r.kernel.org,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Jiacheng Xu <3170103308@....edu.cn>
Subject: Concurrent slab-use-after-free in netdev_next_lower_dev
Hello!
Firstly, I am still relatively new to kernel development, so apologies in
advance if my assessment of this issue is incorrect.
I have a Syzkaller crash report for what looks like a use-after-free
concurrency bug involving a net_device. I am working on a consistent,
minimal reproducer, but so far the bug seems quite difficult to trigger
in practice with the attached Syzkaller program.
From the report, it looks like the net_device is freed at the end of an
RTNL critical section, in netdev_run_todo. At the time of the crash, the
*use* thread has acquired rtnl_lock() in smc_vlan_by_tcpsk. On 6.13-rc4,
the crash occurred at the line marked with `>>>` below, while iterating
over devices with netdev_walk_all_lower_dev:
```
static struct net_device *netdev_next_lower_dev(struct net_device *dev,
						struct list_head **iter)
{
	struct netdev_adjacent *lower;

>>>	lower = list_entry((*iter)->next, struct netdev_adjacent, list);

	if (&lower->list == &dev->adj_list.lower)
		return NULL;

	*iter = &lower->list;

	return lower->dev;
}
```
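For readers less familiar with the kernel's intrusive lists, here is a userspace sketch of how an iterator shaped like netdev_next_lower_dev() advances. It is only a model: struct list_head, container_of and list_entry are minimal local re-implementations, and toy_dev, toy_adjacent and toy_next_lower_dev are hypothetical stand-ins for struct net_device, struct netdev_adjacent and the real function. It checks for the head before computing the entry, which is equivalent logic. The point it illustrates is that every step loads (*iter)->next from memory owned by the previous position (the list head embedded in a device on the first call, an adjacency entry afterwards), so if that memory is freed concurrently, that load is exactly where a use-after-free would show up.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copy of an intrusive doubly linked list (model only). */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert @entry just before @head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Hypothetical stand-ins for struct net_device / struct netdev_adjacent. */
struct toy_dev {
    const char *name;
    struct list_head lower;     /* head of this device's lower list */
};

struct toy_adjacent {
    struct toy_dev *dev;        /* the lower device this entry refers to */
    struct list_head list;      /* linkage in the upper device's lower list */
};

/* Modeled on netdev_next_lower_dev(): advance *iter by one entry. */
static struct toy_dev *toy_next_lower_dev(struct toy_dev *dev,
                                          struct list_head **iter)
{
    struct list_head *next;

    assert(iter != NULL && *iter != NULL);
    /* This load reads memory owned by whatever *iter points into. If that
     * device or adjacency entry has already been freed, this is the
     * use-after-free. */
    next = (*iter)->next;
    if (next == &dev->lower)
        return NULL;            /* wrapped back to the head: done */
    *iter = next;
    return list_entry(next, struct toy_adjacent, list)->dev;
}
```

With vlan0 linked to eth0 and eth1 and iteration starting from `iter = &upper.lower`, successive calls return eth0, eth1, and then NULL.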
This looks to me like a reference-counting issue: netdev_run_todo checks
netdev_refcnt_read before the device is freed, but I don't see anything in
netdev_walk_all_lower_dev / netdev_next_lower_dev that takes a reference on
the devices (i.e., increments the count that netdev_refcnt_read reports)
while iterating over them. I'm guessing the fix is either to add reference
counting to netdev_walk_all_lower_dev or to use a different,
concurrency-safe iterator over the devices in the caller
(smc_vlan_by_tcpsk).
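To make the reference-counting idea concrete, here is a userspace sketch of the general pattern, under a simplified model: the teardown path (playing the role of netdev_run_todo) frees the object only once the count it reads (playing the role of netdev_refcnt_read) reaches zero, and a walker pins each device while using it (similar in spirit to dev_hold()/dev_put()). All names here (toy_netdev, toy_hold, toy_put, toy_unregister, toy_try_free) are hypothetical. As a further simplification, toy_try_free() returns false instead of waiting, whereas the kernel's teardown path waits for outstanding references to be dropped. This is not a proposed patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted struct net_device. */
struct toy_netdev {
    atomic_int refcnt;          /* cf. the count netdev_refcnt_read reports */
    bool unregistered;
};

static struct toy_netdev *toy_alloc(void)
{
    struct toy_netdev *d = calloc(1, sizeof(*d));

    if (d)
        atomic_init(&d->refcnt, 1);   /* reference owned by "registration" */
    return d;
}

/* Pin / unpin a device, similar in spirit to dev_hold() / dev_put(). */
static void toy_hold(struct toy_netdev *d)
{
    atomic_fetch_add(&d->refcnt, 1);
}

static void toy_put(struct toy_netdev *d)
{
    int old = atomic_fetch_sub(&d->refcnt, 1);

    assert(old > 0);            /* an unbalanced put is a bug */
    (void)old;                  /* silence unused warning under NDEBUG */
}

/* Teardown first drops the registration reference ... */
static void toy_unregister(struct toy_netdev *d)
{
    d->unregistered = true;
    toy_put(d);
}

/* ... and the final step frees the object only when no references remain.
 * Simplification: report failure instead of waiting for the count to drop. */
static bool toy_try_free(struct toy_netdev *d)
{
    if (atomic_load(&d->refcnt) != 0)
        return false;           /* a walker still holds a reference */
    free(d);
    return true;
}
```

In this model a walker would call toy_hold() on a device before dereferencing it and toy_put() when done; while it holds the reference, toy_try_free() refuses to free the device. Whether pinning alone would be sufficient in the kernel also depends on how the adjacency lists themselves are protected, which this sketch does not model.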
Could someone confirm whether I am on the right track here? If so, I am
happy to try to come up with a patch.
Environment:
  QEMU (invocation attached) running a Syzkaller image on an Ubuntu
  22.04.4 LTS host
Kernel:
  tag: 6.13-rc4
  compiler toolchain: clang-17
Thanks!
Dylan
Attachment: "qemu.txt" (text/plain, 425 bytes)
Attachment: "2e50cc6b5eed2cdd8e652711c64739a9a120a405.zip" (application/zip, 187330 bytes)