[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d9a7af40-84f9-446e-a708-d989b322a675@redhat.com>
Date: Tue, 1 Apr 2025 09:45:08 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Dylan Wolff <wolffd@...p.nus.edu.sg>, Wenjia Zhang
<wenjia@...ux.ibm.com>, Jan Karcher <jaka@...ux.ibm.com>,
"D. Wythe" <alibuda@...ux.alibaba.com>, Tony Lu <tonylu@...ux.alibaba.com>,
Wen Gu <guwen@...ux.alibaba.com>, "David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
Simon Horman <horms@...nel.org>, linux-rdma@...r.kernel.org,
linux-s390@...r.kernel.org, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: Jiacheng Xu <3170103308@....edu.cn>
Subject: Re: Concurrent slab-use-after-free in netdev_next_lower_dev
On 3/31/25 3:00 PM, Dylan Wolff wrote:
> From the report, it looks like the net_device is freed at the end of an
> rtnl critical section in netdev_run_todo. At the time of the crash, the
> *use* thread has acquired rtnl_lock() in smc_vlan_by_tcpsk. The crash
> occurred at the line preceded by `>>>` below in 6.13 rc4 while iterating
> over devices with netdev_walk_all_lower_dev:
>
> ```
> static struct net_device *netdev_next_lower_dev(struct net_device *dev,
> struct list_head **iter)
> {
> struct netdev_adjacent *lower;
>
>>>> lower = list_entry((*iter)->next, struct netdev_adjacent, list);
>
> if (&lower->list == &dev->adj_list.lower)
> return NULL;
>
> *iter = &lower->list;
>
> return lower->dev;
> }
> ```
Please share the decoded syzkaller/kasan splat in plaintext instead of
describing it in natural language, and please avoid attachments unless
explicitly asked for.
Also, can you reproduce the issue on top of the current net tree?
> This looks to me like it is an issue with reference counting; I see that
> netdev_refcnt_read is checked in netdev_run_todo before the device is
> freed, but I don't see anything in netdev_walk_all_lower_dev /
> netdev_next_lower_dev that is incrementing netdev_refcnt_read when it is
> iterating over the devices. I'm guessing the fix is to either add reference
> counting to netdev_walk_all_lower_dev or to use a different,
> concurrency-safe iterator over the devices in the caller (smc_vlan_by_tcpsk
> ).
>
> Could someone confirm if I am on the right track here? If so I am happy to
> try to come up with the patch.
netdev_walk_all_lower_dev() should not need additional refcounting, as
it is traversing the list under rtnl lock, and device should be removed
from the adjacency list before the actual device free under such lock, too.
Thanks,
Paolo
Powered by blists - more mailing lists