[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a6430c7b6af910894ed891e12e18b020d53e8f4e.camel@sipsolutions.net>
Date: Wed, 03 Jul 2024 13:56:17 +0200
From: Johannes Berg <johannes@...solutions.net>
To: Eric Dumazet <edumazet@...gle.com>, syzbot
<syzbot+2120b9a8f96b3fa90bad@...kaller.appspotmail.com>
Cc: "davem@...emloft.net" <davem@...emloft.net>, "Hadi Salim, Jamal"
<jhs@...atatu.com>, "jiri@...nulli.us" <jiri@...nulli.us>,
"kuba@...nel.org" <kuba@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "pabeni@...hat.com" <pabeni@...hat.com>,
"syzkaller-bugs@...glegroups.com" <syzkaller-bugs@...glegroups.com>,
"xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>
Subject: Re: [syzbot] [net?] WARNING: suspicious RCU usage in dev_activate
On Wed, 2024-07-03 at 10:10 +0000, Eric Dumazet wrote:
> I think this came with this patch :
>
> commit facd15dfd69122042502d99ab8c9f888b48ee994
> Author: Johannes Berg <johannes.berg@...el.com>
> Date: Mon Dec 4 21:47:07 2023 +0100
>
> net: core: synchronize link-watch when carrier is queried
Yes, that makes sense.
> Issue here is that ethtool_op_get_link() could be called from RCU contexts.
That's ... unexpected, and apparently not just to me. First (but not
only) example I found: usbnet_get_link() calls mii_link_ok() calls the
driver's mdio_read(), i.e. sr_mdio_read() which does a mutex_lock().
And it was always broken? I can't really find anything that introduced
this problem directly - even before 4cb4f97b7e36 you had a read_lock()
there, and the bond slave_dev->ethtool_ops->get_link() call goes back to
the beginning of git history, the usbnet example is just slightly newer
from c41286fd42f3, but even before that there are examples in usbnet
drivers with this problem.
> Adding linkwatch_sync_dev() in it broke this case.
Right, I agree that it made the problem much more wide-spread than it
being driver-dependent as it was before.
Perhaps we should change bond? The original commit that added RCU there
said it even considered RTNL instead.
At the very least I'd say some _big_ documentation is needed there in
ethtool, and probably then __ethtool_get_link() should also
rcu_read_lock() to make this consistent. But like I said, I'm not sure
it isn't bond that's on the wrong side here.
> BTW, this commit also made it difficult to convert "ip link" dumps to
> not use RTNL, but rely on RCU instead.
You could probably sync all of linswatch beforehand, and only acquire
the RTNL if there's work to be done at all?
johannes
Powered by blists - more mailing lists