[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y4Xq4RAopEUEWrZU@unreal>
Date: Tue, 29 Nov 2022 13:20:01 +0200
From: Leon Romanovsky <leon@...nel.org>
To: Jiri Pirko <jiri@...nulli.us>
Cc: Ido Schimmel <idosch@...sch.org>, Jakub Kicinski <kuba@...nel.org>,
Yang Yingliang <yangyingliang@...wei.com>,
netdev@...r.kernel.org, jiri@...dia.com, davem@...emloft.net,
edumazet@...gle.com, pabeni@...hat.com
Subject: Re: [PATCH net] net: devlink: fix UAF in
devlink_compat_running_version()
On Tue, Nov 29, 2022 at 10:05:10AM +0100, Jiri Pirko wrote:
> Tue, Nov 29, 2022 at 09:44:48AM CET, leon@...nel.org wrote:
> >On Mon, Nov 28, 2022 at 02:52:00PM +0100, Jiri Pirko wrote:
> >> Mon, Nov 28, 2022 at 12:50:15PM CET, leon@...nel.org wrote:
> >> >On Mon, Nov 28, 2022 at 10:58:58AM +0100, Jiri Pirko wrote:
> >> >> Mon, Nov 28, 2022 at 10:20:53AM CET, idosch@...sch.org wrote:
> >> >> >On Wed, Nov 23, 2022 at 06:18:00PM -0800, Jakub Kicinski wrote:
> >> >> >> On Wed, 23 Nov 2022 21:18:14 +0200 Ido Schimmel wrote:
> >> >> >> > > I used the fix code proposed by Jakub, but it didn't work correctly, so
> >> >> >> > > I tried to correct and improve it, and need some devlink helper.
> >> >> >> > >
> >> >> >> > > Anyway, it is a nsim problem, if we want fix this without touch devlink,
> >> >> >> > > I think we can add a 'registered' field in struct nsim_dev, and it can be
> >> >> >> > > checked in nsim_get_devlink_port() like this:
> >> >> >> >
> >> >> >> > I read the discussion and it's not clear to me why this is a netdevsim
> >> >> >> > specific problem. The fundamental problem seems to be that it is
> >> >> >> > possible to hold a reference on a devlink instance before it's
> >> >> >> > registered and that devlink_free() will free the instance regardless of
> >> >> >> > its current reference count because it expects devlink_unregister() to
> >> >> >> > block. In this case, the instance was never registered, so
> >> >> >> > devlink_unregister() is not called.
> >> >> >> >
> >> >> >> > ethtool was able to get a reference on the devlink instance before it
> >> >> >> > was registered because netdevsim registers its netdevs before
> >> >> >> > registering its devlink instance. However, netdevsim is not the only one
> >> >> >> > doing this: funeth, ice, prestera, mlx4, mlxsw, nfp and potentially
> >> >> >> > others do the same thing.
> >> >> >> >
> >> >> >> > When you think about it, it's strange that it's even possible for
> >> >> >> > ethtool to reach the driver when the netdev used in the request is long
> >> >> >> > gone, but it's not holding a reference on the netdev (it's holding a
> >> >> >> > reference on the devlink instance instead) and
> >> >> >> > devlink_compat_running_version() is called without RTNL.
> >> >> >>
> >> >> >> Indeed. We did a bit of a flip-flop with the devlink locking rules
> >> >> >> and the fact that the instance is reachable before it is registered
> >> >> >> is a leftover from a previous restructuring :(
> >> >> >>
> >> >> >> Hence my preference to get rid of the ordering at the driver level
> >> >> >> than to try to patch it up in the code. Dunno if that's convincing.
> >> >> >
> >> >> >I don't have a good solution, but changing all the drivers to register
> >> >> >their netdevs after the devlink instance is going to be quite painful
> >> >> >and too big for 'net'. I feel like the main motivation for this is the
> >> >> >ethtool compat stuff, which is not very convincing IMO. I'm quite happy
> >> >> >with the current flow where drivers call devlink_register() at the end
> >> >> >of their probe.
> >> >> >
> >> >> >Regarding a solution for the current crash, assuming we agree it's not a
> >> >> >netdevsim specific problem, I think the current fix [1] is OK. Note that
> >> >> >while it fixes the crash, it potentially creates other (less severe)
> >> >> >problems. After user space receives RTM_NEWLINK notification it will
> >> >> >need to wait for a certain period of time before issuing
> >> >> >'ETHTOOL_GDRVINFO' as otherwise it will not get the firmware version. I
> >> >> >guess it's not a big deal for drivers that only register one netdev
> >> >> >since they will very quickly follow with devlink_register(), but the
> >> >> >race window is larger for drivers that need to register many netdevs,
> >> >> >for either physical switch or eswitch ports.
> >> >> >
> >> >> >Long term, we either need to find a way to make the ethtool compat stuff
> >> >> >work correctly or just get rid of it and have affected drivers implement
> >> >> >the relevant ethtool operations instead of relying on devlink.
> >> >> >
> >> >> >[1] https://lore.kernel.org/netdev/20221122121048.776643-1-yangyingliang@huawei.com/
> >> >>
> >> >> I just had a call with Ido. We both think that this might be a good
> >> >> solution for -net to avoid the use after free.
> >> >>
> >> >> For net-next, we eventually should change driver init flows to register
> >> >> devlink instance first and only after that register devlink_port and
> >> >> related netdevice. The ordering is important for the userspace app. For
> >> >> example the init flow:
> >> >> <- RTnetlink new netdev event
> >> >> app sees devlink_port handle in IFLA_DEVLINK_PORT
> >> >> -> query devlink instance using this handle
> >> >> <- ENODEV
> >> >>
> >> >> The instance is not registered yet.
> >> >
> >> >This is supposed to be handled by devlink_notify_register() which sends
> >> >"delayed" notifications after devlink_register() is called.
> >> >
> >> >Unless something is broken, the scenario above shouldn't happen.
> >>
> >> Nope, RTnetlink message for new netdev is not handled by that. It is
> >> sent right away.
> >
> >And why don't you fix your new commit dca56c3038c3 ("net: expose devlink port over rtnetlink")
> >to do not return devlink instance unless it is registered?
> >
> >Why is it correct to expose devlink port with not ready to use devlink
> >instance?
>
> It is not, but:
> Devlink port which is "parent" of the netdev is registered. The netdev
> is created with devlink_port registered and that it guaranteed to not
> change during netdev lifetime. Therefore, it would be weird to have 2
> RTnetlink events:
> 1. event of netdev being created without devlink port
> 2. event of netdev with devlink port
> If that is what you suggest.
Yes, something like that in similar way to IFLA_EVENT_NOTIFY_PEERS.
>
> I'm working on a patchset that is making sure that the flow is always
> 1) devlink_register & netlink event
> 2) devlink_port_register & netlink event
> 3) netdev_register & netlink event
>
> Always the same. That means during init, during reload, during port
> split.
The thing that worries me is unclear lifetime of these devlink objects.
devlink_port is aligned with netdev lifetime, devlink is with device one.
It is very strange to me that we have netdev ready without device
underneath ready.
Can you please document the lifetime models and how all these objects
interact?
Thanks
Powered by blists - more mailing lists