Date:   Mon, 19 Oct 2020 13:23:23 +0000
From:   Parav Pandit <parav@...dia.com>
To:     Jason Gunthorpe <jgg@...dia.com>, Leon Romanovsky <leon@...nel.org>
CC:     Doug Ledford <dledford@...hat.com>,
        Jakub Kicinski <kuba@...nel.org>,
        "Jiri Pirko" <jiri@...lanox.com>,
        "linux-rdma@...r.kernel.org" <linux-rdma@...r.kernel.org>,
        Michael Guralnik <michaelgur@...lanox.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Saeed Mahameed <saeedm@...dia.com>
Subject: RE: [PATCH rdma-rc] RDMA/mlx5: Fix devlink deadlock on net namespace
 deletion


> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Monday, October 19, 2020 6:38 PM
> 
> On Mon, Oct 19, 2020 at 08:27:36AM +0300, Leon Romanovsky wrote:
> > From: Parav Pandit <parav@...dia.com>
> >
> > When an mlx5 core devlink instance is reloaded in a different net
> > namespace, its associated IB device is deleted and recreated.
> >
> > An example sequence is:
> > $ ip netns add foo
> > $ devlink dev reload pci/0000:00:08.0 netns foo
> > $ ip netns del foo
> >
> > The mlx5 IB device needs to attach and detach the netdevice through
> > the netdev notifier chain during the load and unload sequences.
> > Below is the call graph of the unload flow.
> >
> > cleanup_net()
> >    down_read(&pernet_ops_rwsem); <- first sem acquired
> >      ops_pre_exit_list()
> >        pre_exit()
> >          devlink_pernet_pre_exit()
> >            devlink_reload()
> >              mlx5_devlink_reload_down()
> >                mlx5_unload_one()
> >                [...]
> >                  mlx5_ib_remove()
> >                    mlx5_ib_unbind_slave_port()
> >                      mlx5_remove_netdev_notifier()
> >                        unregister_netdevice_notifier()
> >                          down_write(&pernet_ops_rwsem); <- recursive lock
> >
> > Hence, when the net namespace is deleted, the mlx5 reload results in a deadlock.
> >
> > When the deadlock occurs, the devlink mutex is also held. This
> > deadlocks not only the mlx5 device under reload but also every process
> > that attempts to access unrelated devlink devices.
> >
> > Hence, fix this by having the mlx5 IB driver register a per-net netdev
> > notifier instead of the global one; the per-net notifier operates on
> > the net namespace without holding pernet_ops_rwsem.
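The lock cycle described above can be sketched as a toy, single-threaded C model. It is illustrative only and not kernel code: `struct toy_rwsem`, `toy_cleanup_net()` and the two `toy_unregister_*()` helpers are hypothetical, and they record lock state instead of taking real locks. The global path mirrors the call graph, where unregister_netdevice_notifier() wants pernet_ops_rwsem for write; the per-net path assumes, as the commit message states, that the per-net notifier API does not take pernet_ops_rwsem.

```c
#include <stdbool.h>

/* Toy stand-in for a kernel rw_semaphore: it only records who holds it. */
struct toy_rwsem {
	int readers;  /* number of tasks holding it for read */
	bool writer;  /* true while a task holds it for write */
};

/* Hypothetical model of pernet_ops_rwsem (zero-initialized: unlocked). */
static struct toy_rwsem pernet_ops_rwsem;

/* A write acquisition must wait while anyone, including the caller, holds it. */
static bool toy_down_write_would_block(const struct toy_rwsem *sem)
{
	return sem->readers > 0 || sem->writer;
}

/* Global notifier removal: modelled as needing pernet_ops_rwsem for write. */
static bool toy_unregister_global(void)
{
	if (toy_down_write_would_block(&pernet_ops_rwsem))
		return false; /* would sleep forever: the only reader is the caller itself */
	/* ... take the write lock, unregister, release it ... */
	return true;
}

/* Per-net notifier removal: modelled as not touching pernet_ops_rwsem. */
static bool toy_unregister_per_net(void)
{
	return true;
}

/*
 * Model of cleanup_net() -> pre_exit() -> ... -> mlx5_remove_netdev_notifier().
 * Returns true if the unload path completes, false if it would deadlock.
 */
static bool toy_cleanup_net(bool per_net_notifier)
{
	bool done;

	pernet_ops_rwsem.readers++;  /* down_read(&pernet_ops_rwsem) */
	done = per_net_notifier ? toy_unregister_per_net()
				: toy_unregister_global();
	pernet_ops_rwsem.readers--;  /* up_read(); never reached in the real deadlock */
	return done;
}
```

`toy_cleanup_net(false)` returns false (the deadlock) and `toy_cleanup_net(true)` returns true. The model does not track the devlink mutex; in the real hang that mutex is also held, which is why unrelated devlink devices block too.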
> >
> > Fixes: 4383cfcc65e7 ("net/mlx5: Add devlink reload")
> > Signed-off-by: Parav Pandit <parav@...dia.com>
> > Signed-off-by: Leon Romanovsky <leonro@...dia.com>
> >  drivers/infiniband/hw/mlx5/main.c                  | 6 ++++--
> >  drivers/net/ethernet/mellanox/mlx5/core/lib/mlx5.h | 5 -----
> >  include/linux/mlx5/driver.h                        | 5 +++++
> >  3 files changed, 9 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> > index 944bb7691913..b1b3e563c15e 100644
> > +++ b/drivers/infiniband/hw/mlx5/main.c
> > @@ -3323,7 +3323,8 @@ static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
> >  	int err;
> >
> >  	dev->port[port_num].roce.nb.notifier_call = mlx5_netdev_event;
> > -	err = register_netdevice_notifier(&dev->port[port_num].roce.nb);
> > +	err = register_netdevice_notifier_net(mlx5_core_net(dev->mdev),
> > +					      &dev->port[port_num].roce.nb);
> 
> This looks racy. What lock needs to be held to keep mlx5_core_net() stable?

mlx5_core_net() cannot be accessed outside of the mlx5 driver's load, unload, and reload paths.

While this code executes, devlink cannot be executing a reload.
This is guarded by the devlink_reload_enable()/devlink_reload_disable() calls made by mlx5 core.
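A minimal userspace sketch of the guard described above, assuming a single-threaded model: `struct toy_devlink`, its `reload_enabled` flag and the `toy_*()` helpers are hypothetical stand-ins, not the devlink API, and the -EOPNOTSUPP returned for a refused reload is an illustrative choice. It omits whatever locking serializes the flag against an in-flight reload; it only shows that while the gate is closed, a reload cannot move the device to another namespace.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_net { int id; };

/* Hypothetical stand-in for struct devlink: current netns plus a reload gate. */
struct toy_devlink {
	struct toy_net *net;
	bool reload_enabled;
};

static void toy_reload_enable(struct toy_devlink *dl)  { dl->reload_enabled = true; }
static void toy_reload_disable(struct toy_devlink *dl) { dl->reload_enabled = false; }

/* A reload (which may change dl->net) is refused while the gate is closed. */
static int toy_devlink_reload(struct toy_devlink *dl, struct toy_net *dest)
{
	if (!dl->reload_enabled)
		return -EOPNOTSUPP; /* error code is an assumption for this sketch */
	dl->net = dest;             /* the real reload also runs reload_down/reload_up */
	return 0;
}

/* Driver removal path: close the gate first, then read the netns. */
static struct toy_net *toy_driver_remove(struct toy_devlink *dl)
{
	struct toy_net *net;

	toy_reload_disable(dl);
	net = dl->net; /* cannot be changed by a reload until the gate reopens */
	/* ... unregister the per-net notifier on net here ... */
	return net;
}
```

In this model, probe starts with the gate closed and opens it only after the notifier is registered; remove closes it before unregistering.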

> 
> >  	if (err) {
> >  		dev->port[port_num].roce.nb.notifier_call = NULL;
> >  		return err;
> > @@ -3335,7 +3336,8 @@ static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
> >  static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev, u8 port_num)
> >  {
> >  	if (dev->port[port_num].roce.nb.notifier_call) {
> > -		unregister_netdevice_notifier(&dev->port[port_num].roce.nb);
> > +		unregister_netdevice_notifier_net(mlx5_core_net(dev->mdev),
> > +						  &dev->port[port_num].roce.nb);
> 
> This seems dangerous too. What if mlx5_core_net() changed before we
> get here?
> 
Having inspected the driver code, I am not aware of any code flow where this can
change before reaching here, because registration and unregistration are done only in the driver load, unload, and reload paths.
A reload can happen only after devlink_reload_enable() has been called.
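One way to make the invariant in this answer checkable is to record the namespace at registration time and compare it at unregistration, in the spirit of a WARN_ON(). This is a hypothetical userspace sketch, not part of the patch: `struct toy_port_nb`, `toy_add_notifier()` and `toy_remove_notifier()` are invented names, and the notifier calls themselves appear only as comments.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_net { int id; };

/* Per-port notifier state; registered_net is the extra debug bookkeeping. */
struct toy_port_nb {
	bool registered;
	const struct toy_net *registered_net;
};

static void toy_add_notifier(struct toy_port_nb *nb, const struct toy_net *cur)
{
	/* register_netdevice_notifier_net(cur, ...) would go here */
	nb->registered = true;
	nb->registered_net = cur;
}

/*
 * Returns true if the namespace seen now matches the one used at
 * registration (the driver invariant held); warns on stderr otherwise.
 */
static bool toy_remove_notifier(struct toy_port_nb *nb, const struct toy_net *cur)
{
	bool ok = true;

	if (!nb->registered)
		return true;
	if (cur != nb->registered_net) {
		fprintf(stderr, "netns changed between register and unregister\n");
		ok = false;
	}
	/* unregister_netdevice_notifier_net(...) would go here */
	nb->registered = false;
	nb->registered_net = NULL;
	return ok;
}
```

A check like this only reports a broken invariant; it does not by itself make the unregistration correct.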

> What are the rules for when devlink_net() changes?
> 
devlink_net() changes only after the driver's unload() callback has completed.
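The ordering rule stated here can be sketched as a hypothetical model, not the devlink or mlx5 code: each toy namespace counts the notifiers registered on its chain, and `toy_reload()` runs a down step, switches the namespace, then runs an up step. All names are invented for illustration. Because the switch happens only after the down step has finished, the unregister always hits the chain that the register used.

```c
/* Hypothetical netns: counts notifiers currently on its chain. */
struct toy_net {
	int notifiers;
};

/* Hypothetical device: net is what devlink_net()/mlx5_core_net() would return. */
struct toy_dev {
	struct toy_net *net;
};

/* Load registers on the current netns; unload unregisters from it. */
static void toy_load(struct toy_dev *dev)   { dev->net->notifiers++; }
static void toy_unload(struct toy_dev *dev) { dev->net->notifiers--; }

/* Reload into dest: down on the old netns, switch, then up on the new one. */
static void toy_reload(struct toy_dev *dev, struct toy_net *dest)
{
	toy_unload(dev);  /* reload_down: still sees the old netns */
	dev->net = dest;  /* netns changes only after unload has completed */
	toy_load(dev);    /* reload_up: registers on the new netns */
}
```

Swapping the first two statements of toy_reload() would unregister from the destination namespace, where nothing was registered, and leave the old namespace with a stale notifier, which is the failure mode the question above is probing.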
