[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <161579751342.3996.7266999681235546580@kwain.local>
Date: Mon, 15 Mar 2021 09:38:33 +0100
From: Antoine Tenart <atenart@...nel.org>
To: Maxim Mykytianskyi <maximmi@...dia.com>,
Saeed Mahameed <saeed@...nel.org>,
Tariq Toukan <tariqt@...dia.com>, alexander.duyck@...il.com,
davem@...emloft.net, kuba@...nel.org
Cc: netdev@...r.kernel.org
Subject: Re: [PATCH net-next v3 15/16] net/mlx5e: take the rtnl lock when calling netif_set_xps_queue
Quoting Saeed Mahameed (2021-03-12 21:54:18)
> On Fri, 2021-03-12 at 16:04 +0100, Antoine Tenart wrote:
> > netif_set_xps_queue must be called with the rtnl lock taken, and this
> > is
> > now enforced using ASSERT_RTNL(). mlx5e_attach_netdev was taking the
> > lock conditionally, fix this by taking the rtnl lock all the time.
> >
> > Signed-off-by: Antoine Tenart <atenart@...nel.org>
> > ---
> > drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 11 +++--------
> > 1 file changed, 3 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > index ec2fcb2a2977..96cba86b9f0d 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> > @@ -5557,7 +5557,6 @@ static void mlx5e_update_features(struct
> > net_device *netdev)
> >
> > int mlx5e_attach_netdev(struct mlx5e_priv *priv)
> > {
> > - const bool take_rtnl = priv->netdev->reg_state ==
> > NETREG_REGISTERED;
> > const struct mlx5e_profile *profile = priv->profile;
> > int max_nch;
> > int err;
> > @@ -5578,15 +5577,11 @@ int mlx5e_attach_netdev(struct mlx5e_priv
> > *priv)
> > * 2. Set our default XPS cpumask.
> > * 3. Build the RQT.
> > *
> > - * rtnl_lock is required by netif_set_real_num_*_queues in case
> > the
> > - * netdev has been registered by this point (if this function
> > was called
> > - * in the reload or resume flow).
> > + * rtnl_lock is required by netif_set_xps_queue.
> > */
>
> There is a reason why it is conditional:
> we had a bug in the past of double locking here:
>
> [ 4255.283960] echo/644 is trying to acquire lock:
>
> [ 4255.285092] ffffffff85101f90 (rtnl_mutex){+..}, at:
> mlx5e_attach_netdev0xd4/0×3d0 [mlx5_core]
>
> [ 4255.287264]
>
> [ 4255.287264] but task is already holding lock:
>
> [ 4255.288971] ffffffff85101f90 (rtnl_mutex){+..}, at:
> ipoib_vlan_add0×7c/0×2d0 [ib_ipoib]
>
> ipoib_vlan_add is called under rtnl and will eventually call
> mlx5e_attach_netdev, we don't have much control over this in mlx5
> driver since the rdma stack provides a per-prepared netdev to attach to
> our hw. maybe it is time we had a nested rtnl lock ..
Thanks for the explanation. So as you said, we can't based the locking
decision only on the driver own state / information...
What about `take_rtnl = !rtnl_is_locked();`?
Thanks!
Antoine
Powered by blists - more mailing lists