netdev - Re: [Patch net-next] mlx5: use RCU lock in mlx5_eq_cq

Message-ID: <e88ac12b80b510648f7ab1d4cee50c43908ba49d.camel@mellanox.com>
Date:   Wed, 6 Feb 2019 16:55:24 +0000
From:   Saeed Mahameed <saeedm@...lanox.com>
To:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        Tariq Toukan <tariqt@...lanox.com>,
        "xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>
Subject: Re: [Patch net-next] mlx5: use RCU lock in mlx5_eq_cq_get()

On Wed, 2019-02-06 at 12:02 +0000, Tariq Toukan wrote:
> 
> On 2/6/2019 2:35 AM, Cong Wang wrote:
> > mlx5_eq_cq_get() is called in IRQ handler, the spinlock inside
> > gets a lot of contentions when we test some heavy workload
> > with 60 RX queues and 80 CPU's, and it is clearly shown in the
> > flame graph.
> > 


Hi Cong,

The patch looks OK to me, but I really doubt that you can hit contention
on the latest upstream driver, since we already have a spinlock per EQ,
which means a spinlock per core: each EQ (core) MSI-X handler can only
access one spinlock (its own). So I am surprised that you saw contention.
Maybe you are not running the latest upstream driver?

What is the workload?

> > In fact, radix_tree_lookup() is perfectly fine with RCU read lock,
> > we don't have to take a spinlock on this hot path. It is pretty much
> > similar to commit 291c566a2891
> > ("net/mlx4_core: Fix racy CQ (Completion Queue) free"). Slow paths
> > are still serialized with the spinlock, and with synchronize_irq()
> > it should be safe to just move the fast path to RCU read lock.
> > 
> > This patch itself reduces the latency by about 50% with our workload.
> > 
> > Cc: Saeed Mahameed <saeedm@...lanox.com>
> > Cc: Tariq Toukan <tariqt@...lanox.com>
> > Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
> > ---
> >   drivers/net/ethernet/mellanox/mlx5/core/eq.c | 12 ++++++------
> >   1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > index ee04aab65a9f..7092457705a2 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> > @@ -114,11 +114,11 @@ static struct mlx5_core_cq *mlx5_eq_cq_get(struct mlx5_eq *eq, u32 cqn)
> >   	struct mlx5_cq_table *table = &eq->cq_table;
> >   	struct mlx5_core_cq *cq = NULL;
> >   
> > -	spin_lock(&table->lock);
> > +	rcu_read_lock();
> >   	cq = radix_tree_lookup(&table->tree, cqn);
> >   	if (likely(cq))
> >   		mlx5_cq_hold(cq);
> > -	spin_unlock(&table->lock);
> > +	rcu_read_unlock();
> 
> Thanks for your patch.
> 
> I think we can improve it further, by taking the if statement out of
> the critical section.
> 

No, mlx5_cq_hold() must stay under the RCU read lock; otherwise the cq
might get freed before the IRQ handler gets a chance to increment the
ref count on it.

Another way to do it is to not do any refcounting in the IRQ handler
and instead fence cq removal via synchronize_irq(eq->irqn) in
mlx5_eq_del_cq(). But let's keep one approach (refcounting);
synchronize_irq()/RCU can be heavy sometimes, especially on RDMA
workloads that create and destroy many cqs in loops.
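
To illustrate why the hold has to happen inside the read-side section,
here is a minimal userspace sketch. It is not kernel code and not the
driver's implementation: every demo_* name is hypothetical, a plain
reader counter stands in for rcu_read_lock()/rcu_read_unlock(), a busy
wait stands in for the fence on the removal path, and a "freed" flag
stands in for actually freeing the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_TABLE_SIZE 8

/* Hypothetical stand-in for a refcounted CQ object. */
struct demo_cq {
	unsigned int cqn;
	atomic_int refcount;
	int freed;	/* set instead of calling free(), so it can be observed */
};

/* Hypothetical stand-in for the per-EQ lookup table. */
_Atomic(struct demo_cq *) demo_table[DEMO_TABLE_SIZE];

/* Crude stand-in for the RCU read side: just count in-flight readers. */
atomic_int demo_readers;

void demo_read_lock(void)   { atomic_fetch_add(&demo_readers, 1); }
void demo_read_unlock(void) { atomic_fetch_sub(&demo_readers, 1); }

/* Drop one reference; the last put "frees" the object. */
void demo_cq_put(struct demo_cq *cq)
{
	int old = atomic_fetch_sub(&cq->refcount, 1);

	assert(old > 0);
	if (old == 1)
		cq->freed = 1;
}

/* Fast path: look up and take the reference before leaving the section. */
struct demo_cq *demo_cq_get(unsigned int cqn)
{
	struct demo_cq *cq;

	demo_read_lock();
	cq = atomic_load(&demo_table[cqn % DEMO_TABLE_SIZE]);
	if (cq && cq->cqn == cqn)
		atomic_fetch_add(&cq->refcount, 1);	/* hold inside the section */
	else
		cq = NULL;
	demo_read_unlock();

	return cq;
}

/* Slow path: publish the object; the table owns the initial reference. */
void demo_cq_add(struct demo_cq *cq)
{
	atomic_init(&cq->refcount, 1);
	cq->freed = 0;
	atomic_store(&demo_table[cq->cqn % DEMO_TABLE_SIZE], cq);
}

/* Slow path: unpublish, wait for in-flight readers, drop the table's ref. */
void demo_cq_del(struct demo_cq *cq)
{
	atomic_store(&demo_table[cq->cqn % DEMO_TABLE_SIZE], NULL);
	while (atomic_load(&demo_readers) > 0)
		;	/* stand-in for the fence on the removal path */
	demo_cq_put(cq);
}
```

In this sketch, a reader that found the object has already taken its
reference by the time demo_cq_del() stops waiting, so the final put
cannot free the object underneath it. If the increment were moved after
demo_read_unlock(), the removal path could drop the last reference
between the lookup and the hold.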

> Other than that, patch LGTM.
> 
> Regards,
> Tariq
> 
> >   
> >   	return cq;
> >   }
> > @@ -371,9 +371,9 @@ int mlx5_eq_add_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
> >   	struct mlx5_cq_table *table = &eq->cq_table;
> >   	int err;
> >   
> > -	spin_lock_irq(&table->lock);
> > +	spin_lock(&table->lock);
> >   	err = radix_tree_insert(&table->tree, cq->cqn, cq);
> > -	spin_unlock_irq(&table->lock);
> > +	spin_unlock(&table->lock);
> >   
> >   	return err;
> >   }
> > @@ -383,9 +383,9 @@ int mlx5_eq_del_cq(struct mlx5_eq *eq, struct mlx5_core_cq *cq)
> >   	struct mlx5_cq_table *table = &eq->cq_table;
> >   	struct mlx5_core_cq *tmp;
> >   
> > -	spin_lock_irq(&table->lock);
> > +	spin_lock(&table->lock);
> >   	tmp = radix_tree_delete(&table->tree, cq->cqn);
> > -	spin_unlock_irq(&table->lock);
> > +	spin_unlock(&table->lock);
> >   
> >   	if (!tmp) {
> >   		mlx5_core_warn(eq->dev, "cq 0x%x not found in eq 0x%x tree\n", eq->eqn, cq->cqn);
> >