[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <62f9c9ee-d710-7493-1094-e8ffe8aaa1ca@gmail.com>
Date: Sun, 20 Nov 2016 17:14:48 +0200
From: Tariq Toukan <ttoukan.linux@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>,
David Miller <davem@...emloft.net>
Cc: netdev <netdev@...r.kernel.org>, Tariq Toukan <tariqt@...lanox.com>
Subject: Re: [PATCH net-next] mlx4: avoid unnecessary dirtying of critical
fields
Hi Eric,
Thanks for your patch.
On 18/11/2016 10:15 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@...gle.com>
>
> While stressing a 40Gbit mlx4 NIC with busy polling, I found false
> sharing in mlx4 driver that can be easily avoided.
>
> This patch brings an additional 7 % performance improvement in UDP_RR
> workload.
>
> 1) If we received no frame during one mlx4_en_process_rx_cq()
> invocation, no need to call mlx4_cq_set_ci() and/or dirty ring->cons
>
> 2) Do not refill rx buffers if we have plenty of them.
> This avoids false sharing and allows some bulk/batch optimizations.
> Page allocator and its locks will thank us.
>
> Finally, mlx4_en_poll_rx_cq() should not return 0 if it determined
> cpu handling NIC IRQ should be changed. We should return budget-1
> instead, to not fool net_rx_action() and its netdev_budget.
>
> Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> Cc: Tariq Toukan <tariqt@...lanox.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 51 +++++++++++--------
> 1 file changed, 32 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 22f08f9ef464..2112494ff43b 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -688,18 +688,23 @@ static void validate_loopback(struct mlx4_en_priv *priv, struct sk_buff *skb)
> dev_kfree_skb_any(skb);
> }
>
> -static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
> - struct mlx4_en_rx_ring *ring)
> +static bool mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
> + struct mlx4_en_rx_ring *ring)
> {
> - int index = ring->prod & ring->size_mask;
> + u32 missing = ring->actual_size - (ring->prod - ring->cons);
>
> - while ((u32) (ring->prod - ring->cons) < ring->actual_size) {
> - if (mlx4_en_prepare_rx_desc(priv, ring, index,
> + /* Try to batch allocations, but not too much. */
> + if (missing < 8)
> + return false;
> + do {
> + if (mlx4_en_prepare_rx_desc(priv, ring,
> + ring->prod & ring->size_mask,
> GFP_ATOMIC | __GFP_COLD))
> break;
> ring->prod++;
> - index = ring->prod & ring->size_mask;
> - }
> + } while (--missing);
> +
> + return true;
> }
>
> /* When hardware doesn't strip the vlan, we need to calculate the checksum
> @@ -1081,15 +1086,20 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
>
> out:
> rcu_read_unlock();
> - if (doorbell_pending)
> - mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
> -
> - AVG_PERF_COUNTER(priv->pstats.rx_coal_avg, polled);
> - mlx4_cq_set_ci(&cq->mcq);
> - wmb(); /* ensure HW sees CQ consumer before we post new buffers */
> - ring->cons = cq->mcq.cons_index;
> - mlx4_en_refill_rx_buffers(priv, ring);
> - mlx4_en_update_rx_prod_db(ring);
> +
> + if (polled) {
> + if (doorbell_pending)
> + mlx4_en_xmit_doorbell(priv->tx_ring[TX_XDP][cq->ring]);
> +
> + AVG_PERF_COUNTER(priv->pstats.rx_coal_avg, polled);
Keep this perf stats update out of the if block.
> + mlx4_cq_set_ci(&cq->mcq);
> + wmb(); /* ensure HW sees CQ consumer before we post new buffers */
> + ring->cons = cq->mcq.cons_index;
> + }
> +
> + if (mlx4_en_refill_rx_buffers(priv, ring))
> + mlx4_en_update_rx_prod_db(ring);
> +
> return polled;
> }
>
> @@ -1131,10 +1141,13 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
> return budget;
>
> /* Current cpu is not according to smp_irq_affinity -
> - * probably affinity changed. need to stop this NAPI
> - * poll, and restart it on the right CPU
> + * probably affinity changed. Need to stop this NAPI
> + * poll, and restart it on the right CPU.
> + * Try to avoid returning a too small value (like 0),
> + * to not fool net_rx_action() and its netdev_budget
> */
> - done = 0;
> + if (done)
> + done--;
> }
> /* Done for now */
> if (napi_complete_done(napi, done))
>
>
It looks good to me, just the one comment aforementioned.
Regards,
Tariq
Powered by blists - more mailing lists