Message-ID: <1487165355.1311.2.camel@edumazet-glaptop3.roam.corp.google.com>
Date: Wed, 15 Feb 2017 05:29:15 -0800
From: Eric Dumazet <eric.dumazet@...il.com>
To: Saeed Mahameed <saeedm@....mellanox.co.il>
Cc: David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Tariq Toukan <tariqt@...lanox.com>,
Saeed Mahameed <saeedm@...lanox.com>,
Matan Barak <matanb@...lanox.com>, jackm@...lanox.com
Subject: Re: [PATCH net-next] mlx4: do not fire tasklet unless necessary
On Wed, 2017-02-15 at 13:10 +0200, Saeed Mahameed wrote:
> On Fri, Feb 10, 2017 at 2:27 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > From: Eric Dumazet <edumazet@...gle.com>
> >
> > All rx and tx netdev interrupts are handled respectively by
> > mlx4_en_rx_irq() and mlx4_en_tx_irq(), which simply schedule a NAPI.
> >
> > But mlx4_eq_int() also fires a tasklet to service all items that were
> > queued via mlx4_add_cq_to_tasklet(), but this handler was not called
> > unless user cqe was handled.
> >
> > This is very confusing, as "mpstat -I SCPU ..." shows a huge number of
> > tasklet invocations.
> >
> > This patch saves this overhead, by carefully firing the tasklet directly
> > from mlx4_add_cq_to_tasklet(), removing four atomic operations per IRQ.
> >
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > Cc: Tariq Toukan <tariqt@...lanox.com>
> > Cc: Saeed Mahameed <saeedm@...lanox.com>
> > ---
> > drivers/net/ethernet/mellanox/mlx4/cq.c | 6 +++++-
> > drivers/net/ethernet/mellanox/mlx4/eq.c | 9 +--------
> > 2 files changed, 6 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/cq.c b/drivers/net/ethernet/mellanox/mlx4/cq.c
> > index 6b8635378f1fcb2aae4e8ac390bcd09d552c2256..fa6d2354a0e910ee160863e3cbe21a512d77bf03 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/cq.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/cq.c
> > @@ -81,8 +81,9 @@ void mlx4_cq_tasklet_cb(unsigned long data)
> >
> > static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
> > {
> > - unsigned long flags;
> > struct mlx4_eq_tasklet *tasklet_ctx = cq->tasklet_ctx.priv;
> > + unsigned long flags;
> > + bool kick;
> >
> > spin_lock_irqsave(&tasklet_ctx->lock, flags);
> > /* When migrating CQs between EQs will be implemented, please note
> > @@ -92,7 +93,10 @@ static void mlx4_add_cq_to_tasklet(struct mlx4_cq *cq)
> > */
> > if (list_empty_careful(&cq->tasklet_ctx.list)) {
> > atomic_inc(&cq->refcount);
> > + kick = list_empty(&tasklet_ctx->list);
>
> So the first one in would fire the tasklet, but wouldn't this cause CQE
> processing loss in the same mlx4_eq_int loop if the tasklet were fast
> enough to run while other CQEs are still adding themselves to the
> tasklet_ctx->list ?

mlx4_eq_int() is a hard irq handler.
How could a tasklet run in the middle of it?
A tasklet is a softirq handler.
A softirq must wait until the current hard irq handler is done.

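To make the ordering argument concrete, here is a minimal user-space sketch of it. It is only a model under stated assumptions: every name in it (model_ctx, model_add_cq, model_tasklet, model_hard_irq) is hypothetical and none of it is kernel code. It assumes the tasklet can only run once the hard irq handler has returned, and it leaves out the per-CQ list_empty_careful() check and the spinlock that the real mlx4_add_cq_to_tasklet() uses.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CQS 8

/* Hypothetical user-space stand-in for the per-EQ tasklet context. */
struct model_ctx {
	int queued[MAX_CQS];   /* CQ numbers waiting for the tasklet */
	int nqueued;
	bool tasklet_pending;  /* tasklet scheduled but not yet run */
	int kicks;             /* number of times the tasklet was scheduled */
	int processed;         /* CQs serviced by the tasklet so far */
};

/* Models the patched add path: schedule only when the list was empty. */
static void model_add_cq(struct model_ctx *ctx, int cqn)
{
	bool kick = (ctx->nqueued == 0);

	assert(ctx->nqueued < MAX_CQS);
	ctx->queued[ctx->nqueued++] = cqn;
	if (kick) {
		ctx->tasklet_pending = true;
		ctx->kicks++;
	}
}

/* Models the tasklet callback: drains everything queued so far. */
static void model_tasklet(struct model_ctx *ctx)
{
	ctx->processed += ctx->nqueued;
	ctx->nqueued = 0;
	ctx->tasklet_pending = false;
}

/* Models the hard irq handler: all CQs are queued before any softirq runs. */
static void model_hard_irq(struct model_ctx *ctx, const int *cqns, int n)
{
	for (int i = 0; i < n; i++)
		model_add_cq(ctx, cqns[i]);

	/* The hard irq handler is done; only now may the softirq run. */
	if (ctx->tasklet_pending)
		model_tasklet(ctx);
}
```

In this model, one call to model_hard_irq() with three CQs schedules the tasklet once and services all three, and a call with no CQs schedules nothing.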
>
> Anyway, I tried to find race scenarios that could cause such a thing, but
> the synchronization looks good.
>
> > list_add_tail(&cq->tasklet_ctx.list, &tasklet_ctx->list);
> > + if (kick)
> > + tasklet_schedule(&tasklet_ctx->task);
> > }
> > spin_unlock_irqrestore(&tasklet_ctx->lock, flags);
> > }
> > diff --git a/drivers/net/ethernet/mellanox/mlx4/eq.c b/drivers/net/ethernet/mellanox/mlx4/eq.c
> > index 0509996957d9664b612358dd805359f4bc67b8dc..39232b6a974f4b4b961d3b0b8634f04e6b9d0caa 100644
> > --- a/drivers/net/ethernet/mellanox/mlx4/eq.c
> > +++ b/drivers/net/ethernet/mellanox/mlx4/eq.c
> > @@ -494,7 +494,7 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct mlx4_eq *eq)
> > {
> > struct mlx4_priv *priv = mlx4_priv(dev);
> > struct mlx4_eqe *eqe;
> > - int cqn = -1;
> > + int cqn;
> > int eqes_found = 0;
> > int set_ci = 0;
> > int port;
> > @@ -840,13 +840,6 @@ static int mlx4_eq_int(struct mlx4_dev *dev, struct mlx4_eq *eq)
> >
> > eq_set_ci(eq, 1);
> >
> > - /* cqn is 24bit wide but is initialized such that its higher bits
> > - * are ones too. Thus, if we got any event, cqn's high bits should be off
> > - * and we need to schedule the tasklet.
> > - */
> > - if (!(cqn & ~0xffffff))
>
> what if we simply change this condition to:
> if (!list_empty_careful(&eq->tasklet_ctx.list))
>
> Wouldn't this be sort of equivalent to what you did ? and this way we
> would simply fire the tasklet only when needed and not on every
> handled CQE.

Still, this test would be done one million times per second on my hosts.
What is the point exactly?

Thanks.