Message-ID: <YzWpT/NfDzhnsiTI@linutronix.de>
Date: Thu, 29 Sep 2022 16:18:55 +0200
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: "Jason A. Donenfeld" <Jason@...c4.com>
Cc: netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
Sherry Yang <sherry.yang@...cle.com>,
Paul Webb <paul.x.webb@...cle.com>,
Phillip Goerl <phillip.goerl@...cle.com>,
Jack Vogel <jack.vogel@...cle.com>,
Nicky Veitch <nicky.veitch@...cle.com>,
Colm Harrington <colm.harrington@...cle.com>,
Ramanan Govindarajan <ramanan.govindarajan@...cle.com>,
Dominik Brodowski <linux@...inikbrodowski.net>,
Tejun Heo <tj@...nel.org>,
Sultan Alsawaf <sultan@...neltoast.com>, stable@...r.kernel.org
Subject: Re: [PATCH v3] random: use expired per-cpu timer rather than wq for
mixing fast pool
On 2022-09-28 18:15:46 [+0200], Jason A. Donenfeld wrote:
> Hi Sebastian,
Hi Jason,
> On Wed, Sep 28, 2022 at 02:06:45PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2022-09-27 12:42:33 [+0200], Jason A. Donenfeld wrote:
> > …
> > > This is an ordinary pattern done all over the kernel. However, Sherry
> > > noticed a 10% performance regression in qperf TCP over a 40gbps
> > > InfiniBand card. Quoting her message:
> > >
> > > > MT27500 Family [ConnectX-3] cards:
> > > > Infiniband device 'mlx4_0' port 1 status:
> > …
> >
> > While looking at the mlx4 driver, it looks like they don't use any NAPI
> > handling in their interrupt handler, which _might_ mean that they handle
> > more than 1k interrupts a second. I'm still curious to get that ACKed
> > from Sherry's side.
>
> Are you sure about that? So far as I can tell drivers/net/ethernet/
> mellanox/mlx4 has plenty of napi_schedule/napi_enable and such. Or are
> you looking at the infiniband driver instead? I don't really know how
> these interact.
I've been looking at mlx4_msi_x_interrupt() and it appears to iterate over
the event queue's ring buffer. I guess that mlx4_cq_completion() will then
invoke mlx4_en_rx_irq(), which schedules NAPI.
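Roughly, the shape of what I think happens per MSI-X interrupt (simplified
and from memory, not the real driver code; the lookup helper below is just
a placeholder):

static irqreturn_t mlx4_msi_x_interrupt(int irq, void *eq_ptr)
{
	struct mlx4_eq *eq = eq_ptr;

	mlx4_eq_int(eq->dev, eq);	/* walk the event-queue ring */
	return IRQ_HANDLED;
}

/* for each completion event found in the ring, mlx4_eq_int() ends up in */
static void mlx4_cq_completion(struct mlx4_dev *dev, u32 cqn)
{
	struct mlx4_cq *cq = mlx4_cq_lookup(dev, cqn);	/* placeholder lookup */

	cq->comp(cq);		/* mlx4_en_rx_irq() for the Ethernet RX CQs */
}

/* ...and that callback just schedules NAPI */
static void mlx4_en_rx_irq(struct mlx4_cq *mcq)
{
	struct mlx4_en_cq *cq = container_of(mcq, struct mlx4_en_cq, mcq);

	napi_schedule_irqoff(&cq->napi);
}

So NAPI is in the picture after all; the open question is just how many
interrupts per second we see before it throttles them.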
> But yea, if we've got a driver not using NAPI at 40gbps that's obviously
> going to be a problem.
So I'm wondering if we get one worker per second, which kills the
performance, or if we get more than 1k interrupts in less than a second,
resulting in multiple worker wakeups per second.
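Just to make sure we are talking about the same thing, the deferral logic
in add_interrupt_randomness() is currently roughly this (quoting from
memory, the constants/names may be slightly off):

	if (new_count & MIX_INFLIGHT)
		return;

	/*
	 * Defer only while we have BOTH fewer than 1024 events AND less
	 * than a second since the last mix, i.e. the worker gets queued
	 * once 1024 interrupts have accumulated || one second has passed.
	 */
	if (new_count < 1024 && !time_is_before_jiffies(fast_pool->last + HZ))
		return;

	fast_pool->count |= MIX_INFLIGHT;
	queue_work_on(raw_smp_processor_id(), system_highpri_wq, &fast_pool->mix);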
> > Jason, from random's point of view: deferring until 1k interrupts + 1sec
> > delay is not desired due to low entropy, right?
>
> Definitely || is preferable to &&.
>
> >
> > > Rather than incur the scheduling latency from queue_work_on, we can
> > > instead switch to running on the next timer tick, on the same core. This
> > > also batches things a bit more -- once per jiffy -- which is okay now
> > > that mix_interrupt_randomness() can credit multiple bits at once.
> >
> > Hmmm. Do you see higher contention on input_pool.lock? Just asking
> > because if more than one CPU invokes this timer callback at the same
> > time, then they all block on the same lock.
>
> I've been doing various experiments, sending mini patches to Oracle and
> having them test this in their rig. So far, it looks like the cost of
> the body of the worker itself doesn't matter much, but rather the cost
> of the enqueueing function is key. Still investigating though.
>
> It's a bit frustrating, as all I have to work with are results from the
> tests, and no perf analysis. It'd be great if an engineer at Oracle was
> capable of tackling this interactively, but at the moment it's just me
> sending them patches. So we'll see. Getting closer though, albeit very
> slowly.
Oh boy. Okay.
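
For completeness, this is how I read the approach in your patch (a rough
sketch only, the actual v3 code may differ in the details):

	/* instead of queue_work_on() at the end of add_interrupt_randomness() */
	fast_pool->count |= MIX_INFLIGHT;
	if (!timer_pending(&fast_pool->mix)) {
		fast_pool->mix.expires = jiffies;	/* already "expired" */
		add_timer_on(&fast_pool->mix, raw_smp_processor_id());
	}

so mix_interrupt_randomness() runs from the timer softirq on the next tick
on the same CPU, instead of going through the workqueue machinery.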
> Jason
Sebastian