Message-ID: <20240619003257.6138-1-shannon.nelson@amd.com>
Date: Tue, 18 Jun 2024 17:32:49 -0700
From: Shannon Nelson <shannon.nelson@....com>
To: <netdev@...r.kernel.org>, <davem@...emloft.net>, <kuba@...nel.org>,
<edumazet@...gle.com>, <pabeni@...hat.com>, <David.Laight@...LAB.COM>,
<andrew@...n.ch>
CC: <brett.creeley@....com>, <drivers@...sando.io>, Shannon Nelson
<shannon.nelson@....com>
Subject: [PATCH v2 net-next 0/8] ionic: rework fix for doorbell miss
A latency test in a scaled out setting (many VMs with many queues)
has uncovered an issue with our missed doorbell fix from
commit b69585bfcece ("ionic: missed doorbell workaround").
As a refresher, the Elba ASIC has an issue where once in a blue
moon it might miss/drop a queue doorbell notification from
the driver. This can result in Tx timeouts and potential Rx
buffer misses.
The basic problem with the original solution is that
we're delaying things with a timer for every single queue,
periodically using mod_timer() to reset the alarm, and
mod_timer() becomes a more and more expensive thing as there
are more and more VFs and queues each with their own timer.
A ping-pong latency test tends to exacerbate the effect such
that every napi is doing a mod_timer() in every cycle.
An alternative has been worked out to replace this using
periodic workqueue items outside the napi cycle to request a
napi_schedule driven by a single delayed-workqueue per device
rather than a timer for every queue. Also, now that newer
firmware is actually reporting its ASIC type, we can restrict
this to the appropriate chip.
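
To illustrate the shape of the new approach, here is a minimal userspace
sketch, not the driver code. It assumes a hypothetical per-queue record
(struct queue_state), an assumed deadline constant (DOORBELL_DEADLINE),
and a counter standing in for the napi_schedule() request. The point is
that one periodic pass per device walks all queues and only kicks the
ones that have pending work and are past their deadline, instead of
every queue re-arming its own timer on every napi cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue state; field names are illustrative only. */
struct queue_state {
	uint64_t last_doorbell;	/* tick of the last doorbell ring */
	bool work_pending;	/* descriptors posted, not yet completed */
	unsigned int resched;	/* times the check requested a poll */
};

/* Assumed grace period, in ticks, before a doorbell is presumed missed. */
#define DOORBELL_DEADLINE 10

/*
 * One periodic pass per device: scan every queue and request a poll
 * only for queues that have pending work and have been quiet for at
 * least DOORBELL_DEADLINE ticks. Returns how many queues were kicked.
 */
static size_t doorbell_check(struct queue_state *q, size_t nq, uint64_t now)
{
	size_t kicked = 0;

	assert(q != NULL || nq == 0);

	for (size_t i = 0; i < nq; i++) {
		if (!q[i].work_pending)
			continue;
		if (now - q[i].last_doorbell < DOORBELL_DEADLINE)
			continue;

		q[i].resched++;		/* stands in for napi_schedule() */
		q[i].last_doorbell = now;
		kicked++;
	}

	return kicked;
}
```

In this sketch the hot path only has to record a timestamp when it rings
a doorbell; the cost of checking deadlines is paid once per device per
period rather than once per queue per napi cycle.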
The testing scenario used 128 VFs in UP state, 16 queues per
VF, and latency tests were done using TCP_RR with adaptive
interrupt coalescing enabled, running on 1 VF. Before these fixes we
saw 99th percentile latencies ranging up to 900us, with some max
fliers as high as 4ms.
With these fixes the 99th percentile latencies are typically well
under 50us with the occasional max under 500us.
v2:
- 3/8: add commentary on why we have a private work queue (Jakub)
- 4/8: no open-code of napi_schedule() (Jakub)
- 4/8: watch for deadlock with cancel_delayed_work_sync() (Jakub)
- 7/8: better ionic_lif field order after reducing rx_copybreak size (David)
- 7/8: include some pahole diff info (Andrew)
- 8/8: use bool not bitflag for doorbell_wa (David)
v1:
https://lore.kernel.org/netdev/20240610230706.34883-1-shannon.nelson@amd.com/
Brett Creeley (3):
ionic: Keep interrupt affinity up to date
ionic: Use an u16 for rx_copybreak
ionic: Only run the doorbell workaround for certain asic_type
Shannon Nelson (5):
ionic: remove missed doorbell per-queue timer
ionic: add private workqueue per-device
ionic: add work item for missed-doorbell check
ionic: add per-queue napi_schedule for doorbell check
ionic: check for queue deadline in doorbell_napi_work
drivers/net/ethernet/pensando/ionic/ionic.h | 7 +
.../ethernet/pensando/ionic/ionic_bus_pci.c | 3 +
.../net/ethernet/pensando/ionic/ionic_dev.c | 129 +++++++++++++++-
.../net/ethernet/pensando/ionic/ionic_dev.h | 8 +-
.../ethernet/pensando/ionic/ionic_ethtool.c | 11 +-
.../net/ethernet/pensando/ionic/ionic_lif.c | 144 ++++++++++++------
.../net/ethernet/pensando/ionic/ionic_lif.h | 12 +-
.../net/ethernet/pensando/ionic/ionic_main.c | 2 +-
.../net/ethernet/pensando/ionic/ionic_txrx.c | 24 ++-
9 files changed, 264 insertions(+), 76 deletions(-)
--
2.17.1