[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <DM4PR11MB6502F4622507B81F56DFFC65D45EA@DM4PR11MB6502.namprd11.prod.outlook.com>
Date: Thu, 24 Jul 2025 00:03:33 +0000
From: "Hay, Joshua A" <joshua.a.hay@...el.com>
To: Simon Horman <horms@...nel.org>
CC: "intel-wired-lan@...ts.osuosl.org" <intel-wired-lan@...ts.osuosl.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [Intel-wired-lan] [PATCH net v2 0/6] idpf: replace Tx flow
scheduling buffer ring with buffer pool
> On Thu, Jul 17, 2025 at 05:21:44PM -0700, Joshua Hay wrote:
> > This series fixes a stability issue in the flow scheduling Tx send/clean
> > path that results in a Tx timeout.
> >
> > The existing guardrails in the Tx path were not sufficient to prevent
> > the driver from reusing completion tags that were still in flight (held
> > by the HW). This collision would cause the driver to erroneously clean
> > the wrong packet thus leaving the descriptor ring in a bad state.
> >
> > The main point of this refactor is to replace the flow scheduling buffer
> > ring with a large pool/array of buffers. The completion tag then simply
> > is the index into this array. The driver tracks the free tags and pulls
> > the next free one from a refillq. The cleaning routines simply use the
> > completion tag from the completion descriptor to index into the array to
> > quickly find the buffers to clean.
> >
> > All of the code to support the refactor is added first to ensure traffic
> > still passes with each patch. The final patch then removes all of the
> > obsolete stashing code.
> >
> > ---
> > v2:
> > - Add a new patch "idpf: simplify and fix splitq Tx packet rollback
> > error path" that fixes a bug in the error path. It also sets up
> > changes in patch 4 that are necessary to prevent a crash when a packet
> > rollback occurs using the buffer pool.
> >
> > v1:
> > https://lore.kernel.org/intel-wired-lan/c6444d15-bc20-41a8-9230-
> 9bb266cb2ac6@...gen.mpg.de/T/#maf9f464c598951ee860e5dd24ef8a45
> 1a488c5a0
> >
> > Joshua Hay (6):
> > idpf: add support for Tx refillqs in flow scheduling mode
> > idpf: improve when to set RE bit logic
> > idpf: simplify and fix splitq Tx packet rollback error path
> > idpf: replace flow scheduling buffer ring with buffer pool
> > idpf: stop Tx if there are insufficient buffer resources
> > idpf: remove obsolete stashing code
> >
> > .../ethernet/intel/idpf/idpf_singleq_txrx.c | 61 +-
> > drivers/net/ethernet/intel/idpf/idpf_txrx.c | 723 +++++++-----------
> > drivers/net/ethernet/intel/idpf/idpf_txrx.h | 87 +--
> > 3 files changed, 356 insertions(+), 515 deletions(-)
>
> Hi Joshua, all,
>
> Perhaps it is not followed much anymore, but at least according to [1]
> patches for stable should not be more than 100 lines, with context.
>
> This patch-set is an order of magnitude larger.
> Can something be done to create a more minimal fix?
>
> [1] https://docs.kernel.org/process/stable-kernel-rules.html#stable-kernel-
> rules
Hi Simon,
It will be quite difficult to make this series smaller without any negative side effects. Here's a summary of why certain patches are necessary or where deferrals are possible (and the side effects):
"idpf: add support for Tx refillqs in flow scheduling mode" is required to keep the Tx path lockless. Otherwise, we would have to use locking in hot path to pass the tags freed in the cleaning thread back to whatever data struct the sending thread pulls the tags from.
Without "idpf: improve when to set RE bit logic", the driver will be much more susceptible to Tx timeouts when sending large packets.
"idpf: simplify and fix splitq Tx packet rollback error path" is a must to setup for the new buffer chaining. The previous implementation rolled back based on ring indexing, which will crash the system if a dma_mapping_error occurs or we run out of tags (more on that below). The buffer tags (indexes) pulled for a packet are not guaranteed to be contiguous, especially as out-of-order completions are processed.
"idpf: stop Tx if there are insufficient buffer resources" could possibly be deferred, but that makes the rollback change above even more critical as we lose an extra layer of protection from running out of tags. It's also one of the smaller patches and won't make a significant difference in terms of size.
"idpf: remove obsolete stashing code" could also potentially be deferred but is 95% removing obsolete code, which leaves a lot of dead code.
Thanks!
Josh
Powered by blists - more mailing lists