Date: Wed, 14 Jun 2023 21:04:30 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Edward Cree <ecree.xilinx@...il.com>
Cc: Martin Habets <habetsm.xilinx@...il.com>, Íñigo
 Huguet <ihuguet@...hat.com>, davem@...emloft.net, edumazet@...gle.com,
 pabeni@...hat.com, netdev@...r.kernel.org, linux-net-drivers@....com, Fei
 Liu <feliu@...hat.com>
Subject: Re: [PATCH net] sfc: use budget for TX completions

On Thu, 15 Jun 2023 01:36:17 +0100 Edward Cree wrote:
> I think the key question here is can one CPU be using a TXQ to send
>  while another CPU is in a NAPI poll on the same channel and thus
>  trying to clean the EVQ that the TXQ is using.  If so the NAPI poll
>  could last forever; if not then it shouldn't ever have more than 8k
>  (or whatever the TX ring size is set to) events to process.
> And even ignoring affinity of the core TXQs, at the very least XDP
>  TXQs can serve different CPUs to the one on which their EVQ (and
>  hence NAPI poll) lives, which means they can keep filling the EVQ
>  as fast as the NAPI poll empties it, and thus keep ev_process
>  looping forever.
> In principle this can also happen with other kinds of events, e.g.
>  if the MC goes crazy and generates infinite MCDI-event spam then
>  NAPI poll will spin on that CPU forever eating the events.  So
>  maybe this limit needs to be broader than just TX events?  A hard
>  cap on the number of events (regardless of type) that can be
>  consumed in a single efx_ef10_ev_process() invocation, perhaps?

It'd be interesting to analyze the processing time for Tx packets, but
with GRO at play the processing time on Rx varies 10x packet to packet.
My intuition is that we should target an even split of time spent on Rx
and Tx, but Tx is cheaper, hence a higher budget.
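
FWIW a rough sketch of what "Tx is cheaper, hence a higher budget" could
look like at the napi poll level, reusing the hypothetical example_channel
from the sketch above; the 4x ratio and the helper names are assumptions,
not measured values or existing sfc code:

/* Hypothetical helpers: clean up to 'budget' Tx completions / Rx packets. */
int example_clean_tx(struct example_channel *ch, int budget);
int example_clean_rx(struct example_channel *ch, int budget);
void example_enable_irqs(struct example_channel *ch);

#define TX_TO_RX_BUDGET_RATIO	4	/* assumed Rx:Tx per-item cost ratio */

static int example_napi_poll(struct napi_struct *napi, int budget)
{
	struct example_channel *ch =
		container_of(napi, struct example_channel, napi);
	int rx_done;

	/* Tx completions are cheap, so give them a proportionally larger budget. */
	example_clean_tx(ch, budget * TX_TO_RX_BUDGET_RATIO);

	rx_done = example_clean_rx(ch, budget);
	if (rx_done < budget && napi_complete_done(napi, rx_done))
		example_enable_irqs(ch);

	return rx_done;
}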

The way the wind is blowing, I think the RT and core folks would prefer
us to switch to checking the time rather than budgets. The time check can
be amortized across multiple events, so instead of talking about a cap on
events, perhaps we should consider how often to check whether the NAPI
time slice has elapsed?
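
To make the amortization concrete, a sketch along these lines (again with
the hypothetical helpers from the first snippet; the 64-event re-check
interval and the 2 ms slice are arbitrary assumptions, not an existing
kernel knob):

#include <linux/sched/clock.h>	/* local_clock() */

#define EV_TIME_CHECK_INTERVAL	64			/* assumed re-check period */
#define EV_TIME_SLICE_NS	(2 * NSEC_PER_MSEC)	/* assumed NAPI time slice */

static int example_ev_process_timed(struct example_channel *ch, int quota)
{
	u64 deadline = local_clock() + EV_TIME_SLICE_NS;
	int spent = 0;

	while (spent < quota && example_event_pending(ch)) {
		example_handle_event(ch);
		spent++;

		/* Amortize the clock read across many cheap events. */
		if (!(spent % EV_TIME_CHECK_INTERVAL) &&
		    local_clock() >= deadline)
			break;
	}

	return spent;
}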

That said, I feel like if we try to reshuffle the NAPI polling logic
without a real workload to test against, we may do more harm than good.
