Message-ID: <IA1PR11MB6241A3B6BBE3E84287C6DA678B74A@IA1PR11MB6241.namprd11.prod.outlook.com>
Date: Thu, 12 Jun 2025 04:58:55 +0000
From: "Rinitha, SX" <sx.rinitha@...el.com>
To: Krishna Kumar <krikku@...il.com>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>
CC: "davem@...emloft.net" <davem@...emloft.net>, "horms@...nel.org"
<horms@...nel.org>, "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
"Kitszel, Przemyslaw" <przemyslaw.kitszel@...el.com>, "edumazet@...gle.com"
<edumazet@...gle.com>, "intel-wired-lan@...ts.osuosl.org"
<intel-wired-lan@...ts.osuosl.org>, "andrew+netdev@...n.ch"
<andrew+netdev@...n.ch>, "kuba@...nel.org" <kuba@...nel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>, "Samudrala, Sridhar"
<sridhar.samudrala@...el.com>, "Zaki, Ahmed" <ahmed.zaki@...el.com>, "Kumar,
Krishna" <krishna.ku@...pkart.com>
Subject: RE: [Intel-wired-lan] [PATCH v2 net] net: ice: Perform accurate aRFS
flow match
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@...osl.org> On Behalf Of Krishna Kumar
> Sent: 20 May 2025 22:37
> To: netdev@...r.kernel.org
> Cc: davem@...emloft.net; horms@...nel.org; Nguyen, Anthony L <anthony.l.nguyen@...el.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@...el.com>; edumazet@...gle.com; intel-wired-lan@...ts.osuosl.org; andrew+netdev@...n.ch; kuba@...nel.org; pabeni@...hat.com; Samudrala, Sridhar <sridhar.samudrala@...el.com>; Zaki, Ahmed <ahmed.zaki@...el.com>; Kumar, Krishna <krishna.ku@...pkart.com>
> Subject: [Intel-wired-lan] [PATCH v2 net] net: ice: Perform accurate aRFS flow match
>
> This patch fixes an issue seen in a large-scale deployment under heavy incoming packet load, where aRFS wrongly matches a flow and reprograms the NIC with the wrong settings. That mis-steering causes RX-path latency spikes and noisy-neighbor effects when many connections collide on the same hash (some of our production servers have 20-30K connections).
>
> set_rps_cpu() calls ndo_rx_flow_steer() with a flow_id that is calculated by hashing the skb and bounding the result by the per-rx-queue table size. This results in multiple connections (even across different rx-queues) getting the same hash value. The driver's steer function then modifies the wrong flow to use this rx-queue, e.g.:
>
> Flow#1 is first added:
> Flow#1: <ip1, port1, ip2, port2>, Hash 'h', q#10
>
> Later when a new flow needs to be added:
> Flow#2: <ip3, port3, ip4, port4>, Hash 'h', q#20
>
> The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This leaves both flows un-optimized: packets for Flow#1 go to q#20, are reprogrammed back to q#10 later, and so on; and Flow#2 is never programmed because Flow#1 is matched first on every miss. Many flows may wrongly share the same hash and keep reprogramming the original flow's rule, each with its own q#.
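>
> For context, a minimal sketch of how the collision arises, assuming the usual
> hash & (table_size - 1) folding; the struct and helper names below are made up
> for illustration and are not kernel code:
>
>     #include <stdint.h>
>     #include <stdbool.h>
>
>     /* Stand-in for the per-rx-queue flow table: only the mask matters here. */
>     struct flow_table_sketch {
>             uint32_t mask;  /* rps_flow_cnt - 1, e.g. 511 for a 512-entry table */
>     };
>
>     static uint32_t flow_id_of(uint32_t skb_hash,
>                                const struct flow_table_sketch *tbl)
>     {
>             /* Only the low bits of the 32-bit skb hash survive. */
>             return skb_hash & tbl->mask;
>     }
>
>     static bool collides(uint32_t hash1, uint32_t hash2,
>                          const struct flow_table_sketch *tbl)
>     {
>             /* True when two different connections land in the same table slot. */
>             return flow_id_of(hash1, tbl) == flow_id_of(hash2, tbl);
>     }
>
> With rps_flow_cnt = 512 only 9 bits of the hash select a slot, so with
> 20-30K connections many distinct 4-tuples end up sharing a flow_id.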
>
> Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s 72-143 can be measured). IRQs are affinitized 1:1 for queues -> CPUs, XPS is enabled, and aRFS is enabled (global value is 144 * rps_flow_cnt).
>
> Test notes about results from ice_rx_flow_steer():
> ---------------------------------------------------
> 1. "Skip:" counter increments here:
> if (fltr_info->q_index == rxq_idx ||
> arfs_entry->fltr_state != ICE_ARFS_ACTIVE)
> goto out;
> 2. "Add:" counter increments here:
> ret = arfs_entry->fltr_info.fltr_id;
> INIT_HLIST_NODE(&arfs_entry->list_entry);
> 3. "Update:" counter increments here:
> /* update the queue to forward to on an already existing flow */
>
> Runtime comparison of the original code vs. the code with the patch for different rps_flow_cnt values (each cell shows original vs patched).
>
> +-------------------------------+--------------+--------------+
> | rps_flow_cnt | 512 | 2048 |
> +-------------------------------+--------------+--------------+
> | Ratio of Pkts on Good:Bad q's | 214 vs 822K | 1.1M vs 980K |
> | Avoid wrong aRFS programming | 0 vs 310K | 0 vs 30K |
> | CPU User | 216 vs 183 | 216 vs 206 |
> | CPU System | 1441 vs 1171 | 1447 vs 1320 |
> | CPU Softirq | 1245 vs 920 | 1238 vs 961 |
> | CPU Total | 29 vs 22.7 | 29 vs 24.9 |
> | aRFS Update | 533K vs 59 | 521K vs 32 |
> | aRFS Skip | 82M vs 77M | 7.2M vs 4.5M |
> +-------------------------------+--------------+--------------+
>
> A separate TCP_STREAM and TCP_RR test with 1, 4, 8, 16, 64, 128, 256, and 512 connections showed no performance degradation.
>
> Some points on the patch/aRFS behavior:
> 1. Enabling full tuple matching ensures flows are always correctly matched,
>    even with smaller hash sizes (see the sketch after this list).
> 2. CPU utilization drops by 5-6% as packets arrive at the correct CPUs and
>    there are fewer calls into the driver to program filters on misses.
> 3. Larger hash tables reduce mis-steering because flow hashes are more likely
>    to be unique, but clashes still occur. However, with a larger per-device
>    rps_flow_cnt, old flows take more time to expire and new aRFS flows cannot
>    be added if h/w limits are reached (rps_may_expire_flow() succeeds only
>    after this CPU has processed 10*rps_flow_cnt packets that are not part of
>    the flow).
>
> Changes since v1:
> - Added "Fixes:" tag and documented return values.
> - Added @ for function parameters.
> - Updated subject line to denote target tree (net)
>
> Fixes: 28bf26724fdb0 ("ice: Implement aRFS")
> Signed-off-by: Krishna Kumar <krikku@...il.com>
> ---
> drivers/net/ethernet/intel/ice/ice_arfs.c | 49 +++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>
Tested-by: Rinitha S <sx.rinitha@...el.com> (A Contingent worker at Intel)