Message-ID: <IA1PR11MB6241A3B6BBE3E84287C6DA678B74A@IA1PR11MB6241.namprd11.prod.outlook.com>
Date: Thu, 12 Jun 2025 04:58:55 +0000
From: "Rinitha, SX" <sx.rinitha@...el.com>
To: Krishna Kumar <krikku@...il.com>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>
CC: "davem@...emloft.net" <davem@...emloft.net>, "horms@...nel.org"
<horms@...nel.org>, "Nguyen, Anthony L" <anthony.l.nguyen@...el.com>,
"Kitszel, Przemyslaw" <przemyslaw.kitszel@...el.com>, "edumazet@...gle.com"
<edumazet@...gle.com>, "intel-wired-lan@...ts.osuosl.org"
<intel-wired-lan@...ts.osuosl.org>, "andrew+netdev@...n.ch"
<andrew+netdev@...n.ch>, "kuba@...nel.org" <kuba@...nel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>, "Samudrala, Sridhar"
<sridhar.samudrala@...el.com>, "Zaki, Ahmed" <ahmed.zaki@...el.com>, "Kumar,
Krishna" <krishna.ku@...pkart.com>
Subject: RE: [Intel-wired-lan] [PATCH v2 net] net: ice: Perform accurate aRFS
flow match
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@...osl.org> On Behalf Of Krishna Kumar
> Sent: 20 May 2025 22:37
> To: netdev@...r.kernel.org
> Cc: davem@...emloft.net; horms@...nel.org; Nguyen, Anthony L <anthony.l.nguyen@...el.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@...el.com>; edumazet@...gle.com; intel-wired-lan@...ts.osuosl.org; andrew+netdev@...n.ch; kuba@...nel.org; pabeni@...hat.com; Samudrala, Sridhar <sridhar.samudrala@...el.com>; Zaki, Ahmed <ahmed.zaki@...el.com>; Kumar, Krishna <krishna.ku@...pkart.com>
> Subject: [Intel-wired-lan] [PATCH v2 net] net: ice: Perform accurate aRFS flow match
>
> This patch fixes an issue seen in a large-scale deployment under heavy incoming packet load, where aRFS wrongly matches a flow and reprograms the NIC with the wrong settings. That mis-steering causes RX-path latency spikes and noisy-neighbor effects when many connections collide on the same hash (some of our production servers have 20-30K connections).
>
> set_rps_cpu() calls ndo_rx_flow_steer() with a flow_id that is calculated by hashing the skb and bounding the result by the per-rx-queue table size. This results in multiple connections (even across different rx-queues) getting the same hash value. The driver's steer function then modifies the wrong flow to use this rx-queue, e.g.:
>
> Flow#1 is first added:
> Flow#1: <ip1, port1, ip2, port2>, Hash 'h', q#10
>
> Later when a new flow needs to be added:
> Flow#2: <ip3, port3, ip4, port4>, Hash 'h', q#20
>
> The driver finds the hash 'h' from Flow#1 and updates it to use q#20. This leaves both flows un-optimized: packets for Flow#1 go to q#20, are reprogrammed back to q#10 later, and so on; and Flow#2 is never programmed because Flow#1 is matched first on every miss. Many flows may wrongly share the same hash and keep reprogramming the original flow's rule, each with its own q#.
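>
> For context, a minimal sketch of how the collision arises, assuming the usual
> hash & (table_size - 1) folding; the struct and helper names below are made up
> for illustration and are not kernel code:
>
>     #include <stdint.h>
>     #include <stdbool.h>
>
>     /* Stand-in for the per-rx-queue flow table: only the mask matters here. */
>     struct flow_table_sketch {
>             uint32_t mask;  /* rps_flow_cnt - 1, e.g. 511 for a 512-entry table */
>     };
>
>     static uint32_t flow_id_of(uint32_t skb_hash,
>                                const struct flow_table_sketch *tbl)
>     {
>             /* Only the low bits of the 32-bit skb hash survive. */
>             return skb_hash & tbl->mask;
>     }
>
>     static bool collides(uint32_t hash1, uint32_t hash2,
>                          const struct flow_table_sketch *tbl)
>     {
>             /* True when two different connections land in the same table slot. */
>             return flow_id_of(hash1, tbl) == flow_id_of(hash2, tbl);
>     }
>
> With rps_flow_cnt = 512 only 9 bits of the hash select a slot, so with
> 20-30K connections many distinct 4-tuples end up sharing a flow_id.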
>
> Tested on two 144-core servers with 16K netperf sessions for 180s. Netperf clients are pinned to cores 0-71 sequentially (so that wrong packets on q#s 72-143 can be measured). IRQs are affinitized 1:1 for queues -> CPUs, XPS is enabled, and aRFS is enabled (global value is 144 * rps_flow_cnt).
>
> Test notes about results from ice_rx_flow_steer():
> ---------------------------------------------------
> 1. "Skip:" counter increments here:
> if (fltr_info->q_index == rxq_idx ||
> arfs_entry->fltr_state != ICE_ARFS_ACTIVE)
> goto out;
> 2. "Add:" counter increments here:
> ret = arfs_entry->fltr_info.fltr_id;
> INIT_HLIST_NODE(&arfs_entry->list_entry);
> 3. "Update:" counter increments here:
> /* update the queue to forward to on an already existing flow */
>
> Runtime comparison of the original code vs. the code with the patch for different rps_flow_cnt values (each cell shows original vs patched).
>
> +-------------------------------+--------------+--------------+
> | rps_flow_cnt | 512 | 2048 |
> +-------------------------------+--------------+--------------+
> | Ratio of Pkts on Good:Bad q's | 214 vs 822K | 1.1M vs 980K |
> | Avoid wrong aRFS programming | 0 vs 310K | 0 vs 30K |
> | CPU User | 216 vs 183 | 216 vs 206 |
> | CPU System | 1441 vs 1171 | 1447 vs 1320 |
> | CPU Softirq | 1245 vs 920 | 1238 vs 961 |
> | CPU Total | 29 vs 22.7 | 29 vs 24.9 |
> | aRFS Update | 533K vs 59 | 521K vs 32 |
> | aRFS Skip | 82M vs 77M | 7.2M vs 4.5M |
> +-------------------------------+--------------+--------------+
>
> A separate TCP_STREAM and TCP_RR test with 1, 4, 8, 16, 64, 128, 256, and 512 connections showed no performance degradation.
>
> Some points on the patch/aRFS behavior:
> 1. Enabling full tuple matching ensures flows are always correctly matched,
>    even with smaller hash sizes (see the sketch after this list).
> 2. CPU utilization drops by 5-6% as packets arrive at the correct CPUs and
>    there are fewer calls into the driver to program filters on misses.
> 3. Larger hash tables reduce mis-steering because flow hashes are more likely
>    to be unique, but clashes still occur. However, with a larger per-device
>    rps_flow_cnt, old flows take more time to expire and new aRFS flows cannot
>    be added if h/w limits are reached (rps_may_expire_flow() succeeds only
>    after this CPU has processed 10*rps_flow_cnt packets that are not part of
>    the flow).
>
> Changes since v1:
> - Added "Fixes:" tag and documented return values.
> - Added @ for function parameters.
> - Updated subject line to denote target tree (net)
>
> Fixes: 28bf26724fdb0 ("ice: Implement aRFS")
> Signed-off-by: Krishna Kumar <krikku@...il.com>
> ---
> drivers/net/ethernet/intel/ice/ice_arfs.c | 49 +++++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>
Tested-by: Rinitha S <sx.rinitha@...el.com> (A Contingent worker at Intel)