Message-ID: <68483433b45e2_3cd66f29440@willemb.c.googlers.com.notmuch>
Date: Tue, 10 Jun 2025 09:33:39 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jakub Kicinski <kuba@...nel.org>, 
 michael.chan@...adcom.com, 
 pavan.chebbi@...adcom.com
Cc: willemdebruijn.kernel@...il.com, 
 netdev@...r.kernel.org, 
 davem@...emloft.net, 
 edumazet@...gle.com, 
 pabeni@...hat.com, 
 andrew+netdev@...n.ch, 
 horms@...nel.org, 
 Jakub Kicinski <kuba@...nel.org>, 
 andrew@...n.ch, 
 ecree.xilinx@...il.com
Subject: Re: [RFC net-next 2/6] net: ethtool: support including Flow Label in
 the flow hash for RSS

Jakub Kicinski wrote:
> Some modern NICs support including the IPv6 Flow Label in
> the flow hash for RSS queue selection. This is outside
> the old "Microsoft spec", but was included in the OCP NIC spec:
> 
>   [ ] RSS include flow label in the hash (configurable)
> 
> https://www.opencompute.org/documents/ocp-server-nic-core-features-specification-ocp-spec-format-1-1-pdf

Or perhaps https://www.opencompute.org/w/index.php?title=Core_Offloads#Receive_Side_Scaling

One thing to make very clear is that in this design the flow label is
an extra field to include. It does not replace the L4 fields.

This design is perhaps mistaken. The IPv6 flow label definition is

"Packet classifiers can
 use the triplet of Flow Label, Source Address, and Destination
 Address fields to identify the flow to which a particular packet
 belongs."

https://datatracker.ietf.org/doc/html/rfc6437#section-2

So explicitly also hashing in the L4 fields should not be needed.
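
To make the triplet argument concrete, here is a minimal user-space
sketch of a classifier that keys only on (Flow Label, Source Address,
Destination Address). All names in it (demo_flow_triplet,
demo_flow_label, demo_triplet_hash, demo_triplet_to_queue) are
hypothetical, and FNV-1a is used only as a simple stand-in: real NICs
use Toeplitz, and this is not how any driver or device implements it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classifier key: the RFC 6437 triplet. Not a kernel type. */
struct demo_flow_triplet {
	uint8_t  saddr[16];	/* IPv6 source address */
	uint8_t  daddr[16];	/* IPv6 destination address */
	uint32_t flow_label;	/* 20-bit flow label, host byte order */
};

/* Extract the flow label from the first 32-bit word of the IPv6 header,
 * given in host byte order: 4 bits version, 8 bits traffic class,
 * 20 bits flow label.
 */
static uint32_t demo_flow_label(uint32_t ver_tc_fl)
{
	return ver_tc_fl & 0x000FFFFFu;
}

/* FNV-1a over a byte range. Assumption: stand-in for the device hash. */
static uint32_t demo_fnv1a(uint32_t h, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;		/* FNV prime */
	}
	return h;
}

/* Hash the triplet only; L4 ports are deliberately not an input. */
static uint32_t demo_triplet_hash(const struct demo_flow_triplet *t)
{
	uint8_t fl[3] = {
		(uint8_t)(t->flow_label >> 16),
		(uint8_t)(t->flow_label >> 8),
		(uint8_t)t->flow_label,
	};
	uint32_t h = 2166136261u;	/* FNV offset basis */

	h = demo_fnv1a(h, t->saddr, sizeof(t->saddr));
	h = demo_fnv1a(h, t->daddr, sizeof(t->daddr));
	return demo_fnv1a(h, fl, sizeof(fl));
}

/* Map the hash to an Rx queue index; n_queues must be non-zero. */
static unsigned int demo_triplet_to_queue(const struct demo_flow_triplet *t,
					  unsigned int n_queues)
{
	return demo_triplet_hash(t) % n_queues;
}
```

With such a classifier, two packets that share the triplet land on the
same queue regardless of their ports, and a sender that changes the
flow label changes the hash input. Whether the queue index actually
changes depends on the hash and on the indirection table.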

Generally the flow label includes the L4 ports in its initial value.
Though PLB, through sk_rethink_txhash, will remove that.

Similarly an IPv6 tunneled packet should no longer need hashing of
its inner layer(s) if the outer flow label is sufficiently computed by
the source to identify a single flow. AFAIK that was the entire point
of this field.

That said, it is always safe to include the L4 fields as well. And in
the end what matters is configuring what the hardware already
supports.
 
> RSS Flow Label hashing allows TCP Protective Load Balancing (PLB)
> to recover from receiver congestion / overload.
> Rx CPU/queue hotspots are relatively common for data ingest
> workloads, and so far we had to try to detect the condition
> at the RPC layer and reopen the connection. PLB lets us change
> the Flow Label and therefore Rx CPU on RTO, with minimal packet
> reordering. PLB reaction times are much faster, and can happen
> at any point in the connection, not just at RPC boundaries.
> 
> Due to the nature of host processing (relatively long queues,
> other kernel subsystems masking IRQs for 100s of msecs)
> the risk of reordering within the host is higher than in
> the network. But for applications which need it - it is far
> preferable to potentially persistent overload of a subset of
> queues.
> 
> It is expected that the hash communicated to the host
> may change if the Flow Label changes. This may be surprising
> to some host software, but I don't expect the devices
> can compute two Toeplitz hashes, one with the Flow Label
> for queue selection and one without for the rx hash
> communicated to the host. Besides, changing the hash
> may potentially help to change the path thru host queues.
> User can disable NETIF_F_RXHASH if they require a stable
> flow hash.
> 
> The name RXH_IP6_FL was chosen based on what we call
> Flow Label variables in IPv6 processing (fl). I prefer
> fl_lbl but that appears to be an fbnic-only spelling.
> We could spell out RXH_IP6_FLOW_LABEL but existing
> RXH_ defines are a lot more terse.
> 
> Signed-off-by: Jakub Kicinski <kuba@...nel.org>
> ---
> CC: andrew@...n.ch
> CC: ecree.xilinx@...il.com
> ---
>  include/uapi/linux/ethtool.h |  1 +
>  net/ethtool/ioctl.c          | 25 +++++++++++++++++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index 707c1844010c..fed36644eb1d 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -2380,6 +2380,7 @@ enum {
>  #define	RXH_L4_B_0_1	(1 << 6) /* src port in case of TCP/UDP/SCTP */
>  #define	RXH_L4_B_2_3	(1 << 7) /* dst port in case of TCP/UDP/SCTP */
>  #define	RXH_GTP_TEID	(1 << 8) /* teid in case of GTP */
> +#define	RXH_IP6_FL	(1 << 9) /* IPv6 flow label */
>  #define	RXH_DISCARD	(1 << 31)
>  
>  #define	RX_CLS_FLOW_DISC	0xffffffffffffffffULL
> diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
> index e8ca70554b2e..181ec2347547 100644
> --- a/net/ethtool/ioctl.c
> +++ b/net/ethtool/ioctl.c
> @@ -1013,6 +1013,28 @@ static bool flow_type_hashable(u32 flow_type)
>  	return false;
>  }
>  
> +static bool flow_type_v6(u32 flow_type)
> +{
> +	switch (flow_type) {
> +	case TCP_V6_FLOW:
> +	case UDP_V6_FLOW:
> +	case SCTP_V6_FLOW:
> +	case AH_ESP_V6_FLOW:
> +	case AH_V6_FLOW:
> +	case ESP_V6_FLOW:
> +	case IPV6_FLOW:
> +	case GTPU_V6_FLOW:
> +	case GTPC_V6_FLOW:
> +	case GTPC_TEID_V6_FLOW:
> +	case GTPU_EH_V6_FLOW:
> +	case GTPU_UL_V6_FLOW:
> +	case GTPU_DL_V6_FLOW:
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
>  /* When adding a new type, update the assert and, if it's hashable, add it to
>   * the flow_type_hashable switch case.
>   */
> @@ -1066,6 +1088,9 @@ ethtool_srxfh_check(struct net_device *dev, const struct ethtool_rxnfc *info)
>  	const struct ethtool_ops *ops = dev->ethtool_ops;
>  	int rc;
>  
> +	if (info->data & RXH_IP6_FL && !flow_type_v6(info->flow_type))
> +		return -EINVAL;
> +
>  	if (ops->get_rxfh) {
>  		struct ethtool_rxfh_param rxfh = {};
>  
> -- 
> 2.49.0
> 
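
For reference, the effect of the new check can be sketched in user
space as below. The DEMO_ constants and demo_ functions are
illustrative stand-ins local to this sketch, not the uapi definitions:
the RXH bit positions are copied from the quoted hunk, while the flow
type values are arbitrary and only a few types are listed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flow types; values are arbitrary, not the uapi ones. */
enum demo_flow_type {
	DEMO_TCP_V4_FLOW = 1,
	DEMO_UDP_V4_FLOW,
	DEMO_TCP_V6_FLOW,
	DEMO_UDP_V6_FLOW,
	DEMO_IPV6_FLOW,
};

/* Hash field bits, same positions as in the quoted ethtool.h hunk. */
#define DEMO_RXH_L4_B_0_1	(1u << 6)	/* src port */
#define DEMO_RXH_L4_B_2_3	(1u << 7)	/* dst port */
#define DEMO_RXH_IP6_FL		(1u << 9)	/* IPv6 flow label */

/* True only for the IPv6 flow types known to this sketch. */
static bool demo_flow_type_v6(uint32_t flow_type)
{
	switch (flow_type) {
	case DEMO_TCP_V6_FLOW:
	case DEMO_UDP_V6_FLOW:
	case DEMO_IPV6_FLOW:
		return true;
	}

	return false;
}

/* Reject a request to hash on the flow label for a non-IPv6 flow type. */
static int demo_srxfh_check(uint32_t flow_type, uint64_t data)
{
	if ((data & DEMO_RXH_IP6_FL) && !demo_flow_type_v6(flow_type))
		return -EINVAL;

	return 0;
}
```

So a request for the flow label plus both port fields on an IPv6 TCP
flow passes, while the flow label bit on any IPv4 flow type is refused
with -EINVAL before the driver is consulted.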


