[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dca173e6-f23d-4f54-8ace-74329439c7da@gmail.com>
Date: Tue, 16 Sep 2025 15:46:52 +0200
From: Richard Gobert <richardbgobert@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
netdev@...r.kernel.org, pabeni@...hat.com, ecree.xilinx@...il.com
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
horms@...nel.org, corbet@....net, saeedm@...dia.com, tariqt@...dia.com,
mbloch@...dia.com, leon@...nel.org, dsahern@...nel.org,
ncardwell@...gle.com, kuniyu@...gle.com, shuah@...nel.org, sdf@...ichev.me,
aleksander.lobakin@...el.com, florian.fainelli@...adcom.com,
alexander.duyck@...il.com, linux-kernel@...r.kernel.org,
linux-net-drivers@....com
Subject: Re: [PATCH net-next v5 3/5] net: gso: restore ids of outer ip headers
correctly
Willem de Bruijn wrote:
> Richard Gobert wrote:
>> Currently, NETIF_F_TSO_MANGLEID indicates that the inner-most ID can
>> be mangled. Outer IDs can always be mangled.
>>
>> Make GSO preserve outer IDs by default, with NETIF_F_TSO_MANGLEID allowing
>> both inner and outer IDs to be mangled.
>>
>> This commit also modifies a few drivers that use SKB_GSO_FIXEDID directly.
>>
>> Signed-off-by: Richard Gobert <richardbgobert@...il.com>
>> Reviewed-by: Edward Cree <ecree.xilinx@...il.com> # for sfc
>> ---
>> .../networking/segmentation-offloads.rst | 22 ++++++++++++-------
>> .../net/ethernet/mellanox/mlx5/core/en_rx.c | 8 +++++--
>> drivers/net/ethernet/sfc/ef100_tx.c | 17 ++++++++++----
>> include/linux/netdevice.h | 9 ++++++--
>> include/linux/skbuff.h | 9 +++++++-
>> net/core/dev.c | 5 ++++-
>> net/ipv4/af_inet.c | 13 +++++------
>> net/ipv4/tcp_offload.c | 5 +----
>> 8 files changed, 59 insertions(+), 29 deletions(-)
>>
>> diff --git a/Documentation/networking/segmentation-offloads.rst b/Documentation/networking/segmentation-offloads.rst
>> index 085e8fab03fd..72f69b22b28c 100644
>> --- a/Documentation/networking/segmentation-offloads.rst
>> +++ b/Documentation/networking/segmentation-offloads.rst
>> @@ -43,10 +43,19 @@ also point to the TCP header of the packet.
>> For IPv4 segmentation we support one of two types in terms of the IP ID.
>> The default behavior is to increment the IP ID with every segment. If the
>> GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP
>> -ID and all segments will use the same IP ID. If a device has
>> -NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO
>> -and we will either increment the IP ID for all frames, or leave it at a
>> -static value based on driver preference.
>> +ID and all segments will use the same IP ID.
>> +
>> +For encapsulated packets, SKB_GSO_TCP_FIXEDID refers only to the outer header.
>> +SKB_GSO_TCP_FIXEDID_INNER can be used to specify the same for the inner header.
>> +Any combination of these two GSO types is allowed.
>> +
>> +If a device has NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when
>> +performing TSO and we will either increment the IP ID for all frames, or leave
>> +it at a static value based on driver preference. For encapsulated packets,
>> +NETIF_F_TSO_MANGLEID is relevant for both outer and inner headers, unless the
>> +DF bit is not set on the outer header, in which case the device driver must
>> +guarantee that the IP ID field is incremented in the outer header with every
>> +segment.
>
> Is this introducing a new device requirement for advertising
> NETIF_F_TSO_MANGLEID that existing devices may not meet?
>
No. As I discussed previously with Paolo, existing devices already increment outer
IDs. It is even a requirement for GSO partial, as already stated in the documentation.
This preserves the current behavior. It just makes it explicit. Actually, we are
now even explicitly allowing the mangling of outer IDs if the DF-bit is set.
>>
>> UDP Fragmentation Offload
>> @@ -124,10 +133,7 @@ Generic Receive Offload
>> Generic receive offload is the complement to GSO. Ideally any frame
>> assembled by GRO should be segmented to create an identical sequence of
>> frames using GSO, and any sequence of frames segmented by GSO should be
>> -able to be reassembled back to the original by GRO. The only exception to
>> -this is IPv4 ID in the case that the DF bit is set for a given IP header.
>> -If the value of the IPv4 ID is not sequentially incrementing it will be
>> -altered so that it is when a frame assembled via GRO is segmented via GSO.
>> +able to be reassembled back to the original by GRO.
>>
>>
>> Partial Generic Segmentation Offload
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> index b8c609d91d11..505c4ce7cef8 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
>> @@ -1289,8 +1289,12 @@ static void mlx5e_shampo_update_ipv4_tcp_hdr(struct mlx5e_rq *rq, struct iphdr *
>> tcp->check = ~tcp_v4_check(skb->len - tcp_off, ipv4->saddr,
>> ipv4->daddr, 0);
>> skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4;
>> - if (ntohs(ipv4->id) == rq->hw_gro_data->second_ip_id)
>> - skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_FIXEDID;
>> + if (ntohs(ipv4->id) == rq->hw_gro_data->second_ip_id) {
>> + bool encap = rq->hw_gro_data->fk.control.flags & FLOW_DIS_ENCAPSULATION;
>> +
>> + skb_shinfo(skb)->gso_type |= encap ?
>> + SKB_GSO_TCP_FIXEDID_INNER : SKB_GSO_TCP_FIXEDID;
>
> I think more common style is
>
> encap ? SKB_GSO_TCP_FIXEDID_INNER :
> SKB_GSO_TCP_FIXEDID;
>
> (as used in ef100_make_tso_desc below)
>
>> + }
>>
>> skb->csum_start = (unsigned char *)tcp - skb->head;
>> skb->csum_offset = offsetof(struct tcphdr, check);
>> diff --git a/drivers/net/ethernet/sfc/ef100_tx.c b/drivers/net/ethernet/sfc/ef100_tx.c
>> index e6b6be549581..03005757c060 100644
>> --- a/drivers/net/ethernet/sfc/ef100_tx.c
>> +++ b/drivers/net/ethernet/sfc/ef100_tx.c
>> @@ -189,6 +189,7 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>> {
>> bool gso_partial = skb_shinfo(skb)->gso_type & SKB_GSO_PARTIAL;
>> unsigned int len, ip_offset, tcp_offset, payload_segs;
>> + u32 mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_INC_MOD16;
>> u32 mangleid = ESE_GZ_TX_DESC_IP4_ID_INC_MOD16;
>> unsigned int outer_ip_offset, outer_l4_offset;
>> u16 vlan_tci = skb_vlan_tag_get(skb);
>> @@ -200,8 +201,17 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>> bool outer_csum;
>> u32 paylen;
>>
>> - if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
>> - mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
>> + if (encap) {
>> + if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID_INNER)
>> + mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
>> + if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
>> + mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
>> + } else {
>> + if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID)
>> + mangleid = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
>> + mangleid_outer = ESE_GZ_TX_DESC_IP4_ID_NO_OP;
>> + }
>> +
>> if (efx->net_dev->features & NETIF_F_HW_VLAN_CTAG_TX)
>> vlan_enable = skb_vlan_tag_present(skb);
>>
>> @@ -245,8 +255,7 @@ static void ef100_make_tso_desc(struct efx_nic *efx,
>> ESF_GZ_TX_TSO_OUTER_L4_OFF_W, outer_l4_offset >> 1,
>> ESF_GZ_TX_TSO_ED_OUTER_UDP_LEN, udp_encap && !gso_partial,
>> ESF_GZ_TX_TSO_ED_OUTER_IP_LEN, encap && !gso_partial,
>> - ESF_GZ_TX_TSO_ED_OUTER_IP4_ID, encap ? mangleid :
>> - ESE_GZ_TX_DESC_IP4_ID_NO_OP,
>> + ESF_GZ_TX_TSO_ED_OUTER_IP4_ID, mangleid_outer,
>> ESF_GZ_TX_TSO_VLAN_INSERT_EN, vlan_enable,
>> ESF_GZ_TX_TSO_VLAN_INSERT_TCI, vlan_tci
>> );
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index f3a3b761abfb..3d19c888b839 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -5290,13 +5290,18 @@ void skb_warn_bad_offload(const struct sk_buff *skb);
>>
>> static inline bool net_gso_ok(netdev_features_t features, int gso_type)
>> {
>> - netdev_features_t feature = (netdev_features_t)gso_type << NETIF_F_GSO_SHIFT;
>> + netdev_features_t feature;
>> +
>> + if (gso_type & (SKB_GSO_TCP_FIXEDID | SKB_GSO_TCP_FIXEDID_INNER))
>> + gso_type |= __SKB_GSO_TCP_FIXEDID;
>> +
>> + feature = ((netdev_features_t)gso_type << NETIF_F_GSO_SHIFT) & NETIF_F_GSO_MASK;
>>
>> /* check flags correspondence */
>> BUILD_BUG_ON(SKB_GSO_TCPV4 != (NETIF_F_TSO >> NETIF_F_GSO_SHIFT));
>> BUILD_BUG_ON(SKB_GSO_DODGY != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
>> BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
>> - BUILD_BUG_ON(SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
>> + BUILD_BUG_ON(__SKB_GSO_TCP_FIXEDID != (NETIF_F_TSO_MANGLEID >> NETIF_F_GSO_SHIFT));
>> BUILD_BUG_ON(SKB_GSO_TCPV6 != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
>> BUILD_BUG_ON(SKB_GSO_FCOE != (NETIF_F_FSO >> NETIF_F_GSO_SHIFT));
>> BUILD_BUG_ON(SKB_GSO_GRE != (NETIF_F_GSO_GRE >> NETIF_F_GSO_SHIFT));
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index ca8be45dd8be..937acb1869a1 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -674,7 +674,7 @@ enum {
>> /* This indicates the tcp segment has CWR set. */
>> SKB_GSO_TCP_ECN = 1 << 2,
>>
>> - SKB_GSO_TCP_FIXEDID = 1 << 3,
>> + __SKB_GSO_TCP_FIXEDID = 1 << 3,
>>
>> SKB_GSO_TCPV6 = 1 << 4,
>>
>> @@ -707,6 +707,13 @@ enum {
>> SKB_GSO_FRAGLIST = 1 << 18,
>>
>> SKB_GSO_TCP_ACCECN = 1 << 19,
>> +
>> + /* These indirectly map onto the same netdev feature.
>> + * If NETIF_F_TSO_MANGLEID is set it may mangle both inner and outer
>> + * IDs.
>
> prefer to keep IDs. on the previous line even if over 80 chars.
>
>> + */
>> + SKB_GSO_TCP_FIXEDID = 1 << 30,
>> + SKB_GSO_TCP_FIXEDID_INNER = 1 << 31,
>> };
>>
>> #if BITS_PER_LONG > 32
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 93a25d87b86b..17cb399cdc2a 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3769,7 +3769,10 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb,
>> features &= ~dev->gso_partial_features;
>>
>> /* Make sure to clear the IPv4 ID mangling feature if the
>> - * IPv4 header has the potential to be fragmented.
>> + * IPv4 header has the potential to be fragmented. For
>> + * encapsulated packets, the outer headers are guaranteed to
>> + * have incrementing IDs if DF is not set so there is no need
>> + * to clear the IPv4 ID mangling feature.
>
> Why is this true. Or, why is it not also true for non-encapsulated or
> inner headers?
>
> The same preconditions (incl on DF) are now tested in inet_gro_flush
>
This comment is about the IDs that TSO generates, not the IDs that GRO accepts.
I'll rewrite the comment to make it clearer.
This statement is true because of the requirement specified above, that device
drivers must increment the outer IDs if the DF-bit is not set. This means that
MANGLEID cannot turn incrementing IDs into fixed IDs in the outer header if the
DF-bit is not set (unless the IDs were already fixed, after the next patch) so
there is no need to clear NETIF_F_TSO_MANGLEID.
This isn't true for non-encapsulated or inner headers because nothing guarantees
that MANGLEID won't mangle incrementing IDs into fixed IDs.
>> */
>> if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4) {
>> struct iphdr *iph = skb->encapsulation ?
>> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
>> index 76e38092cd8a..fc7a6955fa0a 100644
>> --- a/net/ipv4/af_inet.c
>> +++ b/net/ipv4/af_inet.c
>> @@ -1393,14 +1393,13 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb,
>>
>> segs = ERR_PTR(-EPROTONOSUPPORT);
>>
>> - if (!skb->encapsulation || encap) {
>> - udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
>> - fixedid = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID);
>> + /* fixed ID is invalid if DF bit is not set */
>> + fixedid = !!(skb_shinfo(skb)->gso_type & (SKB_GSO_TCP_FIXEDID << encap));
>> + if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
>> + goto out;
>>
>> - /* fixed ID is invalid if DF bit is not set */
>> - if (fixedid && !(ip_hdr(skb)->frag_off & htons(IP_DF)))
>> - goto out;
>> - }
>> + if (!skb->encapsulation || encap)
>> + udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP);
>>
>> ops = rcu_dereference(inet_offloads[proto]);
>> if (likely(ops && ops->callbacks.gso_segment)) {
>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
>> index 1949eede9ec9..e6612bd84d09 100644
>> --- a/net/ipv4/tcp_offload.c
>> +++ b/net/ipv4/tcp_offload.c
>> @@ -471,7 +471,6 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
>> const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation];
>> const struct iphdr *iph = (struct iphdr *)(skb->data + offset);
>> struct tcphdr *th = tcp_hdr(skb);
>> - bool is_fixedid;
>>
>> if (unlikely(NAPI_GRO_CB(skb)->is_flist)) {
>> skb_shinfo(skb)->gso_type |= SKB_GSO_FRAGLIST | SKB_GSO_TCPV4;
>> @@ -485,10 +484,8 @@ INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff)
>> th->check = ~tcp_v4_check(skb->len - thoff, iph->saddr,
>> iph->daddr, 0);
>>
>> - is_fixedid = (NAPI_GRO_CB(skb)->ip_fixedid >> skb->encapsulation) & 1;
>> -
>> skb_shinfo(skb)->gso_type |= SKB_GSO_TCPV4 |
>> - (is_fixedid * SKB_GSO_TCP_FIXEDID);
>> + (NAPI_GRO_CB(skb)->ip_fixedid * SKB_GSO_TCP_FIXEDID);
>
> ip_fixedid can now be 0, 1 (outer), 2 (inner) or 3 (inner and outer).
>
> Not all generate a valid SKB_GSO type.
Since SKB_GSO_TCP_FIXEDID_INNER == SKB_GSO_TCP_FIXEDID << 1, all the values
listed above generate the correct set of SKB_GSO types. This fact is also
used in inet_gso_segment (SKB_GSO_TCP_FIXEDID << encap).
>>
>> tcp_gro_complete(skb);
>> return 0;
>> --
>> 2.36.1
>>
>
>
Powered by blists - more mailing lists