[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6c6aced9-7543-b195-93e9-17e824538c55@kernel.org>
Date: Tue, 12 Nov 2024 23:19:09 +0200 (EET)
From: Ilpo Järvinen <ij@...nel.org>
To: "Chia-Yu Chang (Nokia)" <chia-yu.chang@...ia-bell-labs.com>,
"Michael S. Tsirkin" <mst@...hat.com>, Jason Wang <jasowang@...hat.com>,
virtualization@...ts.linux.dev
cc: Eric Dumazet <edumazet@...gle.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"dsahern@...il.com" <dsahern@...il.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"dsahern@...nel.org" <dsahern@...nel.org>,
"pabeni@...hat.com" <pabeni@...hat.com>,
"joel.granados@...nel.org" <joel.granados@...nel.org>,
"kuba@...nel.org" <kuba@...nel.org>,
"andrew+netdev@...n.ch" <andrew+netdev@...n.ch>,
"horms@...nel.org" <horms@...nel.org>,
"pablo@...filter.org" <pablo@...filter.org>,
"kadlec@...filter.org" <kadlec@...filter.org>,
"netfilter-devel@...r.kernel.org" <netfilter-devel@...r.kernel.org>,
"coreteam@...filter.org" <coreteam@...filter.org>,
"ncardwell@...gle.com" <ncardwell@...gle.com>,
"Koen De Schepper (Nokia)" <koen.de_schepper@...ia-bell-labs.com>,
"g.white@...lelabs.com" <g.white@...lelabs.com>,
"ingemar.s.johansson@...csson.com" <ingemar.s.johansson@...csson.com>,
"mirja.kuehlewind@...csson.com" <mirja.kuehlewind@...csson.com>,
"cheshire@...le.com" <cheshire@...le.com>,
"rs.ietf@....at" <rs.ietf@....at>,
"Jason_Livingood@...cast.com" <Jason_Livingood@...cast.com>,
"vidhi_goel@...le.com" <vidhi_goel@...le.com>
Subject: RE: [PATCH v5 net-next 09/13] gro: prevent ACE field corruption &
better AccECN handling
Adding a few virtio people. Please see the virtio spec/flag question
below.
On Tue, 12 Nov 2024, Chia-Yu Chang (Nokia) wrote:
> >From: Ilpo Järvinen <ij@...nel.org>
> >Sent: Thursday, November 7, 2024 8:28 PM
> >To: Eric Dumazet <edumazet@...gle.com>
> >Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@...ia-bell-labs.com>; netdev@...r.kernel.org; dsahern@...il.com; davem@...emloft.net; dsahern@...nel.org; pabeni@...hat.com; joel.granados@...nel.org; kuba@...nel.org; andrew+netdev@...n.ch; horms@...nel.org; pablo@...filter.org; kadlec@...filter.org; netfilter-devel@...r.kernel.org; coreteam@...filter.org; ncardwell@...gle.com; Koen De Schepper (Nokia) <koen.de_schepper@...ia-bell-labs.com>; g.white@...lelabs.com; ingemar.s.johansson@...csson.com; mirja.kuehlewind@...csson.com; cheshire@...le.com; rs.ietf@....at; Jason_Livingood@...cast.com; vidhi_goel@...le.com
> >Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field corruption & better AccECN handling
> >
> >
> >CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
> >
> >
> >
> >On Thu, 7 Nov 2024, Eric Dumazet wrote:
> >
> >>On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.chang@...ia-bell-labs.com> wrote:
> >>>
> >>> From: Ilpo Järvinen <ij@...nel.org>
> >>>
> >>> There are important differences in how the CWR field behaves in
> >>> RFC3168 and AccECN. With AccECN, CWR flag is part of the ACE counter
> >>> and its changes are important so adjust the flags changed mask
> >>> accordingly.
> >>>
> >>> Also, if CWR is there, set the Accurate ECN GSO flag to avoid
> >>> corrupting CWR flag somewhere.
> >>>
> >>> Signed-off-by: Ilpo Järvinen <ij@...nel.org>
> >>> Signed-off-by: Chia-Yu Chang <chia-yu.chang@...ia-bell-labs.com>
> >>> ---
> >>> net/ipv4/tcp_offload.c | 4 ++--
> >>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index
> >>> 0b05f30e9e5f..f59762d88c38 100644
> >>> --- a/net/ipv4/tcp_offload.c
> >>> +++ b/net/ipv4/tcp_offload.c
> >>> @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
> >>> th2 = tcp_hdr(p);
> >>> flush = (__force int)(flags & TCP_FLAG_CWR);
> >>> flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
> >>> - ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
> >>> + ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
> >>> flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
> >>> for (i = sizeof(*th); i < thlen; i += 4)
> >>> flush |= *(u32 *)((u8 *)th + i) ^ @@ -405,7 +405,7
> >>> @@ void tcp_gro_complete(struct sk_buff *skb)
> >>> shinfo->gso_segs = NAPI_GRO_CB(skb)->count;
> >>>
> >>> if (th->cwr)
> >>> - shinfo->gso_type |= SKB_GSO_TCP_ECN;
> >>> + shinfo->gso_type |= SKB_GSO_TCP_ACCECN;
> >>> }
> >>> EXPORT_SYMBOL(tcp_gro_complete);
> >>>
> >>
> >> I do not really understand this patch. How a GRO engine can know which
> >> ECN variant the peers are using ?
> >
> >Hi Eric,
> >
> >Thanks for taking a look.
> >
> >GRO doesn't know. Setting SKB_GSO_TCP_ECN in case of not knowing can result in header change that corrupts ACE field. Thus, GRO has to assume the RFC3168 CWR offloading trick cannot be used anymore (unless it tracks the connection and knows the bits are used for RFC3168 which is something nobody is going to do).
> >
> >The main point of SKB_GSO_TCP_ACCECN is to prevent SKB_GSO_TCP_ECN or NETIF_F_TSO_ECN offloading to be used for the skb as it would corrupt ACE field value.
>
> Hi Eric and Ilpo,
>
> From my understanding of another email thread (patch 08/13), it seems the conclusions that SKB_GSO_TCP_ACCECN will still be needed.
>
> >
> >SKB_GSO_TCP_ACCECN doesn't allow CWR bits change within a super-skb but the same CWR flag should be repeated for all segments. In a sense, it's simpler than RFC3168 offloading.
> >
> >> SKB_GSO_TCP_ECN is also used from other points, what is your plan ?
> >>
> >> git grep -n SKB_GSO_TCP_ECN
> >> drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:3888:
> >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
> >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1291:
> >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
> >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1312:
> >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
> >
> >These looked like they should be just changed to set SKB_GSO_TCP_ACCECN instead.
>
> I agree with these changes and will apply them in the next version.
>
> >
> >I don't anymore recall why I didn't change those when I made this patch long time ago, perhaps it was just an oversight or things have changed somehow since then.
> >
> >> include/linux/netdevice.h:5061: BUILD_BUG_ON(SKB_GSO_TCP_ECN !=
> >> (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
> >> include/linux/skbuff.h:664: SKB_GSO_TCP_ECN = 1 << 2,
> >
> >Not relevant.
> >
> >> include/linux/virtio_net.h:88: gso_type |= SKB_GSO_TCP_ECN;
> >> include/linux/virtio_net.h:161: switch (gso_type & ~SKB_GSO_TCP_ECN) {
> >> include/linux/virtio_net.h:226: if (sinfo->gso_type & SKB_GSO_TCP_ECN)
> >
> >These need to be looked further what's going on as UAPI is also
> > involved here.
>
> I had a look at these parts, and only the 1st and 3rd ones are relevant.
> Related to the 1st one, I propose to change
> from
>
> if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
> gso_type |= SKB_GSO_TCP_ECN;
>
> to
>
> if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
> gso_type |= SKB_GSO_TCP_ACCECN;
>
> The reason behind this proposed change is similar as the above changes
> in en_rx.c.
But en_rx.c is based one CWR flag, there's no similar check here.
> For the 3rd one, I suggest to change from
>
> if (sinfo->gso_type & SKB_GSO_TCP_ECN)
> hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
>
> to
>
> if (sinfo->gso_type & (SKB_GSO_TCP_ECN | SKB_GSO_TCP_ACCECN))
> hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
>
> This proposed change is because VIRTIO_NET_HDR_GSO_ECN must be set to
> allow TCP packets requiring segmentation offload which have ECN bit set.
> So, no matter whether skb gso_type have GSO_TCP_ECN flag or
> GSO_TCP_ACCECN flag, the corresponding hdr gso_type shall be set.
>
> But, I wonder what would the driver do when with VIRTIO_NET_HDR_GSO_ECN
> flag. Will it corrupts CWR or not?
I'm starting to heavily question what is the meaning of that
VIRTIO_NET_HDR_GSO_ECN flag and if it's even consistent...
https://github.com/qemu/qemu/blob/master/net/eth.c
That sets VIRTIO_NET_HDR_GSO_ECN based on CE?? (Whereas kernel associates
the related SKB_GSO_TCP_ECN to CWR offloading.)
The virtio spec is way too vague to be useful so it would not be
surprising if there are diverging interpretations from implementers:
"If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the
VIRTIO_NET_HDR_GSO_ECN bit in gso_type indicates that the TCP packet has
the ECN bit set."
What is "the ECN bit" in terms used by TCP (or IP)? Could some virtio
folks please explain?
--
i.
Powered by blists - more mailing lists