[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
 <PAXPR07MB7984E07EFBEC48A3DF3B3704A35B2@PAXPR07MB7984.eurprd07.prod.outlook.com>
Date: Thu, 14 Nov 2024 01:04:47 +0000
From: "Chia-Yu Chang (Nokia)" <chia-yu.chang@...ia-bell-labs.com>
To: Jason Wang <jasowang@...hat.com>, Ilpo Järvinen
	<ij@...nel.org>
CC: "Michael S. Tsirkin" <mst@...hat.com>, "virtualization@...ts.linux.dev"
	<virtualization@...ts.linux.dev>, Eric Dumazet <edumazet@...gle.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "dsahern@...il.com"
	<dsahern@...il.com>, "davem@...emloft.net" <davem@...emloft.net>,
	"dsahern@...nel.org" <dsahern@...nel.org>, "pabeni@...hat.com"
	<pabeni@...hat.com>, "joel.granados@...nel.org" <joel.granados@...nel.org>,
	"kuba@...nel.org" <kuba@...nel.org>, "andrew+netdev@...n.ch"
	<andrew+netdev@...n.ch>, "horms@...nel.org" <horms@...nel.org>,
	"pablo@...filter.org" <pablo@...filter.org>, "kadlec@...filter.org"
	<kadlec@...filter.org>, "netfilter-devel@...r.kernel.org"
	<netfilter-devel@...r.kernel.org>, "coreteam@...filter.org"
	<coreteam@...filter.org>, "ncardwell@...gle.com" <ncardwell@...gle.com>,
	"Koen De Schepper (Nokia)" <koen.de_schepper@...ia-bell-labs.com>,
	"g.white@...lelabs.com" <g.white@...lelabs.com>,
	"ingemar.s.johansson@...csson.com" <ingemar.s.johansson@...csson.com>,
	"mirja.kuehlewind@...csson.com" <mirja.kuehlewind@...csson.com>,
	"cheshire@...le.com" <cheshire@...le.com>, "rs.ietf@....at" <rs.ietf@....at>,
	"Jason_Livingood@...cast.com" <Jason_Livingood@...cast.com>,
	"vidhi_goel@...le.com" <vidhi_goel@...le.com>
Subject: RE: [PATCH v5 net-next 09/13] gro: prevent ACE field corruption &
 better AccECN handling
>From: Jason Wang <jasowang@...hat.com> 
>Sent: Wednesday, November 13, 2024 2:43 AM
>To: Ilpo Järvinen <ij@...nel.org>
>Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@...ia-bell-labs.com>; Michael S. Tsirkin <mst@...hat.com>; virtualization@...ts.linux.dev; Eric Dumazet <edumazet@...gle.com>; netdev@...r.kernel.org; dsahern@...il.com; davem@...emloft.net; dsahern@...nel.org; pabeni@...hat.com; joel.granados@...nel.org; kuba@...nel.org; andrew+netdev@...n.ch; horms@...nel.org; pablo@...filter.org; kadlec@...filter.org; netfilter-devel@...r.kernel.org; coreteam@...filter.org; ncardwell@...gle.com; Koen De Schepper (Nokia) <koen.de_schepper@...ia-bell-labs.com>; g.white@...lelabs.com; ingemar.s.johansson@...csson.com; mirja.kuehlewind@...csson.com; cheshire@...le.com; rs.ietf@....at; Jason_Livingood@...cast.com; vidhi_goel@...le.com
>Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field corruption & better AccECN handling
>
>[You don't often get email from jasowang@...hat.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
>CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>
>
>
>On Wed, Nov 13, 2024 at 5:19 AM Ilpo Järvinen <ij@...nel.org> wrote:
>>
>> Adding a few virtio people. Please see the virtio spec/flag question 
>> below.
>>
>> On Tue, 12 Nov 2024, Chia-Yu Chang (Nokia) wrote:
>>
>> > >From: Ilpo Järvinen <ij@...nel.org>
>> > >Sent: Thursday, November 7, 2024 8:28 PM
>> > >To: Eric Dumazet <edumazet@...gle.com>
>> > >Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@...ia-bell-labs.com>; 
>> > >netdev@...r.kernel.org; dsahern@...il.com; davem@...emloft.net; 
>> > >dsahern@...nel.org; pabeni@...hat.com; joel.granados@...nel.org; 
>> > >kuba@...nel.org; andrew+netdev@...n.ch; horms@...nel.org; 
>> > >pablo@...filter.org; kadlec@...filter.org; 
>> > >netfilter-devel@...r.kernel.org; coreteam@...filter.org; 
>> > >ncardwell@...gle.com; Koen De Schepper (Nokia) 
>> > ><koen.de_schepper@...ia-bell-labs.com>; g.white@...lelabs.com; 
>> > >ingemar.s.johansson@...csson.com; mirja.kuehlewind@...csson.com; 
>> > >cheshire@...le.com; rs.ietf@....at; Jason_Livingood@...cast.com; 
>> > >vidhi_goel@...le.com
>> > >Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field 
>> > >corruption & better AccECN handling
>> > >
>> > >
>> > >CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>> > >
>> > >
>> > >
>> > >On Thu, 7 Nov 2024, Eric Dumazet wrote:
>> > >
>> > >>On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.chang@...ia-bell-labs.com> wrote:
>> > >>>
>> > >>> From: Ilpo Järvinen <ij@...nel.org>
>> > >>>
>> > >>> There are important differences in how the CWR field behaves in
>> > >>> RFC3168 and AccECN. With AccECN, CWR flag is part of the ACE 
>> > >>> counter and its changes are important so adjust the flags 
>> > >>> changed mask accordingly.
>> > >>>
>> > >>> Also, if CWR is there, set the Accurate ECN GSO flag to avoid 
>> > >>> corrupting CWR flag somewhere.
>> > >>>
>> > >>> Signed-off-by: Ilpo Järvinen <ij@...nel.org>
>> > >>> Signed-off-by: Chia-Yu Chang <chia-yu.chang@...ia-bell-labs.com>
>> > >>> ---
>> > >>>  net/ipv4/tcp_offload.c | 4 ++--
>> > >>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> > >>>
>> > >>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c 
>> > >>> index
>> > >>> 0b05f30e9e5f..f59762d88c38 100644
>> > >>> --- a/net/ipv4/tcp_offload.c
>> > >>> +++ b/net/ipv4/tcp_offload.c
>> > >>> @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
>> > >>>         th2 = tcp_hdr(p);
>> > >>>         flush = (__force int)(flags & TCP_FLAG_CWR);
>> > >>>         flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
>> > >>> -                 ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
>> > >>> +                 ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
>> > >>>         flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
>> > >>>         for (i = sizeof(*th); i < thlen; i += 4)
>> > >>>                 flush |= *(u32 *)((u8 *)th + i) ^ @@ -405,7 
>> > >>> +405,7 @@ void tcp_gro_complete(struct sk_buff *skb)
>> > >>>         shinfo->gso_segs = NAPI_GRO_CB(skb)->count;
>> > >>>
>> > >>>         if (th->cwr)
>> > >>> -               shinfo->gso_type |= SKB_GSO_TCP_ECN;
>> > >>> +               shinfo->gso_type |= SKB_GSO_TCP_ACCECN;
>> > >>>  }
>> > >>>  EXPORT_SYMBOL(tcp_gro_complete);
>> > >>>
>> > >>
>> > >> I do not really understand this patch. How a GRO engine can know 
>> > >> which ECN variant the peers are using ?
>> > >
>> > >Hi Eric,
>> > >
>> > >Thanks for taking a look.
>> > >
>> > >GRO doesn't know. Setting SKB_GSO_TCP_ECN in case of not knowing can result in header change that corrupts ACE field. Thus, GRO has to assume the RFC3168 CWR offloading trick cannot be used anymore (unless it tracks the connection and knows the bits are used for RFC3168 which is something nobody is going to do).
>> > >
>> > >The main point of SKB_GSO_TCP_ACCECN is to prevent SKB_GSO_TCP_ECN or NETIF_F_TSO_ECN offloading to be used for the skb as it would corrupt ACE field value.
>> >
>> > Hi Eric and Ilpo,
>> >
>> > From my understanding of another email thread (patch 08/13), it seems the conclusions that SKB_GSO_TCP_ACCECN will still be needed.
>> >
>> > >
>> > >SKB_GSO_TCP_ACCECN doesn't allow CWR bits change within a super-skb but the same CWR flag should be repeated for all segments. In a sense, it's simpler than RFC3168 offloading.
>> > >
>> > >> SKB_GSO_TCP_ECN is also used from other points, what is your plan ?
>> > >>
>> > >> git grep -n SKB_GSO_TCP_ECN
>> > >> drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:3888:
>> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
>> > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1291:
>> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
>> > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1312:
>> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
>> > >
>> > >These looked like they should be just changed to set SKB_GSO_TCP_ACCECN instead.
>> >
>> > I agree with these changes and will apply them in the next version.
>> >
>> > >
>> > >I don't anymore recall why I didn't change those when I made this patch long time ago, perhaps it was just an oversight or things have changed somehow since then.
>> > >
>> > >> include/linux/netdevice.h:5061: BUILD_BUG_ON(SKB_GSO_TCP_ECN != 
>> > >> (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
>> > >> include/linux/skbuff.h:664:     SKB_GSO_TCP_ECN = 1 << 2,
>> > >
>> > >Not relevant.
>> > >
>> > >> include/linux/virtio_net.h:88:                  gso_type |= SKB_GSO_TCP_ECN;
>> > >> include/linux/virtio_net.h:161:         switch (gso_type & ~SKB_GSO_TCP_ECN) {
>> > >> include/linux/virtio_net.h:226:         if (sinfo->gso_type & SKB_GSO_TCP_ECN)
>> > >
>> > >These need to be looked further what's going on as UAPI is also  
>> > >involved here.
>> >
>> > I had a look at these parts, and only the 1st and 3rd ones are relevant.
>> > Related to the 1st one, I propose to change from
>> >
>> >                 if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
>> >                         gso_type |= SKB_GSO_TCP_ECN;
>> >
>> > to
>> >
>> >                 if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
>> >                         gso_type |= SKB_GSO_TCP_ACCECN;
>> >
>> > The reason behind this proposed change is similar as the above 
>> > changes in en_rx.c.
>>
>> But en_rx.c is based one CWR flag, there's no similar check here.
>>
>> > For the 3rd one, I suggest to change from
>> >
>> >                 if (sinfo->gso_type & SKB_GSO_TCP_ECN)
>> >                         hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
>> >
>> > to
>> >
>> >                 if (sinfo->gso_type & (SKB_GSO_TCP_ECN | SKB_GSO_TCP_ACCECN))
>> >                         hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
>> >
>> > This proposed change is because VIRTIO_NET_HDR_GSO_ECN must be set 
>> > to allow TCP packets requiring segmentation offload which have ECN bit set.
>> > So, no matter whether skb gso_type have GSO_TCP_ECN flag or 
>> > GSO_TCP_ACCECN flag, the corresponding hdr gso_type shall be set.
>> >
>> > But, I wonder what would the driver do when with 
>> > VIRTIO_NET_HDR_GSO_ECN flag. Will it corrupts CWR or not?
>>
>> I'm starting to heavily question what is the meaning of that 
>> VIRTIO_NET_HDR_GSO_ECN flag and if it's even consistent...
>>
>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
>> ub.com%2Fqemu%2Fqemu%2Fblob%2Fmaster%2Fnet%2Feth.c&data=05%7C02%7Cchia
>> -yu.chang%40nokia-bell-labs.com%7C64e73bacd5eb4647905208dd03848651%7C5
>> d4717519675428d917b70f44f9630b0%7C0%7C0%7C638670589949336838%7CUnknown
>> %7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4
>> zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=iEeGdA4mcq1WC
>> unKWvxNZ6yt213xKWKOpjyYdBzLzTk%3D&reserved=0
>>
>> That sets VIRTIO_NET_HDR_GSO_ECN based on CE?? (Whereas kernel 
>> associates the related SKB_GSO_TCP_ECN to CWR offloading.)
>>
>> The virtio spec is way too vague to be useful so it would not be 
>> surprising if there are diverging interpretations from implementers:
>>
>> "If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the 
>> VIRTIO_NET_HDR_GSO_ECN bit in gso_type indicates that the TCP packet 
>> has the ECN bit set."
>>
>> What is "the ECN bit" in terms used by TCP (or IP)? Could some virtio 
>> folks please explain?
>
>According to the current Linux implementation in virtio_net_hdr_to_skb():
>
>                if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
>                        gso_type |= SKB_GSO_TCP_ECN;
>
>It mapps to SKB_GSO_TCP_ECN which is:
>
>        /* This indicates the tcp segment has CWR set. */
>        SKB_GSO_TCP_ECN = 1 << 2,
>
>Thanks
This patch series is related to adding new SKB_GSO_TCP_ACCECN.
The reason behind is because how the CWR field handling is different between RFC3168 and AccECN.
You can to find the original patches in below:
https://lore.kernel.org/netdev/20241105100647.117346-9-chia-yu.chang@nokia-bell-labs.com/
https://lore.kernel.org/netdev/20241105100647.117346-10-chia-yu.chang@nokia-bell-labs.com/
When reviewing all occurrence of SKB_GSO_TCP_ECN, we see it's related to VIRTIO_NET_HDR_GSO_ECN.
But after checking the virtio spec, it's quite unclear about VIRTIO_NET_HDR_GSO_ECN:
"If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the VIRTIO_NET_HDR_GSO_ECN bit in gso_type indicates that the TCP packet has the ECN bit set."
In virtnet_probe(), it looks like VIRTIO_NET_F_HOST_ECN is used to enable NETIF_F_TSO_ECN support:
                if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
                        dev->hw_features |= NETIF_F_TSO_ECN;
So the question is that is its TSO engine can be set to not touch CWR flag even with NETIF_F_TSO_ECN?
Powered by blists - more mailing lists
 
