[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CADvbK_e_PTpq9pNEv-o2fQWjJ9qyV=JdMscTniSMKQttnpgF8Q@mail.gmail.com>
Date: Tue, 24 Nov 2020 21:30:37 +0800
From: Xin Long <lucien.xin@...il.com>
To: Alexander Duyck <alexander.duyck@...il.com>
Cc: network dev <netdev@...r.kernel.org>,
"linux-sctp @ vger . kernel . org" <linux-sctp@...r.kernel.org>,
Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
David Miller <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Guillaume Nault <gnault@...hat.com>,
Paolo Abeni <pabeni@...hat.com>,
Lorenzo Bianconi <lorenzo@...nel.org>
Subject: Re: [PATCH net-next] ip_gre: remove CRC flag from dev features in gre_gso_segment
On Tue, Nov 24, 2020 at 6:23 AM Alexander Duyck
<alexander.duyck@...il.com> wrote:
>
> On Mon, Nov 23, 2020 at 1:14 AM Xin Long <lucien.xin@...il.com> wrote:
> >
> > On Sat, Nov 21, 2020 at 12:10 AM Alexander Duyck
> > <alexander.duyck@...il.com> wrote:
> > >
> > > On Fri, Nov 20, 2020 at 2:23 AM Xin Long <lucien.xin@...il.com> wrote:
> > > >
> > > > On Fri, Nov 20, 2020 at 1:24 AM Alexander Duyck
> > > > <alexander.duyck@...il.com> wrote:
> > > > >
> > > > > On Wed, Nov 18, 2020 at 9:53 PM Xin Long <lucien.xin@...il.com> wrote:
> > > > > >
> > > > > > On Thu, Nov 19, 2020 at 4:35 AM Alexander Duyck
> > > > > > <alexander.duyck@...il.com> wrote:
> > > > > > >
> > > > > > > On Mon, Nov 16, 2020 at 1:17 AM Xin Long <lucien.xin@...il.com> wrote:
> > > > > > > >
>
> <snip>
>
> > > > > > @@ -27,7 +27,11 @@ static __le32 sctp_gso_make_checksum(struct sk_buff *skb)
> > > > > > {
> > > > > > skb->ip_summed = CHECKSUM_NONE;
> > > > > > skb->csum_not_inet = 0;
> > > > > > - gso_reset_checksum(skb, ~0);
> > > > > > + /* csum and csum_start in GSO CB may be needed to do the UDP
> > > > > > + * checksum when it's a UDP tunneling packet.
> > > > > > + */
> > > > > > + SKB_GSO_CB(skb)->csum = (__force __wsum)~0;
> > > > > > + SKB_GSO_CB(skb)->csum_start = skb_headroom(skb) + skb->len;
> > > > > > return sctp_compute_cksum(skb, skb_transport_offset(skb));
> > > > > > }
> > > > > >
> > > > > > And yes, this patch has been tested with GRE tunnel checksums enabled.
> > > > > > (... icsum ocsum).
> > > > > > And yes, it was segmented in the lower NIC level, and we can make it by:
> > > > > >
> > > > > > # ethtool -K gre1 tx-sctp-segmentation on
> > > > > > # ethtool -K veth0 tx-sctp-segmentation off
> > > > > >
> > > > > > (Note: "tx-checksum-sctp" and "gso" are on for both devices)
> > > > > >
> > > > > > Thanks.
> > > > >
> > > > > I would also turn off Tx and Rx checksum offload on your veth device
> > > > > in order to make certain you aren't falsely sending data across
> > > > > indicating that the checksums are valid when they are not. It might be
> > > > > better if you were to run this over an actual NIC as that could then
> > > > > provide independent verification as it would be a fixed checksum test.
> > > > >
> > > > > I'm still not convinced this is working correctly. Basically a crc32c
> > > > > is not the same thing as a 1's complement checksum so you should need
> > > > > to compute both if you have an SCTP packet tunneled inside a UDP or
> > > > > GRE packet with a checksum. I don't see how computing a crc32c should
> > > > > automatically give you a 1's complement checksum of ~0.
> > > >
> > > > On the tx Path [1] below, the sctp crc checksum is calculated in
> > > > sctp_gso_make_checksum() [a], where it calls *sctp_compute_cksum()*
> > > > to do that, and as for the code in it:
> > > >
> > > > SKB_GSO_CB(skb)->csum = (__force __wsum)~0;
> > > > SKB_GSO_CB(skb)->csum_start = skb_headroom(skb) + skb->len;
> > >
> > > Okay, so I think I know how this is working, but the number of things
> > > relied on is ugly. Normally assuming skb_headroom(skb) + skb->len
> > > being valid for this would be a non-starter. I was assuming you
> > > weren't doing the 1's compliment checksum because you weren't using
> > > __skb_checksum to generate it.
> > >
> > > If I am not mistaken SCTP GSO uses the GSO_BY_FRAGS and apparently
> > > none of the frags are using page fragments within the skb. Am I
> > > understanding correctly? One thing that would help to address some of
> > > my concerns would be to clear instead of set NETIF_F_SG in
> > > sctp_gso_segment since your checksum depends on linear skbs.
> > Right, no frag is using page fragments for SCTP GSO.
> > NETIF_F_SG is set here, because in skb_segment():
> >
> > if (hsize > len || !sg)
> > hsize = len;
> >
> > if (!hsize && i >= nfrags && skb_headlen(list_skb) &&
> > (skb_headlen(list_skb) == len || sg)) { <------
> > for flag_list
> >
> > without sg set, it won't go to this 'if' block, which is the process
> > of frag_list, see
>
> I don't think that is processing frag_list, it is processing frags. It
> is just updating list_skb as needed, however it is also configured
> outside of that block.
For SCTP's gso, we expect it going to the branch of matching
(skb_headlen(list_skb) == len), as it will reuse the skb->data.
>
> > commit 89319d3801d1d3ac29c7df1f067038986f267d29
> > Author: Herbert Xu <herbert@...dor.apana.org.au>
> > Date: Mon Dec 15 23:26:06 2008 -0800
> >
> > net: Add frag_list support to skb_segment
> >
> > do you think this might be a bug in skb_segment()?
>
> I would say it is assuming your logic is correct. Basically it should
> be able to segment the frame regardless of if the lower device
> supports NETIF_F_SG or not.
>
> > I was also thinking if SCTP GSO could go with the way of UDP's
> > with skb_segment_list() instead of GSO_BY_FRAGS things.
> > the different is that the head skb does not only include header,
> > but may also include userdata/payload with skb_segment_list().
>
> The problem is right now the way the checksum is being configured you
> would have to keep the payload and data all in one logical data
> segment starting at skb->data. We cannot have any data stored in
> shinfo->frags, nor shinfo->frag_list.
That's right, current SCTP protocol stack don't save tx data into
frags or frag_list, and SCTP doesn't support GRO by now.
>
> > >
> > > > is prepared for doing 1's complement checksum (for outer UDP/GRE), and used
> > > > in gre_gso_segment() [b], where it calls gso_make_checksum() to do that
> > > > when need_csum is set. Note that, here it played a TRICK:
> > > >
> > > > I set SKB_GSO_CB->csum_start to the end of this packet and
> > > > SKB_GSO_CB->csum = ~0 manually, so that in gso_make_checksum() it will do
> > > > the 1's complement checksum for outer UDP/GRE by summing all packet bits up.
> > > > See gso_make_checksum() (called by gre_gso_segment()):
> > > >
> > > > unsigned char *csum_start = skb_transport_header(skb);
> > > > int plen = (skb->head + SKB_GSO_CB(skb)->csum_start) - csum_start;
> > > > /* now plen is from udp header to the END of packet. */
> > > > __wsum partial = SKB_GSO_CB(skb)->csum;
> > > >
> > > > return csum_fold(csum_partial(csum_start, plen, partial));
> > > >
> > > > So yes, here it does compute both if I have an SCTP packet tunnelled inside
> > > > a UDP or GRE packet with a checksum.
> > >
> > > Assuming you have the payload data in the skb->data section. Normally
> > > payload is in page frags. That is why I was concerned. You have to
> > > have guarantees in place that there will not be page fragments
> > > attached to the skb.
> > On SCTP TX path, sctp_packet_transmit() will always alloc linear skbs
> > and reserve headers for lower-layer protocols. I think this will guarantee it.
>
> That ends up being my big concern. We need to make certain that is
> true for all GRO and GSO cases if we are going to operate on the
> assumption that just doing a linear checksum will work in the GSO
> code. Otherwise we need to make certain that segmentation will correct
> this for us if it cannot be guaranteed. That is why I would be much
> more comfortable if we were able to just drop the NETIF_F_SG bit when
> doing the segmentation since that would guarantee the results we are
> looking for.
before doing that, we should have a fix below:
@@ -3850,8 +3850,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
hsize = skb_headlen(head_skb) - offset;
if (hsize < 0)
hsize = 0;
- if (hsize > len || !sg)
- hsize = len;
if (!hsize && i >= nfrags && skb_headlen(list_skb) &&
(skb_headlen(list_skb) == len || sg)) {
@@ -3896,6 +3894,9 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
skb_release_head_state(nskb);
__skb_push(nskb, doffset);
} else {
+ if (hsize > len || !sg)
+ hsize = len;
+
I believe it makes more sense to move this check to the 'else' branch.
Powered by blists - more mailing lists