lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <56AF98C9.50906@gmail.com>
Date:	Mon, 1 Feb 2016 15:41:29 -0200
From:	Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
To:	Alexander Duyck <alexander.duyck@...il.com>
Cc:	Netdev <netdev@...r.kernel.org>,
	Neil Horman <nhorman@...driver.com>,
	Vlad Yasevich <vyasevich@...il.com>,
	David Miller <davem@...emloft.net>,
	Jesper Dangaard Brouer <brouer@...hat.com>,
	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	Daniel Borkmann <borkmann@...earbox.net>,
	Marek Majkowski <marek@...udflare.com>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	Florian Westphal <fw@...len.de>, pabeni@...hat.com,
	John Fastabend <john.r.fastabend@...el.com>,
	linux-sctp@...r.kernel.org, Tom Herbert <tom@...bertland.com>
Subject: Re: [RFC PATCH net-next 3/3] sctp: Add GSO support

Em 01-02-2016 15:03, Alexander Duyck escreveu:
 > On Mon, Feb 1, 2016 at 8:22 AM, Marcelo Ricardo Leitner
 > <marcelo.leitner@...il.com> wrote:
 >> Em 30-01-2016 02:07, Alexander Duyck escreveu:
 >>>
 >>> On Fri, Jan 29, 2016 at 11:42 AM, Marcelo Ricardo Leitner
 >>> <marcelo.leitner@...il.com> wrote:
 >>>>
 >>>> On Fri, Jan 29, 2016 at 11:15:54AM -0800, Alexander Duyck wrote:
 >>>>>
 >>>>> On Wed, Jan 27, 2016 at 9:06 AM, Marcelo Ricardo Leitner
 >>
 >> ...
 >>
 >>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
 >>>>>> index
 >>>>>> 
8cba3d852f251c503b193823b71b27aaef3fb3ae..9583284086967c0746de5f553535e25e125714a5
 >>>>>> 100644
 >>>>>> --- a/net/core/dev.c
 >>>>>> +++ b/net/core/dev.c
 >>>>>> @@ -2680,7 +2680,11 @@ EXPORT_SYMBOL(skb_mac_gso_segment);
 >>>>>>    static inline bool skb_needs_check(struct sk_buff *skb, bool 
tx_path)
 >>>>>>    {
 >>>>>>           if (tx_path)
 >>>>>> -               return skb->ip_summed != CHECKSUM_PARTIAL;
 >>>>>> +               /* FIXME: Why only packets with checksum 
offloading are
 >>>>>> +                * supported for GSO?
 >>>>>> +                */
 >>>>>> +               return skb->ip_summed != CHECKSUM_PARTIAL &&
 >>>>>> +                      skb->ip_summed != CHECKSUM_UNNECESSARY;
 >>>>>>           else
 >>>>>>                   return skb->ip_summed == CHECKSUM_NONE;
 >>>>>>    }
 >>>>>
 >>>>>
 >>>>> Tom Herbert just got rid of the use of CHECKSUM_UNNECESSARY in the
 >>>>> transmit path a little while ago.  Please don't reintroduce it.
 >>>>
 >>>>
 >>>> Can you give me some pointers on that? I cannot find such change.
 >>>> skb_needs_check() seems to be like that since beginning.
 >>>
 >>>
 >>> Maybe you need to update your kernel.  All this stuff was changed in
 >>> December and has been this way for a little while now.
 >>>
 >>> Commits:
 >>> 7a6ae71b24905 "net: Elaborate on checksum offload interface 
description"
 >>> 253aab0597d9e "fcoe: Use CHECKSUM_PARTIAL to indicate CRC offload"
 >>> 53692b1de419c "sctp: Rename NETIF_F_SCTP_CSUM to NETIF_F_SCTP_CRC"
 >>>
 >>> The main reason I even noticed it is because of some of the work I did
 >>> on the Intel NIC offloads.
 >>
 >>
 >> Ok I have those here, but my need here is different. I want to do 
GSO with
 >> packets that won't have CRC offloaded, so I shouldn't use 
CHECKSUM_PARTIAL
 >> but something else.
 >
 > CHECKSUM_NONE if you don't want to have any of the CRC or checksums
 > offloaded.  However as I mentioned before you will want to fake it
 > then since skb_segment assumes it is doing a 1's compliment checksum
 > so you will want to pass NET_F_HW_CSUM as a feature flag and then set
 > CHECKSUM_NONE after the frame has been segmented.

Ok

 >> ...
 >>
 >>>>>> diff --git a/net/sctp/offload.c b/net/sctp/offload.c
 >>>>>> index
 >>>>>> 
7080a6318da7110c1688dd0c5bb240356dbd0cd3..3b96035fa180a4e7195f7b6e7a8be7b97c8f8b26
 >>>>>> 100644
 >>>>>> --- a/net/sctp/offload.c
 >>>>>> +++ b/net/sctp/offload.c
 >>>>>> @@ -36,8 +36,61 @@
 >>>>>>    #include <net/sctp/checksum.h>
 >>>>>>    #include <net/protocol.h>
 >>>>>>
 >>>>>> +static __le32 sctp_gso_make_checksum(struct sk_buff *skb)
 >>>>>> +{
 >>>>>> +       skb->ip_summed = CHECKSUM_NONE;
 >>>>>> +       return sctp_compute_cksum(skb, skb_transport_offset(skb));
 >>>>>> +}
 >>>>>> +
 >>>>>
 >>>>>
 >>>>> I really despise the naming of this bit here.  SCTP does not use a
 >>>>> checksum.  It uses a CRC.  Please don't call this a checksum as it
 >>>>> will just make the code really confusing.   I think the name 
should be
 >>>>> something like gso_make_crc32c.
 >>>>
 >>>>
 >>>> Agreed. SCTP code still references it as 'cksum'. I'll change that in
 >>>> another patch.
 >>>>
 >>>>> I think we need to address the CRC issues before we can really get
 >>>>> into segmentation.  Specifically we need to be able to offload SCTP
 >>>>> and FCoE in software since they both use the CHECKSUM_PARTIAL value
 >>>>> and then we can start cleaning up more of this mess and move onto
 >>>>> segmentation.
 >>>>
 >>>>
 >>>> Hm? The mess on CRC issues here is caused by this patch alone. 
It's good
 >>>> as it is today. And a good part of this mess is caused by trying 
to GSO
 >>>> without offloading CRC too.
 >>>>
 >>>> Or you mean that SCTP and FCoE should stop using CHECKSUM_* at all?
 >>>
 >>>
 >>> Well after Tom's change both SCTP and FCoE use CHECKSUM_PARTIAL.
 >>> CHECKSUM_PARTIAL is what is used to indicate to the hardware that a
 >>> checksum offload has been requested so that is what is looked for at
 >>> the driver level.
 >>
 >>
 >> SCTP was actually already using CHECKSUM_PARTIAL. That patch was just a
 >> rename in an attempt to make this crc difference more evident. Yet I'll
 >> continue the rename within sctp code.
 >
 > Yeah it was FCoE that was doing something different.
 >
 >>> My concern with all this is that we should probably be looking at
 >>> coming up with a means of offloading this in software when
 >>> skb_checksum_help is called.  Right now validate_xmit_skb doesn't have
 >>> any understanding of what to do with SCTP or FCoE and will try to just
 >>> compute a checksum for them.
 >>
 >>
 >> My worry is placed a bit earlier than that, I think. Currently I 
just cannot
 >> do GSO with packets that doesn't have checksum/crc offloaded too because
 >> validate_xmit_skb() will complain.
 >
 > That is probably because you are passing CHECKSUM_PARTIAL instead of
 > CHECKSUM_NONE.

Other way around, but it's cool. We are pretty much on the same page 
now, I think.

 >> As NICs hardly have sctp crc offloading capabilities, I'm thinking 
it makes
 >> sense to do GSO even without crc offloaded. After all, it doesn't matter
 >> much in which stage we are computing the crc as we are computing it 
anyway.
 >
 > Agreed.  You will need to support CHECKSUM_PARTIAL being passed to a
 > device that doesn't support SCTP first.  That way you can start
 > looking at just always setting CHECKSUM_PARTIAL in the transport layer
 > which is really needed if you want to do SCO (SCTP Segmentation
 > Offload) in the first place.  Once you have that you could then start
 > looking at doing the SCO since from that point on you should already
 > be in good shape to address those type of issues.  You should probably
 > use the csum_offset value in the skb in order to flag if this is
 > possibly SCTP.  As far as I know for now there shouldn't be any other
 > protocols that are using the same offset, and if needed you can
 > actually parse the headers to verify if the frame is actually SCTP.

Cool, yes.
Just cannot set CHECKSUM_PARTIAL always because if frame is not a GSO, 
we will not have another chance to fill in SCTP CRC if it's not 
offloaded. A check still have to consider for that, but np.

 >>>>>> +static struct sk_buff *sctp_gso_segment(struct sk_buff *skb,
 >>>>>> +                                       netdev_features_t features)
 >>>>>> +{
 >>>>>> +       struct sk_buff *segs = ERR_PTR(-EINVAL);
 >>>>>> +       struct sctphdr *sh;
 >>>>>> +
 >>>>>> +       sh = sctp_hdr(skb);
 >>>>>> +       if (!pskb_may_pull(skb, sizeof(*sh)))
 >>>>>> +               goto out;
 >>>>>> +
 >>>>>> +       __skb_pull(skb, sizeof(*sh));
 >>>>>> +
 >>>>>> +       if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
 >>>>>> +               /* Packet is from an untrusted source, reset 
gso_segs.
 >>>>>> */
 >>>>>> +               int type = skb_shinfo(skb)->gso_type;
 >>>>>> +
 >>>>>> +               if (unlikely(type &
 >>>>>> +                            ~(SKB_GSO_SCTP | SKB_GSO_DODGY |
 >>>>>> +                              0) ||
 >>>>>> +                            !(type & (SKB_GSO_SCTP))))
 >>>>>> +                       goto out;
 >>>>>> +
 >>>>>> +               /* This should not happen as no NIC has SCTP GSO
 >>>>>> +                * offloading, it's always via software and thus we
 >>>>>> +                * won't send a large packet down the stack.
 >>>>>> +                */
 >>>>>> +               WARN_ONCE(1, "SCTP segmentation offloading to 
NICs is
 >>>>>> not supported.");
 >>>>>> +               goto out;
 >>>>>> +       }
 >>>>>> +
 >>>>>
 >>>>>
 >>>>> So what you are going to end up needing here is some way to tell the
 >>>>> hardware that you are doing the checksum no matter what.  There is no
 >>>>> value in you computing a 1's compliment checksum for the payload if
 >>>>> you aren't going to use it.  What you can probably do is just clear
 >>>>> the standard checksum flags and then OR in NETIF_F_HW_CSUM if
 >>>>> NETIF_F_SCTP_CRC is set and that should get skb_segment to skip
 >>>>> offloading the checksum.
 >>>>
 >>>>
 >>>> Interesting, ok
 >>>>
 >>>>> One other bit that will make this more complicated is if we ever get
 >>>>> around to supporting SCTP in tunnels.  Then we will need to sort out
 >>>>> how things like remote checksum offload should impact SCTP, and 
how to
 >>>>> deal with needing to compute both a CRC and 1's compliment checksum.
 >>>>> What we would probably need to do is check for encap_hdr_csum and if
 >>>>> it is set and we are doing SCTP then we would need to clear the
 >>>>> NETIF_F_HW_CSUM, NETIF_F_IP_CSUM, and NETIF_F_IPV6_CSUM flags.
 >>>>
 >>>>
 >>>> Yup. And that includes on storing pointers to where to store each 
of it.
 >>>
 >>>
 >>> Actually the pointers bit is easy.  The csum_start and csum_offset
 >>> values should be set up after you have segmented the skb and should be
 >>> updated after the skb has been segmented.  If nothing else you can
 >>> probably take a look at the TCP code tcp_gso_segment and
 >>> tcp4_gso_segment for inspiration.  Basically you need to make sure
 >>> that you set the ip_summed, csum_start, and csum_offset values for
 >>> your first frame before you start segmenting it into multiple frames.
 >>
 >>
 >> Ah yes, ok, that's for now, when not doing crc offloading with some 
chksum
 >> offloading (tunnel) too.
 >
 > Actually that would be regardless of tunnel offloading.  We don't
 > store the outer checksum offsets.  If we need outer checksum we
 > restore them after the fact since the inner checksum offsets are
 > needed as part of the inner header TCP checksum computation.

Hm okay

 >>>>>> +       segs = skb_segment(skb, features);
 >>>>>> +       if (IS_ERR(segs))
 >>>>>> +               goto out;
 >>>>>> +
 >>>>>> +       /* All that is left is update SCTP CRC if necessary */
 >>>>>> +       for (skb = segs; skb; skb = skb->next) {
 >>>>>> +               if (skb->ip_summed != CHECKSUM_PARTIAL) {
 >>>>>> +                       sh = sctp_hdr(skb);
 >>>>>> +                       sh->checksum = sctp_gso_make_checksum(skb);
 >>>>>> +               }
 >>>>>> +       }
 >>>>>> +
 >>>>>
 >>>>>
 >>>>> Okay, so it looks like you are doing the right thing here and leaving
 >>>>> this as CHECKSUM_PARTIAL.
 >>>>
 >>>>
 >>>> Actually no then. sctp_gso_make_checksum() replaces it:
 >>>> +static __le32 sctp_gso_make_checksum(struct sk_buff *skb)
 >>>> +{
 >>>> +       skb->ip_summed = CHECKSUM_NONE;
 >>>> +       return sctp_compute_cksum(skb, skb_transport_offset(skb));
 >>>>
 >>>> Why again would have to leave it as CHECKSUM_PARTIAL? IP header?
 >>>
 >>>
 >>> My earlier comment is actually incorrect.  This section is pretty much
 >>> broken since CHECKSUM_PARTIAL only reflects a 1's compliment checksum
 >>> in the case of skb_segment so whatever the value it is worthless.
 >>> CHECKSUM_PARTIAL is used to indicate if a given frame needs to be
 >>> offloaded.  It is meant to let the device know that it still needs to
 >>> compute a checksum or CRC beginning at csum_start and then storing the
 >>> new value at csum_offset.  However for skb_segment it is actually
 >>> referring to a 1's compliment checksum and if it returns CHECKSUM_NONE
 >>> it means it is stored in skb->csum which would really wreck things for
 >>> you since that was your skb->csum_start and skb->csum_offset values.
 >>> I have a patch to change this so that we update a checksum in the
 >>> SKB_GSO_CB, but I wasn't planning on submitting that until net-next
 >>> opens.
 >>
 >>
 >> sctp currently ignores skb->csum. It doesn't mess with the crc but 
computing
 >> it is at least not optimal, yes.
 >
 > Actually sctp sets csum_start and csum_offset if it sets
 > CHECKSUM_PARTIAL.  So it does mess with skb->csum since it is
 > contained in a union with those two fields.

Well, yes, but point was that messed value is not used for anything 
useful later on..
I'll implement the NETIF_F_HW_CSUM trick.

 >>> In the case of SCTP you probably don't even need to bother checking
 >>> the value since it is meaningless as skb_segment doesn't know how to
 >>> do an SCTP checksum anyway.  To that end for now what you could do is
 >>> just set NETIF_F_HW_CSUM.  This way skb_segment won't go and try to
 >>> compute a 1's compliment checksum on the payload since there is no
 >>> actual need for it.
 >>
 >>
 >> Nice, ok.
 >>
 >>> One other bit you will need to do is to check the value of SCTP_CRC
 >>> outside of skb_segment.  You might look at how
 >>> __skb_udp_tunnel_segment does this to populate its own offload_csum
 >>> boolean value, though you would want to use features, not
 >>> skb->dev->features as that is a bit of a workaround since features is
 >>> stripped by hw_enc_features in some paths if I recall correctly.
 >>
 >>>
 >>>
 >>> Once the frames are segmented and if you don't support the offload you
 >>> could then call gso_make_crc32c() or whatever you want to name it to
 >>> perform the CRC calculation and populate the field.  One question by
 >>
 >>
 >> Hmmm.. does it mean that we can use CHECKSUM_PARTIAL then even if CRC
 >> offloading is not possible then? Because the packet will not be 
offloaded in
 >> the end, yes, but this solves my questions above. Then while doing 
GSO, it
 >> re-evaluates if it can offload crc or not?
 >
 > If you compute the CRC you set CHECKSUM_NONE, if you want the device
 > to do it on transmit you should set CHECKSUM_PARTIAL.

Okay

 >>> the way.  Don't you need to initialize the checksum value to 0 before
 >>> you compute it?  I think you might have missed that step when you were
 >>> setting this up.
 >>
 >>
 >> It's fine :) sctp_compute_cksum will replace it with zeroes, 
calculate, and
 >> put back the old value, which then we overwrite with the new one at
 >> sctp_gso_segment.
 >
 > Right but there are scenarios where this will be offloaded isn't
 > there?  You would probably be better off setting the CRC to 0 before
 > you start segmentation and then that way you can either just set
 > csum_offset, csum_start and ip_summed if the lower device supports
 > SCTP CRC offload, otherwise you can just compute it without the need
 > to write the 0 into the header.

Ahh, it's also zeroed when the header is constructed. There is 
'sh->checksum = 0;' in sctp_packet_transmit for this.

I'll look into moving this decision on CRC offloading or not into the 
segmentation moment. I think it will have to be done twice, actually, 
for sctp-reasons. Like, if packet will be fragmented by IP, it currently 
doesn't allow offloading CRC computing. I'll check, then post a v2. I 
think at that least the crc offloading is now clarified. Thanks Alex.

Marcelo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ