[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CALx6S36fFJSk0SpRvJraMKhArwisFe5+D51AQa9834O_MDzw2w@mail.gmail.com>
Date:	Tue, 22 Mar 2016 15:16:32 -0700
From:	Tom Herbert <tom@...bertland.com>
To:	Alexander Duyck <alexander.duyck@...il.com>
Cc:	David Miller <davem@...emloft.net>,
	Netdev <netdev@...r.kernel.org>,
	Kernel Team <kernel-team@...com>,
	Alex Duyck <aduyck@...antis.com>
Subject: Re: [PATCH net-next] net: Fix remote checksum offload with GUE
On Tue, Mar 22, 2016 at 3:09 PM, Tom Herbert <tom@...bertland.com> wrote:
> On Tue, Mar 22, 2016 at 2:45 PM, Alexander Duyck
> <alexander.duyck@...il.com> wrote:
>> On Tue, Mar 22, 2016 at 2:10 PM, Tom Herbert <tom@...bertland.com> wrote:
>>> On Tue, Mar 22, 2016 at 1:20 PM, Alexander Duyck
>>> <alexander.duyck@...il.com> wrote:
>>>> On Tue, Mar 22, 2016 at 12:19 PM, Tom Herbert <tom@...bertland.com> wrote:
>>>>> In skb_segment the check of whether or not to perform the checksum on
>>>>> host was changed to not consider rather remote checksum offload is
>>>>> in use. In the case that can_checksum_protocol fails the checksum
>>>>> is computed regardless. __skb_udp_tunnel_segment was modified in a
>>>>> related patch to add NETIF_F_HW_CSUM into features when grabbing
>>>>> the enc_features and remote checksum offload is being done. The
>>>>> problem is that this bit can be cleared in lower GSO layers that
>>>>> are also doing tunneling (e.g. ipip, GRE when used with GUE),
>>>>> so when we get to skb_segment that intent has been lost and
>>>>> can_checksum_protocol fails.
>>>>
>>>> So what you are describing sounds like a tunnel in tunnel scenario.
>>>> It might work better to just skip masking the features if
>>>> skb->remcsum_offload is set rather than trying to change how we
>>>> perform the offload.
>>>>
>>> To be clear, my patch is restoring the old behavior not implementing a new one.
>>
>> Yes, but the old behavior could lead to kernel panics under certain
>> circumstances.
>>
>>>> I'm pretty sure this will cause data corruption and maybe a kernel
>>>> panic if Tx checksum offload is disabled.
>>>>
>>> Nope, working fine for me.
>>
>> What are the options you used to create the tunnel?  Did you enable
>> both remcsum and udpcsum?
>>
> modprobe fou
> ./ip fou add port 6080 gue
> ./ip link add name tun1 type ipip remote 10.1.1.2 local 10.1.1.2 ttl
> 225 encap gue encap-sport auto encap-dport 6080 encap-csum
> encap-remcsum
> ifconfig tun1 192.168.1.1
> ip route add 192.168.1.0/24 dev tun1
>
Here are the command I use for VXLAN for reference:
./ip link add vxlan0 type vxlan id 10 group 224.10.10.10 ttl 10 dev
eth0 udpcsum remcsumtx remcsumrx
ifconfig vxlan0 192.168.111.1
ip route add 192.168.111.0/24 dev vxlan0
When remcsum offload is working properly (enabled and outer checksum
is offloaded) then csum_partial gets very few cycles as shown by perf.
Tom
> Thanks,
> Tom
>
>>> The outer checksum would be computed in
>>>>> This patch:
>>>>>
>>>>> - Restores the check in skb_segment to look at remote checksum offload.
>>>>> - Removes the code in __skb_udp_tunnel_segment to force the
>>>>>   NETIF_F_HW_CSUM feature since this is no longer useful with above
>>>>>   change.
>>>>> - Removes check for remote checksum offload in gso_reset_checksum.
>>>>>   A special case should not be needed here.
>>>>>
>>>>> Tested: Single netperf STREAM over GUE-ipip
>>>>>
>>>>> Before fix:
>>>>>    5625 Mbps
>>>>> After fix:
>>>>>    6410 Mbps
>>>>>
>>>>> Fixes: 76443456227097179c1482 ("net: Move GSO csum into SKB_GSO_CB")
>>>>> Signed-off-by: Tom Herbert <tom@...bertland.com>
>>>>> ---
>>>>>  include/linux/skbuff.h |  4 ----
>>>>>  net/core/skbuff.c      |  5 ++---
>>>>>  net/ipv4/udp_offload.c | 10 ----------
>>>>>  3 files changed, 2 insertions(+), 17 deletions(-)
>>>>>
>>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>>> index 15d0df9..f6fe8a5 100644
>>>>> --- a/include/linux/skbuff.h
>>>>> +++ b/include/linux/skbuff.h
>>>>> @@ -3615,10 +3615,6 @@ static inline int gso_pskb_expand_head(struct sk_buff *skb, int extra)
>>>>>
>>>>>  static inline void gso_reset_checksum(struct sk_buff *skb, __wsum res)
>>>>>  {
>>>>> -       /* Do not update partial checksums if remote checksum is enabled. */
>>>>> -       if (skb->remcsum_offload)
>>>>> -               return;
>>>>> -
>>>>>         SKB_GSO_CB(skb)->csum = res;
>>>>>         SKB_GSO_CB(skb)->csum_start = skb_checksum_start(skb) - skb->head;
>>>>>  }
>>>>
>>>> I'm pretty sure this part here will break things when you don't have
>>>> an outer offload enabled.  What NIC did you test this on?  Did you try
>>>> disabling the Tx checksum support in the hardware to see what would
>>>> happen?
>>>>
>>> Mellanox.
>>>
>>> When TX checksum is disabled the outer checksum is computed in
>>> __skb_udp_tunnel_segment.
>>
>> Yes, but that uses gso_make_checksum which requires the csum and
>> csum_start fields be initialized in the SKB_GSO_CB().
>>
>>>>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>>>>> index f044f97..e4eb78d 100644
>>>>> --- a/net/core/skbuff.c
>>>>> +++ b/net/core/skbuff.c
>>>>> @@ -3259,14 +3259,13 @@ skip_fraglist:
>>>>>                 nskb->truesize += nskb->data_len;
>>>>>
>>>>>  perform_csum_check:
>>>>> -               if (!csum) {
>>>>> +               if (!csum && !nskb->remcsum_offload) {
>>>>>                         if (skb_has_shared_frag(nskb)) {
>>>>>                                 err = __skb_linearize(nskb);
>>>>>                                 if (err)
>>>>>                                         goto err;
>>>>>                         }
>>>>> -                       if (!nskb->remcsum_offload)
>>>>> -                               nskb->ip_summed = CHECKSUM_NONE;
>>>>> +                       nskb->ip_summed = CHECKSUM_NONE;
>>>>>                         SKB_GSO_CB(nskb)->csum =
>>>>>                                 skb_checksum(nskb, doffset,
>>>>>                                              nskb->len - doffset, 0);
>>>>
>>>> I'm pretty sure this is going to cause a huge mess if you are
>>>> requesting remote checksum but cannot perform an outer checksum.  One
>>>> of the reasons I merged these features together the way I did was
>>>> because we needed to perform the initial checksum to avoid causing a
>>>> kernel panic later on.  Otherwise we don't have the SKB_GSO_CB()->csum
>>>> and SKB_GSO_CB()->csum_start fields populated.
>>>>
>>> Nope, if we can't perform outer checksum it is done in the host. I'm
>>> not seeing any problem.
>>
>> Are you running remcsum with udpcsum set or is udpcsum not set?  If
>> you don't set udpcsum it doesn't actually compute an outer checksum
>> value.
>>
>>>>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>>>>> index 8a3405a..f86f1e1 100644
>>>>> --- a/net/ipv4/udp_offload.c
>>>>> +++ b/net/ipv4/udp_offload.c
>>>>> @@ -78,16 +78,6 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
>>>>>
>>>>>         features &= skb->dev->hw_enc_features;
>>>>>
>>>>> -       /* The only checksum offload we care about from here on out is the
>>>>> -        * outer one so strip the existing checksum feature flags and
>>>>> -        * instead set the flag based on our outer checksum offload value.
>>>>> -        */
>>>>> -       if (remcsum || ufo) {
>>>>> -               features &= ~NETIF_F_CSUM_MASK;
>>>>> -               if (!need_csum || offload_csum)
>>>>> -                       features |= NETIF_F_HW_CSUM;
>>>>> -       }
>>>>> -
>>>>>         /* segment inner packet. */
>>>>>         segs = gso_inner_segment(skb, features);
>>>>>         if (IS_ERR_OR_NULL(segs)) {
>>>>
>>>> This part breaks UDP fragmentation I am pretty sure.  We need this bit
>>>> for UFO to avoid having to perform a checksum over the data twice if
>>>> we are offloading the outer UDP checksum.
>>>
>>> The maybe leave this for ufo if it is a problem (please test your
>>> suppositions!), but this is not good for remcsum offload.
>>
>> I suspect we are supposed to be setting encap_level to 0 when we set
>> skb->encapsulation to 0 before processing inner headers in the case of
>> UDP and GRE tunnel offloads.  I'll try to test and submit something in
>> the next hour or two.
>>
>> - Alex
Powered by blists - more mailing lists
 
