[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <401ade79-f548-6e19-b12b-f2a22c5b525c@huawei.com>
Date: Wed, 20 Feb 2019 16:32:54 +0800
From: maowenan <maowenan@...wei.com>
To: Tom Herbert <tom@...bertland.com>
CC: Linux Kernel Network Developers <netdev@...r.kernel.org>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH net-next] net: udp Allow CHECKSUM_UNNECESSARY packets to
do GRO.
On 2019/1/31 12:33, Tom Herbert wrote:
> On Wed, Jan 30, 2019 at 6:59 PM maowenan <maowenan@...wei.com> wrote:
>>
>>
>>
>> On 2019/1/31 10:43, Tom Herbert wrote:
>>> On Wed, Jan 30, 2019 at 5:58 PM maowenan <maowenan@...wei.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2019/1/30 4:24, Tom Herbert wrote:
>>>>> On Tue, Jan 29, 2019 at 12:08 AM maowenan <maowenan@...wei.com> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2019/1/29 14:24, Tom Herbert wrote:
>>>>>>> On Mon, Jan 28, 2019 at 10:04 PM maowenan <maowenan@...wei.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2019/1/29 12:01, Tom Herbert wrote:
>>>>>>>>> On Mon, Jan 28, 2019 at 7:00 PM maowenan <maowenan@...wei.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>> Do you have any comments about this change?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2019/1/23 11:33, Mao Wenan wrote:
>>>>>>>>>>> When udp4_gro_receive() get one packet that uh->check=0,
>>>>>>>>>>> skb_gro_checksum_validate_zero_check() will set the
>>>>>>>>>>> skb->ip_summed = CHECKSUM_UNNECESSARY;
>>>>>>>>>>> skb->csum_level = 0;
>>>>>>>>>>> Then udp_gro_receive() will flush the packet which is not CHECKSUM_PARTIAL,
>>>>>>>>>>> It is not our expect, because check=0 in udp header indicates this
>>>>>>>>>>> packet is no need to caculate checksum, we should go further to do GRO.
>>>>>>>>>>>
>>>>>>>>>>> This patch changes the value of csum_cnt according to skb->csum_level.
>>>>>>>>>>> ---
>>>>>>>>>>> include/linux/netdevice.h | 1 +
>>>>>>>>>>> 1 file changed, 1 insertion(+)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>>>>>> index 1377d08..9c819f1 100644
>>>>>>>>>>> --- a/include/linux/netdevice.h
>>>>>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>>>>>> @@ -2764,6 +2764,7 @@ static inline void skb_gro_incr_csum_unnecessary(struct sk_buff *skb)
>>>>>>>>>>> * during GRO. This saves work if we fallback to normal path.
>>>>>>>>>>> */
>>>>>>>>>>> __skb_incr_checksum_unnecessary(skb);
>>>>>>>>>>> + NAPI_GRO_CB(skb)->csum_cnt = skb->csum_level + 1;
>>>>>>>>>
>>>>>>>>> That doesn't look right. This would be reinitializing the GRO
>>>>>>>>> checksums from the beginning.
>>>>>>>>>
>>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> I assume the code is bailing on this conditional:
>>>>>>>>>
>>>>>>>>> if (NAPI_GRO_CB(skb)->encap_mark ||
>>>>>>>>> (skb->ip_summed != CHECKSUM_PARTIAL &&
>>>>>>>>> NAPI_GRO_CB(skb)->csum_cnt == 0 &&
>>>>>>>>> !NAPI_GRO_CB(skb)->csum_valid) ||
>>>>>>>>> !udp_sk(sk)->gro_receive)
>>>>>>>>> goto out_unlock;
>>>>>>>>>
>>>>>>>>> I am trying to remember why this needs to check csum_cnt. If there was
>>>>>>>>> a csum_cnt for the UDP csum being zero from checksum-unnecessary, it
>>>>>>>>> was consumed by skb_gro_checksum_validate_zero_check in UDP4 GRO
>>>>>>>>> received.
>>>>>>>>
>>>>>>>> We have met the scene about two VMs in different host with vxlan packets, when udp4_gro_receive receives
>>>>>>>> one packet with ip_summed=CHECKSUM_NONE,csum_cnt=0,csum_valid=0,and udp->check=0, then skb_gro_checksum_validate_zero_check()->
>>>>>>>> skb_gro_incr_csum_unnecessary() validate it and set ip_summed=CHECKSUM_UNNECESSARY,csum_level=0, but csum_cnt and csum_valid
>>>>>>>> keep zero value. Then it will be flushed in udp_gro_receive(), the codes as you have showed.
>>>>>>>>
>>>>>>>> so I think it forgets to modify csum_cnt since csum_level is changed in skb_gro_incr_csum_unnecessary()->__skb_incr_checksum_unnecessary().
>>>>>>>>
>>>>>>> Yes, but the csum_level is changing since we've gone beyond the
>>>>>>> checksums initially reported inc checksum-unnecessary. GRO csum_cnt is
>>>>>>> initialized to skb->csum_level + 1 at the start of GRO processing.
>>>>>>>
>>>>>>> If I recall, the rule is that UDP GRO requires at least one non-zero
>>>>>>> checksum to be verified. The idea is that if we end up computing
>>>>>>> packet checksums on the host for inner checksums like TCP during GRO,
>>>>>>> then that's negating the performance benefits of GRO. Had UDP check
>>>>>>> not been zero then we would do checksum unnecessary conversion and so
>>>>>>> csum_valid would be set for the remainded of GRO processing. The
>>>>>>> existing code is following the rule I believe, so this may be working
>>>>>>> as intended.
>>>>>>
>>>>>> Do you have any suggestion if I need do GRO as udp->check is zero?
>>>>>> My previous modification which works fine as below:
>>>>>> if (NAPI_GRO_CB(skb)->encap_mark ||
>>>>>> (skb->ip_summed != CHECKSUM_PARTIAL &&
>>>>>> + skb->ip_summed != CHECKSUM_UNNECESSARY &&
>>>>>
>>>>> That's effectively disabling the rule that we need a real checksum
>>>>> calculation to proceed with GRO. Besides that, the device returning
>>>>> one checksum-unnecessary level because UDP csum is zero is pretty
>>>>> pointelss; we can just as easily deduce get to same state just by
>>>>> looking at the field with CHECKSUM_NONE. What we really want to see
>>>>> for GRO is a real checksum computation being done on the packet.
>>>>>
>>>>> A few questions:
>>>>>
>>>>> What type of packets are being GROed? Are these TCP? What performance
>>>>> difference do you see with our patch? Can you try enabling UDP
>>>>> checksums, and even RCO with VXLAN? With UDP encapsulation we
>>>>> generally see better performance with checksum enabled since UDP
>>>>> checksum offload is ubiquitous and we can easily convert
>>>>> checksum-unnecessary (with non-zero csum) to checksum-complete.
>>>>
>>>> We use the physical network card calculate the checksum of the inner packet with checksum offload.
>>>> Set the udp checksum of the vxlan header is 0.
>>>>
>>> I see. It sounds like the device is really verifying two checksums in
>>> the packet, the outer UDP checksum (which is zero for UDP) and an
>>> inner checksum, but only reporting one checksum was verified. The
>>> driver needs to set csum_level to 1 in this case (meaning two
>>> checksums have been verified for checksum-unnecessary). What NIC are
>>> you using?
>>
>> Currently it is 82599, whose driver can't recognize the vxlan packet.
>> I guess so many NICs can't do this checking in it's driver, so I think this is a
>> common case, will we fix it in stack?
>>
> Mao,
>
> The problem isn't in the stack, it seems to be in the driver. If the
> device reports a verified checksum for an encpasulated packet then the
> driver needs to set csum_level to 1. Otherwise, the stack can't just
> assume that the inner checksum was verified. *How* a driver deduces
> that the device is reporting about an encapsulated checksum is
> specific to the device and its driver. I'm not sure which driver your
> running, but if you search the code there should be something like
> "skb->csum_level =1" that would be a clue about support. A good
> example is ixgbe, if the device reports checksum verified and that
> packet was VXLAN, it deduces that the inner checksum was verified and
> so the driver sets CHECKSUM_UNNECESSARY and skb->csum_level. Of course
> all this complexity goes away when devices just provide
> checksum-complete.
>
> Tom
Hi Tom,
I have checked code in ixgbe_main.c ixgbe_rx_checksum(). UDP frames with a 0 checksum can be marked as
checksum errors. If it returns here, skb->ip_summed will be set to CHECKSUM_NONE.
Then the packet will be flush to stack in udp_gro_receive().
Do you think whether it is the fault of driver or not?
ixgbe_rx_checksum():
if (ixgbe_test_staterr(rx_desc, IXGBE_RXDADV_ERR_TCPE)) {
/*
* 82599 errata, UDP frames with a 0 checksum can be marked as
* checksum errors.
*/
if ((pkt_info & cpu_to_le16(IXGBE_RXDADV_PKTTYPE_UDP)) &&
test_bit(__IXGBE_RX_CSUM_UDP_ZERO_ERR, &ring->state))
return;
ring->rx_stats.csum_err++;
return;
}
udp_gro_receive():
if (NAPI_GRO_CB(skb)->encap_mark ||
(skb->ip_summed != CHECKSUM_PARTIAL &&
NAPI_GRO_CB(skb)->csum_cnt == 0 &&
!NAPI_GRO_CB(skb)->csum_valid) ||
!udp_sk(sk)->gro_receive)
goto out_unlock;
>
>>>
>>> Tom
>>>
>>>
>>>> With this patch, the bandwidth of TCP between two VMs increase from 2Gbit/s to 6Gbit/s.
>>>>
>>>>>
>>>>> Tom
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> NAPI_GRO_CB(skb)->csum_cnt == 0 &&
>>>>>> !NAPI_GRO_CB(skb)->csum_valid) ||
>>>>>> !udp_sk(sk)->gro_receive)
>>>>>> goto out_unlock;
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>>>>
>>>>>>>>> .
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> .
>>>>>>>
>>>>>>
>>>>>
>>>>> .
>>>>>
>>>>
>>>
>>> .
>>>
>>
>
> .
>
Powered by blists - more mailing lists