[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKgT0Uczi6icYfQnuYYzA3_keXVTjJ2GEKfBabPbwFtcd=nddg@mail.gmail.com>
Date: Fri, 22 Dec 2017 08:32:16 -0800
From: Alexander Duyck <alexander.duyck@...il.com>
To: Yunsheng Lin <linyunsheng@...wei.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"linuxarm@...wei.com" <linuxarm@...wei.com>,
yuxiaowu@...ilicon.com, wzhen.wang@...ilicon.com,
Xuehuahu <xuehuahu@...ilicon.com>
Subject: Re: [QUESTION] Doubt about NAPI_GRO_CB(skb)->is_atomic in tcpv4 gro process
On Fri, Dec 22, 2017 at 12:49 AM, Yunsheng Lin <linyunsheng@...wei.com> wrote:
> Hi, Alexander
>
> On 2017/12/22 0:29, Alexander Duyck wrote:
>> On Thu, Dec 21, 2017 at 1:16 AM, Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>> Hi, Alexander
>>>
>>> On 2017/12/21 0:24, Alexander Duyck wrote:
>>>> On Wed, Dec 20, 2017 at 1:09 AM, Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>>>> Hi, all
>>>>> I have some doubt about NAPI_GRO_CB(skb)->is_atomic when
>>>>> analyzing the tcpv4 gro process:
>>>>>
>>>>> Firstly we set NAPI_GRO_CB(skb)->is_atomic to 1 in dev_gro_receive:
>>>>> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/core/dev.c#L4838
>>>>>
>>>>> And then in inet_gro_receive, we check the NAPI_GRO_CB(skb)->is_atomic
>>>>> before setting NAPI_GRO_CB(skb)->is_atomic according to IP_DF bit in the ip header:
>>>>> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/ipv4/af_inet.c#L1319
>>>>>
>>>>> struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb)
>>>>> {
>>>>> .....................
>>>>> for (p = *head; p; p = p->next) {
>>>>> ........................
>>>>>
>>>>> /* If the previous IP ID value was based on an atomic
>>>>> * datagram we can overwrite the value and ignore it.
>>>>> */
>>>>> if (NAPI_GRO_CB(skb)->is_atomic) //we check it here
>>>>> NAPI_GRO_CB(p)->flush_id = flush_id;
>>>>> else
>>>>> NAPI_GRO_CB(p)->flush_id |= flush_id;
>>>>> }
>>>>>
>>>>> NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF)); //we set it here
>>>>> NAPI_GRO_CB(skb)->flush |= flush;
>>>>> skb_set_network_header(skb, off);
>>>>> ................................
>>>>> }
>>>>>
>>>>> My question is whether we should check the NAPI_GRO_CB(skb)->is_atomic or NAPI_GRO_CB(p)->is_atomic?
>>>>> If we should check NAPI_GRO_CB(skb)->is_atomic, then maybe it is unnecessary because it is alway true.
>>>>> If we should check NAPI_GRO_CB(p)->is_atomic, maybe there is a bug here.
>>>>>
>>>>> So what is the logic here? I am just start analyzing the gro, maybe I miss something obvious here.
>>>>
>>>> The logic there is to address the multiple IP header case where there
>>>> are 2 or more IP headers due to things like VXLAN or GRE tunnels. So
>>>> what will happen is that an outer IP header will end up being sent
>>>> with DF not set and will clear the is_atomic value then we want to OR
>>>> in the next header that is applied. It defaults to assignment on
>>>> is_atomic because the first IP header will encounter flush_id with no
>>>> previous configuration occupying it.
>>>
>>> I see your point now.
>>>
>>> But for the same flow of tunnels packet, the outer and inner ip header must
>>> have the same fixed id or increment id?
>>>
>>> For example, if we have a flow of tunnels packet which has fixed id in outer
>>> header and increment id in inner header(the inner header does have DF flag set):
>
> Sorry, a typo error here. I meant the inner header does *not* have DF flag set here.
>
>>>
>>> 1. For the first packet, NAPI_GRO_CB(skb)->is_atomic will be set to zero when
>>> inet_gro_receive is processing the inner ip header.
>>>
>>> 2. For the second packet, when inet_gro_receive is processing the outer ip header
>>> which has a fixed id, NAPI_GRO_CB(p)->is_atomic is zero according to [1], so
>>> NAPI_GRO_CB(p)->flush_id will be set to 0xFFFF, then the second packet will not
>>> be merged to first packet in tcp_gro_receive.
>>
>> I'm not sure how valid your case here is. The is_atomic is only really
>> meant to apply to the inner-most header.
>
> For the new skb, NAPI_GRO_CB(skb)->is_atomic is indeed applied to the
> inner-most header.
>
> What about the NAPI_GRO_CB(p)->is_atomic, p is the same skb flow already
> merged by gro.
>
> Let me try if I understand it correctly:
> when there is only one skb merged in p, then NAPI_GRO_CB(p)->is_atomic
> is set according to the first skb' inner-most ip DF flags.
>
> When the second skb comes, and inet_gro_receive is processing the
> outer-most ip, for the below code, NAPI_GRO_CB(p)->is_atomic
> is for first skb's inner-most ip DF flags, and "iph->frag_off & htons(IP_DF)"
> is for second skb' outer-most ip DF flags.
> Why don't we use the first skb's outer-most ip DF flags here? I think it is
> more logical to use first skb's outer-most ip DF flags here. But the first
> skb's outer-most ip DF flags is lost when we get here, right?
>
> if (!NAPI_GRO_CB(p)->is_atomic ||
> !(iph->frag_off & htons(IP_DF))) {
> flush_id ^= NAPI_GRO_CB(p)->count;
> flush_id = flush_id ? 0xFFFF : 0;
> }
We already did in a way. If you look earlier up we will set flush if
the DF bit of this packet and the next don't match. So if the DF bit
changes we are flushing anyway.
>
>
> In the case of TCP the
>> inner-most header should almost always have the DF bit set which means
>> the inner-most is almost always atomic.
>>
>>> I thought outer ip header could have a fixed id while inner ip header could
>>> have a increment id. Do I miss something here?
>>
>> You have it backwards. The innermost will have DF bit set so it can be
>> fixed, the outer-most will in many cases not since it is usually UDP
>> and as such it will likely need to increment.
>
> Actually I got it backwards because that is what intel do when TSOing.
>
> Below is quoted from 710 datasheet in page 1052:
> The Identification field (in the IP header in non tunneled packets or the inner IP header in
> tunneled packets) is taken from the TSO header in the first segment and then it is increased by
> one for each transmitted segment.
> The Identification field (in the external IP header) is taken from the TSO header in the first
> segment and then it is either increased by one for each transmitted segment or remains
> constant depending on the EIP_NOINC setting in the transmit context descriptor.
>
> If I understand it correctly, intel is doing oppositely as pointed out by you.
> Am I miss something here?
>
> Refer:
> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xl710-10-40-controller-datasheet.pdf
The only form of IP ID mangling supported by i40e is to increment
everything always for outer and inner header. This code was meant more
to handle the setup generated by igb or ixgbe when we are doing
GSO_PARTIAL. We don't support fixed outer IDs in that driver if I
recall correctly. The hardware might support it but the stack doesn't.
Basically if DF is not set then we must increment and if DF is set we
could either increment or be fixed and we would be within spec for how
IP IDs are supposed to be handled.
>>
>>>>
>>>> The part I am not sure about is if we should be using assignment for
>>>> is_atomic or using an "&=" to clear the bit and leave it cleared.
>>>
>>> I am not sure I understood you here. is_atomic is a bit field, why do you
>>> want to use "&="?
>>
>> Actually that was my mind kind of wandering. It has been a while since
>> I looked at this code and the use of &= wouldn't be appropriate since
>> is_atomic should only apply to the innermost header.
>>
>> Basically the only acceptable combinations for is_atomic and flush_id
>> are false with 0, or true with 1. We can't have a fixed outer header
>> value if DF is not set.
>>
>> Hope that helps to clarify things.
>
> Yes, Your reqly has helped a lot. But I still have some doubt above,
> please help me understand it.
> I hope this does not take much of your time.
> Thanks very much.
>
> Yunsheng Lin
>
You are free to doubt, but I suggest you actually walk through the
protocol and thoroughly read the code. Arguing hypothetical bugs is
rather tedious when I can easily point out the flaws and why something
isn't a bug.
- Alex
Powered by blists - more mailing lists