netdev - Re: [QUESTION] Doubt about NAPI_GRO_CB(skb)->is

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Mon, 25 Dec 2017 17:32:18 +0800
From:   Yunsheng Lin <linyunsheng@...wei.com>
To:     Alexander Duyck <alexander.duyck@...il.com>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "davem@...emloft.net" <davem@...emloft.net>,
        "linuxarm@...wei.com" <linuxarm@...wei.com>,
        <yuxiaowu@...ilicon.com>, <wzhen.wang@...ilicon.com>,
        Xuehuahu <xuehuahu@...ilicon.com>
Subject: Re: [QUESTION] Doubt about NAPI_GRO_CB(skb)->is_atomic in tcpv4 gro
 process

Hi, Alexander

On 2017/12/23 0:32, Alexander Duyck wrote:
> On Fri, Dec 22, 2017 at 12:49 AM, Yunsheng Lin <linyunsheng@...wei.com> wrote:
>> Hi, Alexander
>>
>> On 2017/12/22 0:29, Alexander Duyck wrote:
>>> On Thu, Dec 21, 2017 at 1:16 AM, Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>>> Hi, Alexander
>>>>
>>>> On 2017/12/21 0:24, Alexander Duyck wrote:
>>>>> On Wed, Dec 20, 2017 at 1:09 AM, Yunsheng Lin <linyunsheng@...wei.com> wrote:
>>>>>> Hi, all
>>>>>>         I have some doubt about NAPI_GRO_CB(skb)->is_atomic when
>>>>>> analyzing the tcpv4 gro process:
>>>>>>
>>>>>> Firstly we set NAPI_GRO_CB(skb)->is_atomic to 1 in dev_gro_receive:
>>>>>> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/core/dev.c#L4838
>>>>>>
>>>>>> And then in inet_gro_receive, we check the NAPI_GRO_CB(skb)->is_atomic
>>>>>> before setting NAPI_GRO_CB(skb)->is_atomic according to IP_DF bit in the ip header:
>>>>>> https://elixir.free-electrons.com/linux/v4.15-rc4/source/net/ipv4/af_inet.c#L1319
>>>>>>
>>>>>> struct sk_buff **inet_gro_receive(struct sk_buff **head, struct sk_buff *skb)
>>>>>> {
>>>>>> .....................
>>>>>>         for (p = *head; p; p = p->next) {
>>>>>> ........................
>>>>>>
>>>>>>                 /* If the previous IP ID value was based on an atomic
>>>>>>                  * datagram we can overwrite the value and ignore it.
>>>>>>                  */
>>>>>>                 if (NAPI_GRO_CB(skb)->is_atomic)                      //we check it here
>>>>>>                         NAPI_GRO_CB(p)->flush_id = flush_id;
>>>>>>                 else
>>>>>>                         NAPI_GRO_CB(p)->flush_id |= flush_id;
>>>>>>         }
>>>>>>
>>>>>>         NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF));  //we set it here
>>>>>>         NAPI_GRO_CB(skb)->flush |= flush;
>>>>>>         skb_set_network_header(skb, off);
>>>>>> ................................
>>>>>> }
>>>>>>
>>>>>> My question is whether we should check the NAPI_GRO_CB(skb)->is_atomic or NAPI_GRO_CB(p)->is_atomic?
>>>>>> If we should check NAPI_GRO_CB(skb)->is_atomic, then maybe it is unnecessary because it is alway true.
>>>>>> If we should check NAPI_GRO_CB(p)->is_atomic, maybe there is a bug here.
>>>>>>
>>>>>> So what is the logic here? I am just start analyzing the gro, maybe I miss something obvious here.
>>>>>
>>>>> The logic there is to address the multiple IP header case where there
>>>>> are 2 or more IP headers due to things like VXLAN or GRE tunnels. So
>>>>> what will happen is that an outer IP header will end up being sent
>>>>> with DF not set and will clear the is_atomic value then we want to OR
>>>>> in the next header that is applied. It defaults to assignment on
>>>>> is_atomic because the first IP header will encounter flush_id with no
>>>>> previous configuration occupying it.
>>>>
>>>> I see your point now.
>>>>
>>>> But for the same flow of tunnels packet, the outer and inner ip header must
>>>> have the same fixed id or increment id?
>>>>
>>>> For example, if we have a flow of tunnels packet which has fixed id in outer
>>>> header and increment id in inner header(the inner header does have DF flag set):
>>
>> Sorry, a typo error here. I meant the inner header does *not* have DF flag set here.
>>
>>>>
>>>> 1. For the first packet, NAPI_GRO_CB(skb)->is_atomic will be set to zero when
>>>> inet_gro_receive is processing the inner ip header.
>>>>
>>>> 2. For the second packet, when inet_gro_receive is processing the outer ip header
>>>> which has a fixed id, NAPI_GRO_CB(p)->is_atomic is zero according to [1], so
>>>> NAPI_GRO_CB(p)->flush_id will be set to 0xFFFF, then the second packet will not
>>>> be merged to first packet in tcp_gro_receive.
>>>
>>> I'm not sure how valid your case here is. The is_atomic is only really
>>> meant to apply to the inner-most header.
>>
>> For the new skb, NAPI_GRO_CB(skb)->is_atomic is indeed applied to the
>> inner-most header.
>>
>> What about the NAPI_GRO_CB(p)->is_atomic, p is the same skb flow already
>> merged by gro.
>>
>> Let me try if I understand it correctly:
>> when there is only one skb merged in p, then NAPI_GRO_CB(p)->is_atomic
>> is set according to the first skb' inner-most ip DF flags.
>>
>> When the second skb comes, and inet_gro_receive is processing the
>> outer-most ip, for the below code, NAPI_GRO_CB(p)->is_atomic
>> is for first skb's inner-most ip DF flags, and "iph->frag_off & htons(IP_DF)"
>> is for second skb' outer-most ip DF flags.
>> Why don't we use the first skb's outer-most ip DF flags here? I think it is
>> more logical to use first skb's outer-most ip DF flags here. But the first
>> skb's outer-most ip DF flags is lost when we get here, right?
>>
>>                 if (!NAPI_GRO_CB(p)->is_atomic ||
>>                     !(iph->frag_off & htons(IP_DF))) {
>>                         flush_id ^= NAPI_GRO_CB(p)->count;
>>                         flush_id = flush_id ? 0xFFFF : 0;
>>                 }
> 
> We already did in a way. If you look earlier up we will set flush if
> the DF bit of this packet and the next don't match. So if the DF bit
> changes we are flushing anyway.

Hmm, sorry for missing that.

What puzzle me most is the differentiation between NAPI_GRO_CB(p)->is_atomic
and NAPI_GRO_CB(skb)->is_atomic. So I spent some time on reading the code
related to them. Here is what I found:

Suppose there are only two ip headers and focus on the DF flag and ID field
in ip header(assume other conditions are meet). Because when the same flow
of second skb is being processed, the NAPI_GRO_CB(p)->is_atomic set to 0 or 1
accoding to the first skb' inner-most ip DF flag, So I analyze it based on
the first skb' inner-most ip DF flag.

When the first skb' inner-most ip DF flag is not set, then
NAPI_GRO_CB(p)->is_atomic is always zero. When the skb's DF flags and ID field
meet the follow condition, it will be merged:

outer-DF    outer-ID  inner-DF  inner-ID
    1          any       0      increment
    0       increment    0      increment


When the first skb' inner-most ip DF flag is set, NAPI_GRO_CB(p)->is_atomic is 1
at first and will be changed accordingly when processing the second skb.
When the skb's DF flags and ID field meet the follow condition, it will be merged
and NAPI_GRO_CB(p)->is_atomic is changed accordingly:

outer-DF   outer-ID   inner-DF  inner-ID
  1          any         1       fixed
  1          any         1      increment  [NAPI_GRO_CB(p)->is_atomic changes to zero]
  0        increment     1      increment  [NAPI_GRO_CB(p)->is_atomic changes to zero]
  0        increment     1       fixed

Is my above analysis correct? I have tried my best to understand it. :)

According to the analysis above, If the DF flags in outer-most ip header is set,
then ID can be any value. But not the same as the ID in the inner-most ip header,
it can only be fixed or increment. According to rcf6864 originating sources MAY
set the IPv4 ID field of atomic datagrams to any value if DF is set.
I wonder what is the different here?

Also I wonder if gro support more than two ip header now.


Refer:
https://tools.ietf.org/html/rfc6864#section-4.1

> 
>>
>>
>>  In the case of TCP the
>>> inner-most header should almost always have the DF bit set which means
>>> the inner-most is almost always atomic.
>>>
>>>> I thought outer ip header could have a fixed id while inner ip header could
>>>> have a increment id. Do I miss something here?
>>>
>>> You have it backwards. The innermost will have DF bit set so it can be
>>> fixed, the outer-most will in many cases not since it is usually UDP
>>> and as such it will likely need to increment.
>>
>> Actually I got it backwards because that is what intel do when TSOing.
>>
>> Below is quoted from 710 datasheet in page 1052:
>> The Identification field (in the IP header in non tunneled packets or the inner IP header in
>> tunneled packets) is taken from the TSO header in the first segment and then it is increased by
>> one for each transmitted segment.
>> The Identification field (in the external IP header) is taken from the TSO header in the first
>> segment and then it is either increased by one for each transmitted segment or remains
>> constant depending on the EIP_NOINC setting in the transmit context descriptor.
>>
>> If I understand it correctly, intel is doing oppositely as pointed out by you.
>> Am I miss something here?
>>
>> Refer:
>> https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xl710-10-40-controller-datasheet.pdf
> 
> The only form of IP ID mangling supported by i40e is to increment
> everything always for outer and inner header. This code was meant more
> to handle the setup generated by igb or ixgbe when we are doing
> GSO_PARTIAL. We don't support fixed outer IDs in that driver if I
> recall correctly. The hardware might support it but the stack doesn't.
> Basically if DF is not set then we must increment and if DF is set we
> could either increment or be fixed and we would be within spec for how
> IP IDs are supposed to be handled.>
>>>
>>>>>
>>>>> The part I am not sure about is if we should be using assignment for
>>>>> is_atomic or using an "&=" to clear the bit and leave it cleared.
>>>>
>>>> I am not sure I understood you here. is_atomic is a bit field, why do you
>>>> want to use "&="?
>>>
>>> Actually that was my mind kind of wandering. It has been a while since
>>> I looked at this code and the use of &= wouldn't be appropriate since
>>> is_atomic should only apply to the innermost header.
>>>
>>> Basically the only acceptable combinations for is_atomic and flush_id
>>> are false with 0, or true with 1. We can't have a fixed outer header
>>> value if DF is not set.
>>>
>>> Hope that helps to clarify things.
>>
>> Yes, Your reqly has helped a lot. But I still have some doubt above,
>> please help me understand it.
>> I hope this does not take much of your time.
>> Thanks very much.
>>
>> Yunsheng Lin
>>
> 
> You are free to doubt, but I suggest you actually walk through the
> protocol and thoroughly read the code. Arguing hypothetical bugs is
> rather tedious when I can easily point out the flaws and why something
> isn't a bug.

I will try my best to do as you suggested.
Thanks very much for your time.

Best Regards
Yunsheng Lin

> 
> - Alex
> 
> .
>