netdev - Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <11b2baca-c810-3f61-38d1-415099783129@mellanox.com>
Date:   Wed, 30 May 2018 10:20:42 +0300
From:   Tariq Toukan <tariqt@...lanox.com>
To:     Eric Dumazet <eric.dumazet@...il.com>,
        Alexander Aring <aring@...atatu.com>,
        Tariq Toukan <tariqt@...lanox.com>,
        Moshe Shemesh <moshe@...lanox.com>
Cc:     David Miller <davem@...emloft.net>, edumazet@...gle.com,
        netdev@...r.kernel.org, fw@...len.de, herbert@...dor.apana.org.au,
        tgraf@...g.ch, brouer@...hat.com, alex.aring@...il.com,
        stefan@....samsung.com, ktkhai@...tuozzo.com,
        Eran Ben Elisha <eranbe@...lanox.com>
Subject: Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP
 defrag



On 28/05/2018 7:09 PM, Eric Dumazet wrote:
> 
> 
> On 05/28/2018 07:52 AM, Alexander Aring wrote:
> 
>> as somebody who had similar issues with this patch series I can tell you
>> about what happened for the 6LoWPAN fragmentation.
>>
>> The issue sounds similar, but there is too much missing information here
>> to say something about if you have exactly the issue which we had.
>>
>> Our problem:
>>
>> The patch series uses memcmp() to compare hash keys, we had some padding
>> bytes in our hash key and it occurs that we had sometimes random bytes
>> in this structure when it's put on stack. We solved it by a struct
>> foo_key bar = {}, which in case of gcc it _seems_ it makes a whole
>> memset(bar, 0, ..) on the structure.
>>
>> I asked on the netdev mailinglist how to deal with this problem in
>> general, because = {} works in case of gcc, others compilers may have a
>> different handling or even gcc will changes this behaviour in future.
>> I got no reply so I did what it works for me. :-)
>>
>> At least maybe a memcmp() on structures should never be used, it should
>> be compared by field. I would recommend this way when the compiler is
>> always clever enough to optimize it in some cases, but I am not so a
>> compiler expert to say anything about that.
>>
>> I checked the hash key structures for x86_64 and pahole, so far I didn't
>> find any padding bytes there, but it might be different on
>> architectures or ?compiler?.
>>
>> Additional useful information to check if you running into the same problem
>> would be:
>>
>>   - Which architecture do you use?
>>
>>   - Do you have similar problems with a veth setup?
>>
>> You could also try this:
>>
>> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
>> index b939b94e7e91..40ece9ab8b12 100644
>> --- a/net/ipv6/reassembly.c
>> +++ b/net/ipv6/reassembly.c
>> @@ -142,19 +142,19 @@ static void ip6_frag_expire(struct timer_list *t)
>>   static struct frag_queue *
>>   fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif)
>>   {
>> -       struct frag_v6_compare_key key = {
>> -               .id = id,
>> -               .saddr = hdr->saddr,
>> -               .daddr = hdr->daddr,
>> -               .user = IP6_DEFRAG_LOCAL_DELIVER,
>> -               .iif = iif,
>> -       };
>> +       struct frag_v6_compare_key key = {};
>>          struct inet_frag_queue *q;
>>   
>>          if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST |
>>                                              IPV6_ADDR_LINKLOCAL)))
>>                  key.iif = 0;
>>   
>> +       key.id = id;
>> +       key.saddr = hdr->saddr;
>> +       key.daddr = hdr->daddr;
>> +       key.user = IP6_DEFRAG_LOCAL_DELIVER;
>> +       key.iif = iif;
>> +
>>          q = inet_frag_find(&net->ipv6.frags, &key);
>>          if (!q)
>>                  return NULL;
>>
>> - Alex
>>
> 
> Hi Alex.
> 
> This patch makes no sense, since struct frag_v6_compare_key has no hole.
> 
> Only 6LoWPAN had a problem really, because of its way of having unions (and holes).
> 
> Also note that your patch would break the case when we force key.iif to be zero.
> 
> 
> Tariq, here are my test results : No drops for me.
> 
> # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2607:f8b0:8099:e18:: () port 0 AF_INET6
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 
> 212992   65507   10.00      202117      0    10592.00
> 212992           10.00           0              0.00
> 
> Somehow, you might send packets too fast and receiver has a problem with that ?

Not sure, the transmit BW you get is higher than what we saw.
Anyway, we'll check this.

> For particular needs, you might need to adjust :
> 
> /proc/sys/net/ipv6/ip6frag_time  (to 2 seconds instead of the default of 60)
> /proc/sys/net/ipv6/ip6frag_low_thresh
> /proc/sys/net/ipv6/ip6frag_high_thresh
> 
> Once your receiver has filled its capacity with frags, the default of 60 seconds to garbage collect
> might be the reason you notice a problem.
> 
> Check :
> grep FRAG6 /proc/net/sockstat6
> 
> On Google servers we multiply by 25 the limits for ipv6 frags memory usage :
> 
> /proc/sys/net/ipv6/ip6frag_high_thresh:104857600  (instead of 4MB)
> /proc/sys/net/ipv6/ip6frag_low_thresh:78643200  (instead of 3 MB)
> 
> When using 64KB datagrams, note that the truesize of the datagram would be about 44 * 2 = 88 KB,
> so after ~40 lost packets in the network, you no longer can accept ipv6 fragments, until garbage
> collector evicted old datagrams.
> 

Great.
Moshe, please try the suggested above.

In case these values dramatically improve performance, maybe its time to 
change the default.

Thanks,
Tariq

> 
> 
> 
> 
> 
>