lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <0bb2faec-ba8d-51d7-0c1a-12520b9168e8@mellanox.com>
Date:   Thu, 31 May 2018 15:18:07 +0300
From:   Moshe Shemesh <moshe@...lanox.com>
To:     Tariq Toukan <tariqt@...lanox.com>,
        Eric Dumazet <eric.dumazet@...il.com>,
        Alexander Aring <aring@...atatu.com>
Cc:     David Miller <davem@...emloft.net>, edumazet@...gle.com,
        netdev@...r.kernel.org, fw@...len.de, herbert@...dor.apana.org.au,
        tgraf@...g.ch, brouer@...hat.com, alex.aring@...il.com,
        stefan@....samsung.com, ktkhai@...tuozzo.com,
        Eran Ben Elisha <eranbe@...lanox.com>
Subject: Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP
 defrag



On 5/30/2018 10:20 AM, Tariq Toukan wrote:
> 
> 
> On 28/05/2018 7:09 PM, Eric Dumazet wrote:
>>
>>
>> On 05/28/2018 07:52 AM, Alexander Aring wrote:
>>
>>> as somebody who had similar issues with this patch series I can tell you
>>> about what happened for the 6LoWPAN fragmentation.
>>>
>>> The issue sounds similar, but there is too much missing information here
>>> to say something about if you have exactly the issue which we had.
>>>
>>> Our problem:
>>>
>>> The patch series uses memcmp() to compare hash keys, we had some padding
>>> bytes in our hash key and it occurs that we had sometimes random bytes
>>> in this structure when it's put on stack. We solved it by a struct
>>> foo_key bar = {}, which in case of gcc it _seems_ it makes a whole
>>> memset(bar, 0, ..) on the structure.
>>>
>>> I asked on the netdev mailinglist how to deal with this problem in
>>> general, because = {} works in case of gcc, others compilers may have a
>>> different handling or even gcc will changes this behaviour in future.
>>> I got no reply so I did what it works for me. :-)
>>>
>>> At least maybe a memcmp() on structures should never be used, it should
>>> be compared by field. I would recommend this way when the compiler is
>>> always clever enough to optimize it in some cases, but I am not so a
>>> compiler expert to say anything about that.
>>>
>>> I checked the hash key structures for x86_64 and pahole, so far I didn't
>>> find any padding bytes there, but it might be different on
>>> architectures or ?compiler?.
>>>
>>> Additional useful information to check if you running into the same 
>>> problem
>>> would be:
>>>
>>>   - Which architecture do you use?
>>>
>>>   - Do you have similar problems with a veth setup?
>>>
>>> You could also try this:
>>>
>>> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
>>> index b939b94e7e91..40ece9ab8b12 100644
>>> --- a/net/ipv6/reassembly.c
>>> +++ b/net/ipv6/reassembly.c
>>> @@ -142,19 +142,19 @@ static void ip6_frag_expire(struct timer_list *t)
>>>   static struct frag_queue *
>>>   fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int 
>>> iif)
>>>   {
>>> -       struct frag_v6_compare_key key = {
>>> -               .id = id,
>>> -               .saddr = hdr->saddr,
>>> -               .daddr = hdr->daddr,
>>> -               .user = IP6_DEFRAG_LOCAL_DELIVER,
>>> -               .iif = iif,
>>> -       };
>>> +       struct frag_v6_compare_key key = {};
>>>          struct inet_frag_queue *q;
>>>          if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST |
>>>                                              IPV6_ADDR_LINKLOCAL)))
>>>                  key.iif = 0;
>>> +       key.id = id;
>>> +       key.saddr = hdr->saddr;
>>> +       key.daddr = hdr->daddr;
>>> +       key.user = IP6_DEFRAG_LOCAL_DELIVER;
>>> +       key.iif = iif;
>>> +
>>>          q = inet_frag_find(&net->ipv6.frags, &key);
>>>          if (!q)
>>>                  return NULL;
>>>
>>> - Alex
>>>
>>
>> Hi Alex.
>>
>> This patch makes no sense, since struct frag_v6_compare_key has no hole.
>>
>> Only 6LoWPAN had a problem really, because of its way of having unions 
>> (and holes).
>>
>> Also note that your patch would break the case when we force key.iif 
>> to be zero.
>>
>>
>> Tariq, here are my test results : No drops for me.
>>
>> # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
>> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 
>> 2607:f8b0:8099:e18:: () port 0 AF_INET6
>> Socket  Message  Elapsed      Messages
>> Size    Size     Time         Okay Errors   Throughput
>> bytes   bytes    secs            #      #   10^6bits/sec
>>
>> 212992   65507   10.00      202117      0    10592.00
>> 212992           10.00           0              0.00
>>
>> Somehow, you might send packets too fast and receiver has a problem 
>> with that ?
> 
> Not sure, the transmit BW you get is higher than what we saw.
> Anyway, we'll check this.
> 
>> For particular needs, you might need to adjust :
>>
>> /proc/sys/net/ipv6/ip6frag_time  (to 2 seconds instead of the default 
>> of 60)
>> /proc/sys/net/ipv6/ip6frag_low_thresh
>> /proc/sys/net/ipv6/ip6frag_high_thresh
>>
>> Once your receiver has filled its capacity with frags, the default of 
>> 60 seconds to garbage collect
>> might be the reason you notice a problem.
>>
>> Check :
>> grep FRAG6 /proc/net/sockstat6
>>
>> On Google servers we multiply by 25 the limits for ipv6 frags memory 
>> usage :
>>
>> /proc/sys/net/ipv6/ip6frag_high_thresh:104857600  (instead of 4MB)
>> /proc/sys/net/ipv6/ip6frag_low_thresh:78643200  (instead of 3 MB)
>>
>> When using 64KB datagrams, note that the truesize of the datagram 
>> would be about 44 * 2 = 88 KB,
>> so after ~40 lost packets in the network, you no longer can accept 
>> ipv6 fragments, until garbage
>> collector evicted old datagrams.
>>
> 
> Great.
> Moshe, please try the suggested above.

I do see big improvement after changing the 3 parameters as Eric suggested:
/proc/sys/net/ipv6/ip6frag_time  set to 2
/proc/sys/net/ipv6/ip6frag_low_thresh set to 104857600
/proc/sys/net/ipv6/ip6frag_high_thresh set to 78643200


[root@...-l-vrt-67100-104 linux-stable]#  netperf -H 
fe80::7efe:90ff:fed5:bb48%ens9,inet6 -t udp_stream --
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 
fe80::7efe:90ff:fed5:bb48%ens9 () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      156387      0    8194.60
212992           10.00       76901           4029.57

#kernel
Ip6InReceives                   7107999            0.0
Ip6InDelivers                   114126             0.0
Ip6OutRequests                  47                 0.0
Ip6ReasmTimeout                 5115               0.0
Ip6ReasmReqds                   7107987            0.0
Ip6ReasmOKs                     114114             0.0
Ip6ReasmFails                   1714146            0.0
...
Udp6InDatagrams                 112486             0.0
Udp6InErrors                    1629               0.0
Udp6RcvbufErrors                1629               0.0
...

While before these parameters settings I got:
[root@...-l-vrt-67100-104 ~]# netperf -H 
fe80::e61d:2dff:feca:c7c3%ens9,inet6 -t udp_stream --
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 
fe80::e61d:2dff:feca:c7c3%ens9 () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      145419      0    7620.35
212992           10.00         285             14.93

#kernel
Ip6InReceives                   6665965            0.0
Ip6InDelivers                   300                0.0
Ip6OutRequests                  9                  0.0
Ip6ReasmReqds                   6665950            0.0
Ip6ReasmOKs                     285                0.0
Ip6ReasmFails                   6650890            0.0
...
Udp6InDatagrams                 286                0.0


however, before the patchset, I got much better results:
[root@...-l-vrt-67100-104 linux-stable]#  netperf -H 
fe80::7efe:90ff:fed5:bb48%ens9,inet6 -t udp_stream --
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 
fe80::7efe:90ff:fed5:bb48%ens9 () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      158935      0    8328.32
212992           10.00      144652           7579.88


#kernel
Ip6InReceives                   7088903            0.0
Ip6InDelivers                   154117             0.0
Ip6OutRequests                  9                  0.0
Ip6ReasmReqds                   7088889            0.0
Ip6ReasmOKs                     154103             0.0
...
Udp6InDatagrams                 144653             0.0
Udp6InErrors                    9451               0.0
Udp6RcvbufErrors                9451               0.0


> 
> In case these values dramatically improve performance, maybe its time to 
> change the default.
> 
> Thanks,
> Tariq
> 
>>
>>
>>
>>
>>
>>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ