[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <11b2baca-c810-3f61-38d1-415099783129@mellanox.com>
Date: Wed, 30 May 2018 10:20:42 +0300
From: Tariq Toukan <tariqt@...lanox.com>
To: Eric Dumazet <eric.dumazet@...il.com>,
Alexander Aring <aring@...atatu.com>,
Tariq Toukan <tariqt@...lanox.com>,
Moshe Shemesh <moshe@...lanox.com>
Cc: David Miller <davem@...emloft.net>, edumazet@...gle.com,
netdev@...r.kernel.org, fw@...len.de, herbert@...dor.apana.org.au,
tgraf@...g.ch, brouer@...hat.com, alex.aring@...il.com,
stefan@....samsung.com, ktkhai@...tuozzo.com,
Eran Ben Elisha <eranbe@...lanox.com>
Subject: Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP
defrag
On 28/05/2018 7:09 PM, Eric Dumazet wrote:
>
>
> On 05/28/2018 07:52 AM, Alexander Aring wrote:
>
>> as somebody who had similar issues with this patch series I can tell you
>> about what happened for the 6LoWPAN fragmentation.
>>
>> The issue sounds similar, but there is too much missing information here
>> to say something about if you have exactly the issue which we had.
>>
>> Our problem:
>>
>> The patch series uses memcmp() to compare hash keys, we had some padding
>> bytes in our hash key and it occurs that we had sometimes random bytes
>> in this structure when it's put on stack. We solved it by a struct
>> foo_key bar = {}, which in case of gcc it _seems_ it makes a whole
>> memset(bar, 0, ..) on the structure.
>>
>> I asked on the netdev mailinglist how to deal with this problem in
>> general, because = {} works in case of gcc, others compilers may have a
>> different handling or even gcc will changes this behaviour in future.
>> I got no reply so I did what it works for me. :-)
>>
>> At least maybe a memcmp() on structures should never be used, it should
>> be compared by field. I would recommend this way when the compiler is
>> always clever enough to optimize it in some cases, but I am not so a
>> compiler expert to say anything about that.
>>
>> I checked the hash key structures for x86_64 and pahole, so far I didn't
>> find any padding bytes there, but it might be different on
>> architectures or ?compiler?.
>>
>> Additional useful information to check if you running into the same problem
>> would be:
>>
>> - Which architecture do you use?
>>
>> - Do you have similar problems with a veth setup?
>>
>> You could also try this:
>>
>> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
>> index b939b94e7e91..40ece9ab8b12 100644
>> --- a/net/ipv6/reassembly.c
>> +++ b/net/ipv6/reassembly.c
>> @@ -142,19 +142,19 @@ static void ip6_frag_expire(struct timer_list *t)
>> static struct frag_queue *
>> fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif)
>> {
>> - struct frag_v6_compare_key key = {
>> - .id = id,
>> - .saddr = hdr->saddr,
>> - .daddr = hdr->daddr,
>> - .user = IP6_DEFRAG_LOCAL_DELIVER,
>> - .iif = iif,
>> - };
>> + struct frag_v6_compare_key key = {};
>> struct inet_frag_queue *q;
>>
>> if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST |
>> IPV6_ADDR_LINKLOCAL)))
>> key.iif = 0;
>>
>> + key.id = id;
>> + key.saddr = hdr->saddr;
>> + key.daddr = hdr->daddr;
>> + key.user = IP6_DEFRAG_LOCAL_DELIVER;
>> + key.iif = iif;
>> +
>> q = inet_frag_find(&net->ipv6.frags, &key);
>> if (!q)
>> return NULL;
>>
>> - Alex
>>
>
> Hi Alex.
>
> This patch makes no sense, since struct frag_v6_compare_key has no hole.
>
> Only 6LoWPAN had a problem really, because of its way of having unions (and holes).
>
> Also note that your patch would break the case when we force key.iif to be zero.
>
>
> Tariq, here are my test results : No drops for me.
>
> # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2607:f8b0:8099:e18:: () port 0 AF_INET6
> Socket Message Elapsed Messages
> Size Size Time Okay Errors Throughput
> bytes bytes secs # # 10^6bits/sec
>
> 212992 65507 10.00 202117 0 10592.00
> 212992 10.00 0 0.00
>
> Somehow, you might send packets too fast and receiver has a problem with that ?
Not sure, the transmit BW you get is higher than what we saw.
Anyway, we'll check this.
> For particular needs, you might need to adjust :
>
> /proc/sys/net/ipv6/ip6frag_time (to 2 seconds instead of the default of 60)
> /proc/sys/net/ipv6/ip6frag_low_thresh
> /proc/sys/net/ipv6/ip6frag_high_thresh
>
> Once your receiver has filled its capacity with frags, the default of 60 seconds to garbage collect
> might be the reason you notice a problem.
>
> Check :
> grep FRAG6 /proc/net/sockstat6
>
> On Google servers we multiply by 25 the limits for ipv6 frags memory usage :
>
> /proc/sys/net/ipv6/ip6frag_high_thresh:104857600 (instead of 4MB)
> /proc/sys/net/ipv6/ip6frag_low_thresh:78643200 (instead of 3 MB)
>
> When using 64KB datagrams, note that the truesize of the datagram would be about 44 * 2 = 88 KB,
> so after ~40 lost packets in the network, you no longer can accept ipv6 fragments, until garbage
> collector evicted old datagrams.
>
Great.
Moshe, please try the suggested above.
In case these values dramatically improve performance, maybe its time to
change the default.
Thanks,
Tariq
>
>
>
>
>
>
Powered by blists - more mailing lists