Message-ID: <9329021e-2d77-7e90-b0e2-8b391508f6cb@mellanox.com>
Date: Mon, 28 May 2018 12:12:42 +0300
From: Tariq Toukan <tariqt@...lanox.com>
To: David Miller <davem@...emloft.net>, edumazet@...gle.com
Cc: netdev@...r.kernel.org, fw@...len.de, herbert@...dor.apana.org.au,
tgraf@...g.ch, brouer@...hat.com, alex.aring@...il.com,
stefan@....samsung.com, ktkhai@...tuozzo.com,
eric.dumazet@...il.com, Moshe Shemesh <moshe@...lanox.com>,
Eran Ben Elisha <eranbe@...lanox.com>
Subject: Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP
defrag
On 01/04/2018 6:25 AM, David Miller wrote:
> From: Eric Dumazet <edumazet@...gle.com>
> Date: Sat, 31 Mar 2018 12:58:41 -0700
>
>> IP defrag processing is one of the remaining problematic layers in Linux.
>>
>> It uses static hash tables of 1024 buckets, and up to 128 items per bucket.
>>
>> A work queue is supposed to garbage-collect items when the host is under
>> memory pressure, doing a hash rebuild that changes the seed used in hash
>> computations.
>>
>> This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
>> occurring every 5 seconds if the host is under fire.
>>
>> Then there is the problem of sharing this hash table for all netns.
>>
>> It is time to switch to rhashtables, and allocate one of them per netns
>> to speed up netns dismantle, since this is a critical metric these days.
>>
>> Lookup is now using RCU, and 64bit hosts can now provision whatever amount
>> of memory needed to handle the expected workloads.
> ...
>
> Series applied, thanks Eric.
>
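For context, the static-table limits quoted in the cover letter put a hard
cap on the number of in-flight fragment queues; a quick back-of-the-envelope
check (figures taken directly from the text above):

```python
# Pre-patchset static hash table limits, as quoted in the cover letter.
buckets = 1024          # static hash table of 1024 buckets
items_per_bucket = 128  # up to 128 items per bucket
max_frag_queues = buckets * items_per_bucket
print(max_frag_queues)  # 131072 fragment queues, shared by all netns
```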
Hi Eric,
Recently my colleague (Moshe Shemesh) hit a failure in our upstream
regression testing that is related to this patchset. We did not see the
failure before it was merged.
We checked again on net-next (from May 24th), and it still reproduces.
The test case runs netperf with a single IPv6 UDP stream (64K message size).
After the change we see huge packet loss:
145,134 messages failed out of 145,419 (only 285 fully received).
[root@...-l-vrt-67100-104 ~]# netperf -H fe80::e61d:2dff:feca:c7c3%ens9,inet6 -t udp_stream --
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fe80::e61d:2dff:feca:c7c3%ens9 () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      145419      0    7620.35
212992           10.00         285             14.93
Checking the nstat counters, we see that Ip6ReasmFails is very high:
#kernel
...
Ip6InReceives                   6665965            0.0
Ip6InDelivers                   300                0.0
Ip6OutRequests                  9                  0.0
Ip6ReasmReqds                   6665950            0.0
Ip6ReasmOKs                     285                0.0
Ip6ReasmFails                   6650890            0.0
Ip6InOctets                     9813929354         0.0
Ip6OutOctets                    2608               0.0
Ip6InNoECTPkts                  6665965            0.0
...
Udp6InDatagrams                 286                0.0
...
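Putting the failing run's counters together (a quick sanity computation on
the numbers above, nothing more):

```python
# nstat counters from the failing (post-patchset) run, copied from above.
reasm_reqds = 6_665_950   # Ip6ReasmReqds: fragments handed to reassembly
reasm_oks   = 285         # Ip6ReasmOKs:   datagrams fully reassembled
reasm_fails = 6_650_890   # Ip6ReasmFails

fail_ratio = reasm_fails / reasm_reqds
print(f"reassembly failure ratio: {fail_ratio:.2%}")  # ~99.77%

# Rough fragments-per-message estimate: a 65507-byte datagram fragments
# into roughly 46 packets at a 1500-byte MTU, which matches the counters.
messages_sent = 145_419   # from the netperf output above
print(f"~{reasm_reqds / messages_sent:.0f} fragments per 64K message")
```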
The same test on a kernel without the patchset has a low failure rate:
only 810 messages failed out of 114,112 (113,302 fully received).
[root@...-l-vrt-67100-104 ~]# netperf -H fe80::e61d:2dff:feca:c7c3%ens9,inet6 -t udp_stream --
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fe80::e61d:2dff:feca:c7c3%ens9 () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      114112      0    5979.69
212992           10.00      113302           5937.24
nstat counters to compare:
#kernel
...
Ip6InReceives                   5249166            0.0
Ip6InDelivers                   114126             0.0
Ip6OutRequests                  8                  0.0
Ip6ReasmReqds                   5249152            0.0
Ip6ReasmOKs                     114112             0.0
Ip6InOctets                     7728009224         0.0
Ip6OutOctets                    2544               0.0
Ip6InNoECTPkts                  5249166            0.0
...
Udp6InDatagrams                 113303             0.0
Udp6InErrors                    810                0.0
Udp6RcvbufErrors                810                0.0
...
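Comparing the two runs side by side (delivery ratios computed from the
netperf figures above):

```python
# (messages fully received, messages sent) per kernel, from the runs above.
runs = {
    "with patchset":    (285, 145_419),
    "without patchset": (113_302, 114_112),
}

for name, (ok, sent) in runs.items():
    print(f"{name}: {ok / sent:.2%} delivered")
# with patchset:    0.20% delivered
# without patchset: 99.29% delivered
```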
We have not yet bisected within the patchset.
Regards,
Tariq and Moshe