Date:   Mon, 28 May 2018 10:52:24 -0400
From:   Alexander Aring <aring@...atatu.com>
To:     Tariq Toukan <tariqt@...lanox.com>
Cc:     David Miller <davem@...emloft.net>, edumazet@...gle.com,
        netdev@...r.kernel.org, fw@...len.de, herbert@...dor.apana.org.au,
        tgraf@...g.ch, brouer@...hat.com, alex.aring@...il.com,
        stefan@....samsung.com, ktkhai@...tuozzo.com,
        eric.dumazet@...il.com, Moshe Shemesh <moshe@...lanox.com>,
        Eran Ben Elisha <eranbe@...lanox.com>
Subject: Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP
 defrag

Hi,

On Mon, May 28, 2018 at 12:12:42PM +0300, Tariq Toukan wrote:
> 
> 
> On 01/04/2018 6:25 AM, David Miller wrote:
> > From: Eric Dumazet <edumazet@...gle.com>
> > Date: Sat, 31 Mar 2018 12:58:41 -0700
> > 
> > > IP defrag processing is one of the remaining problematic layers in Linux.
> > > 
> > > It uses static hash tables of 1024 buckets, and up to 128 items per bucket.
> > > 
> > > A work queue is supposed to garbage-collect items when the host is under
> > > memory pressure, and to do a hash rebuild, changing the seed used in hash
> > > computations.
> > > 
> > > This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
> > > which occurs every 5 seconds if the host is under fire.
> > > 
> > > Then there is the problem of sharing this hash table for all netns.
> > > 
> > > It is time to switch to rhashtables, and allocate one of them per netns
> > > to speedup netns dismantle, since this is a critical metric these days.
> > > 
> > > Lookup is now using RCU, and 64bit hosts can now provision whatever amount
> > > of memory needed to handle the expected workloads.
> >   ...
> > 
> > Series applied, thanks Eric.
> > 
> 
> Hi Eric,
> 
> Recently my colleague (Moshe Shemesh) got a failure in our upstream regression
> tests, which is related to this patchset. We did not see the failure before it
> was merged.
> We checked again on net-next (from May 24th), and it still reproduces.
> 
> The test case runs netperf with a single IPv6 UDP stream (64K message size).
> After the change we see huge packet loss:
> 145,134 messages failed out of 145,419 (only 285 fully received)
> 

As somebody who had similar issues with this patch series, I can tell you
what happened with 6LoWPAN fragmentation.

The issue sounds similar, but too much information is missing here to say
whether you are hitting exactly the problem we had.

Our problem:

The patch series uses memcmp() to compare hash keys. We had some padding
bytes in our hash key, and it turned out that those bytes sometimes held
random data when the structure was put on the stack. We solved it with a
struct foo_key bar = {}, which with gcc _seems_ to do a whole
memset(&bar, 0, ...) on the structure.
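
To illustrate what I mean, here is a stand-alone user-space sketch with a
made-up struct demo_key (not the real frag_v6_compare_key layout): a
designated initializer sets every named field, but the C standard leaves
the padding bytes unspecified, so a memcmp() over the whole key can still
see stack garbage, while = {} (at least with gcc) tends to zero the whole
object:

/*
 * Stand-alone sketch, not kernel code: demo_key is made up only to show
 * the padding problem.  A 1-byte member followed by a 4-byte member gives
 * 3 bytes of padding on most ABIs, and memcmp() compares those too.
 */
#include <stdio.h>
#include <string.h>

struct demo_key {
        unsigned char id;       /* 1 byte, then (usually) 3 padding bytes */
        unsigned int iif;       /* 4 bytes, aligned to 4 */
};

static int keys_equal_memcmp(const struct demo_key *a,
                             const struct demo_key *b)
{
        /* Compares the padding bytes as well as the named fields. */
        return memcmp(a, b, sizeof(*a)) == 0;
}

int main(void)
{
        /* Designated initializer: named fields set, padding unspecified. */
        struct demo_key a = { .id = 1, .iif = 2 };

        /* = {}: with gcc this seems to zero the whole object, padding included. */
        struct demo_key b = {};

        b.id = 1;
        b.iif = 2;

        /* May print "different" if a's padding happens to hold stack garbage. */
        printf("%s\n", keys_equal_memcmp(&a, &b) ? "equal" : "different");
        return 0;
}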

I asked on the netdev mailing list how to deal with this problem in
general, because = {} works with gcc, but other compilers may handle it
differently, or gcc itself may change this behaviour in the future. I got
no reply, so I did what works for me. :-)

At least, maybe memcmp() on structures should never be used; the keys
should be compared field by field instead. I would recommend that approach,
provided the compiler is clever enough to optimize it in most cases, but I
am not enough of a compiler expert to say anything definitive about that.
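
Something like this is what I have in mind, reusing the made-up demo_key
from the sketch above; only named members are compared, so padding bytes
can never influence the result:

/* Field-by-field comparison sketch; demo_key and its members are made up. */
struct demo_key {
        unsigned char id;
        unsigned int iif;
};

static int keys_equal_fields(const struct demo_key *a,
                             const struct demo_key *b)
{
        return a->id == b->id && a->iif == b->iif;
}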

I checked the hash key structures on x86_64 with pahole and so far didn't
find any padding bytes there, but it might be different on other
architectures or with other compilers.
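
Another idea, not something from this thread: let the build fail if the
key struct ever grows a hole, so a memcmp() stays safe regardless of the
architecture or compiler. In the kernel this would probably be a
BUILD_BUG_ON(); here is a plain C11 sketch with a hypothetical two-member
key:

/* Compile-time padding check; demo_key and its members are hypothetical. */
#include <assert.h>     /* static_assert (C11) */

struct demo_key {
        unsigned int id;
        unsigned int iif;
};

static_assert(sizeof(struct demo_key) ==
              sizeof(unsigned int) + sizeof(unsigned int),
              "struct demo_key has padding bytes");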

Additional information that would help to check whether you are running
into the same problem:

 - Which architecture do you use?

 - Do you have similar problems with a veth setup?

You could also try this:

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index b939b94e7e91..40ece9ab8b12 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -142,19 +142,19 @@ static void ip6_frag_expire(struct timer_list *t)
 static struct frag_queue *
 fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif)
 {
-       struct frag_v6_compare_key key = {
-               .id = id,
-               .saddr = hdr->saddr,
-               .daddr = hdr->daddr,
-               .user = IP6_DEFRAG_LOCAL_DELIVER,
-               .iif = iif,
-       };
+       struct frag_v6_compare_key key = {};
        struct inet_frag_queue *q;
 
+       key.id = id;
+       key.saddr = hdr->saddr;
+       key.daddr = hdr->daddr;
+       key.user = IP6_DEFRAG_LOCAL_DELIVER;
+       key.iif = iif;
+
        if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST |
                                            IPV6_ADDR_LINKLOCAL)))
                key.iif = 0;
 
        q = inet_frag_find(&net->ipv6.frags, &key);
        if (!q)
                return NULL;

- Alex
