Message-ID: <alpine.LRH.2.20.1905021055490.5146@dhcp-10-175-160-224.vpn.oracle.com>
Date:   Thu, 2 May 2019 11:00:07 +0100 (BST)
From:   Alan Maguire <alan.maguire@...cle.com>
To:     David Ahern <dsahern@...il.com>
cc:     Alan Maguire <alan.maguire@...cle.com>, netdev@...r.kernel.org,
        daniel@...earbox.net, Ian Kumlien <ian.kumlien@...il.com>
Subject: Re: MPLS encapsulation and arp table overflow

On Wed, 1 May 2019, David Ahern wrote:

> On 5/1/19 10:03 AM, Alan Maguire wrote:
> > I'm seeing the following repeated error
> > 
> > [  130.821362] neighbour: arp_cache: neighbor table overflow!
> > 
> > when using MPLSoverGRE or MPLSoverUDP tunnels on bits synced
> > with bpf-next as of this morning. The test script below reliably
> > reproduces the problem, while working fine on a 4.14 kernel (I haven't
> > bisected yet). It can be run with no arguments, or specifying
> > gre or udp for the specific encap type.
> > 
> > It seems that every MPLS-encapsulated outbound packet is attempting
> > to add a neighbor entry, and as a result we hit the
> > net.ipv4.neigh.default.gc_thresh3 limit quickly.
> > 
> > When this failure occurs, the arp table doesn't show any of
> > these additional entries. Existing arp table entries are
> > disappearing too, so perhaps they are being recycled when the
> > table becomes full?
> > 
> 
> There are 2 bugs:
> 1. neigh_xmit fails to find a neighbor entry on every single Tx. This
> was introduced by:
> 
> cd9ff4de010 ("ipv4: Make neigh lookup keys for loopback/point-to-point
> devices be INADDR_ANY")
> 
> Basically, the primary_key is reset to 0 for tun's but the neigh_xmit
> lookup was not corrected.
> 
> That caused a new neigh entry to be created on each packet Tx, but
> before inserting the new one to the table the create function looks to
> see if an entry already exists. The arp constructor had reset the key to
> 0 in the new neighbor entry so the second lookup finds a match and the
> new one is dropped.
> 
> That exposed a second bug.
> 
> 2. neigh_alloc bumps the gc_entries counter when a new one is allocated,
> but ___neigh_create is not dropping the counter in the error path.
> 
> Ian reported a similar problem, but we were not able to isolate the cause.
>
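
The interaction of the two bugs described above can be sketched with a
toy model. This is only an illustration under stated assumptions, not
the kernel code: the class NeighTable, the helper
packets_until_overflow, the fix_lookup/fix_counter switches, the device
name "tun0" and the GC_THRESH3 value are all hypothetical. The
transmit-path lookup uses the real next hop, creation resets the key to
0, and the allocation counter is never dropped when the new entry is
discarded as a duplicate.

```python
# Toy model only: names echo the thread, but none of this is kernel code.
GC_THRESH3 = 1024  # assumed stand-in for net.ipv4.neigh.default.gc_thresh3


class NeighTable:
    def __init__(self, fix_lookup=False, fix_counter=False):
        self.entries = {}      # (dev, key) -> entry
        self.gc_entries = 0    # bumped on every allocation
        self.fix_lookup = fix_lookup
        self.fix_counter = fix_counter

    def lookup(self, dev, key):
        return self.entries.get((dev, key))

    def create(self, dev, key):
        if self.gc_entries >= GC_THRESH3:
            raise OverflowError("neighbor table overflow!")
        self.gc_entries += 1   # allocation step bumps the counter
        key = 0                # constructor step: key reset for tunnel devices
        existing = self.entries.get((dev, key))
        if existing is not None:
            # Duplicate found, so the new entry is dropped. Bug 2: the
            # counter is not decremented unless fix_counter is set.
            if self.fix_counter:
                self.gc_entries -= 1
            return existing
        entry = {"dev": dev, "key": key}
        self.entries[(dev, key)] = entry
        return entry

    def xmit(self, dev, nexthop):
        # Bug 1: the lookup key is the next hop, but stored entries use 0.
        key = 0 if self.fix_lookup else nexthop
        return self.lookup(dev, key) or self.create(dev, key)


def packets_until_overflow(table, limit=5000):
    """Return how many packets were sent before overflow, or None."""
    for sent in range(limit):
        try:
            table.xmit("tun0", 0x0A000001)
        except OverflowError:
            return sent
    return None
```

In this model the unfixed table overflows while holding a single entry,
which is consistent with the arp table not showing any additional
entries; enabling either switch keeps the counter at 1.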

Fantastic, thanks so much for the quick fixes! I verified them at my
end, ensuring that with the patches applied to the latest net tree,
the previously-failing test succeeds.
 
> Thanks for the script - very helpful in resolving the bugs. I made some
> changes to it and I plan to submit it to selftests as a starter for mpls
> tests.
> 

Sounds great! It's mostly cobbled together from Willem's
bpf test_tc_tunnel.sh script, so, like that script, it could probably
be generalized to cover more tunnel types too.

Thanks again!

Alan

> Bug fix patches coming.
> 
