Message-ID: <15a53d9cc54d42dca565247363b5c205@AcuMS.aculab.com>
Date:   Fri, 27 Aug 2021 14:11:44 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: IP routing sending local packet to gateway.

I've an odd IP routing issue.
A packet that should be sent on the local subnet (to an ARPed address)
is being sent to the default gateway instead.

What seems to happen is:
A TCP connection is opened between A and B.
The only traffic to B is application level keepalives on the connection.
This state is completely stable.

A then makes another connection to B.
B sends the SYN-ACK packet to the default gateway G.
G ARPs for B and sends an ICMP host redirect packet to B.

G doesn't seem to forward the packet to A.
B also ignores the ICMP redirect.

Now B is sending all traffic with A's IP address to G's MAC address.
So all the connections retry and then timeout.

In this state arping will work while (ICMP) ping fails!
Although one of the ping requests does 'fix' it.
Possibly when A actually ARPs B - but I'm not sure.

A is Ubuntu 20.04 (5.4.0-81) under VMware - but probably not relevant.
G is likely to be Linux with IP forwarding enabled.

B is an x86-64 kernel I've built from the 5.10.36 LTS sources.
Userspace buildroot/busybox (I need to add ftrace).

Running netstat -rn on B gives the expected 2 routes.
arp -an always seems to show a MAC address for A's IP.
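
To compare what the routing table says against what actually goes
out on the wire, something like the sketch below might be useful.
It assumes Python is available on B, or that the text of
/proc/net/route is captured there and parsed elsewhere. The name
pick_route and the sample addresses are made up for illustration.
It only does a longest-prefix match over the main table as exported
in /proc/net/route (little-endian hex, as printed on x86), ignores
metrics, and will not show anything the kernel caches per destination.

```python
import socket
import struct

# Hypothetical sample in /proc/net/route layout; addresses are invented.
SAMPLE_ROUTES = """\
Iface Destination Gateway Flags RefCnt Use Metric Mask MTU Window IRTT
bond0 00000000 0101A8C0 0003 0 0 0 00000000 0 0 0
bond0 0001A8C0 00000000 0001 0 0 0 00FFFFFF 0 0 0
"""


def hex_to_ip(h):
    """Decode the little-endian hex form that /proc/net/route prints on x86."""
    return socket.inet_ntoa(struct.pack("<I", int(h, 16)))


def ip_to_int(ip):
    """Dotted quad to a host-order integer, for masking and comparing."""
    return struct.unpack("!I", socket.inet_aton(ip))[0]


def pick_route(route_text, dest_ip):
    """Return (iface, gateway, prefix_len) for the longest matching prefix.

    Returns None if no route matches. Metrics and flags are ignored.
    """
    dest = ip_to_int(dest_ip)
    best = None
    for line in route_text.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 8:
            continue
        iface = fields[0]
        net = ip_to_int(hex_to_ip(fields[1]))
        gateway = hex_to_ip(fields[2])
        mask = ip_to_int(hex_to_ip(fields[7]))
        if dest & mask != net:
            continue
        prefix_len = bin(mask).count("1")
        if best is None or prefix_len > best[2]:
            best = (iface, gateway, prefix_len)
    return best


if __name__ == "__main__":
    # An on-link destination should come back with gateway 0.0.0.0.
    print(pick_route(SAMPLE_ROUTES, "192.168.1.20"))
    # Anything else should fall through to the default route.
    print(pick_route(SAMPLE_ROUTES, "10.0.0.1"))
```

If the table says A is on-link (gateway 0.0.0.0) while the frames
still carry G's MAC address, the decision is being made somewhere
other than the two routes netstat shows.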

Before I start digging through the code, has anyone any ideas?
I don't remember seeing anything going through the mailing lists.

My 'gut feel' is that it has something to do with the ARP table
entry timing out (10 minutes??).
The existing TCP connection has a reference to the ARP entry and
is probably using it even though it might be stale.
But the SYN-ACK transmit is trying to locate the entry so may
well have a different error action.

I've not seen any ARP packets while the application keepalives
are going on - but those messages are every 5 seconds.
It might be that the ARP request on the 10 minute timer
isn't actually being sent (or responded to) and the 'ARP failed'
state is getting set, so that the later request decides the
'local route' is broken and so uses the 'default route' instead.
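
One way to test that guess would be to log the neighbour state for
A's address around the time the second connection is made. The
sketch below is only that: it assumes iproute2's 'ip neigh show' is
available on B (or its output is captured and parsed elsewhere), and
the function name neigh_state and the sample lines are invented for
illustration.

```python
# Neighbour states as printed by iproute2's "ip neigh show".
NUD_STATES = {
    "INCOMPLETE", "REACHABLE", "STALE", "DELAY",
    "PROBE", "FAILED", "NOARP", "PERMANENT", "NONE",
}

# Hypothetical sample output; addresses and MACs are invented.
SAMPLE_NEIGH = """\
192.168.1.20 dev bond0 lladdr 00:0c:29:aa:bb:cc STALE
192.168.1.1 dev bond0 lladdr 00:11:22:33:44:55 REACHABLE
192.168.1.30 dev bond0  FAILED
"""


def neigh_state(neigh_text, ip):
    """Return (lladdr, state) for the first entry matching ip, else None.

    lladdr is None when the entry has no link-layer address
    (for example a FAILED or INCOMPLETE entry).
    """
    for line in neigh_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != ip:
            continue
        lladdr = None
        if "lladdr" in tokens:
            idx = tokens.index("lladdr")
            if idx + 1 < len(tokens):
                lladdr = tokens[idx + 1]
        # Pick out the state by name rather than by position.
        state = next((t for t in tokens if t in NUD_STATES), None)
        return lladdr, state
    return None


if __name__ == "__main__":
    for addr in ("192.168.1.20", "192.168.1.30", "192.168.1.99"):
        print(addr, neigh_state(SAMPLE_NEIGH, addr))
```

Calling this every few seconds on fresh output, with a timestamp,
should show whether A's entry ever goes to FAILED before the
SYN-ACK gets sent to G.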

B does have two interfaces set up as a 'bond' but only one IP
address on the single virtual interface.
That shouldn't be relevant since it looks like IP routing
rather than anything lower down.

I've not tried any other kernel versions.
I do need to start using the latest 5.10 one soon.
(Build is set to use kernels from kernel.org rather than git.)

Any ideas/suggestions?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
