Date:   Tue, 31 Aug 2021 16:24:09 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'David Ahern' <dsahern@...il.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: IP routing sending local packet to gateway.

From: David Ahern
> Sent: 27 August 2021 17:51
> 
> On 8/27/21 9:39 AM, David Laight wrote:
> > From: David Laight
> >> Sent: 27 August 2021 15:12
> >>
> >> I've an odd IP routing issue.
> >> A packet that should be sent on the local subnet (to an ARPed address)
> >> is being sent to the default gateway instead.
> >
> > I've done some tests on a different network where it all appears to work.
> >
> > But running 'tcpdump -pen' shows that all the outbound packets for the
> > TCP connections are being sent to the default gateway.
> >
> > 5.10.30, 5.10.61 and 5.14.0-rc7 all behave the same way.
> >
> > If I do a ping (in either direction) I get an ARP table entry.
> > But TCP connections (in or out) always use the default gateway.
> >
> > I'm now getting more confused.
> > I noticed that the 'default route' was missing the 'metric 100' bit.
> > That might give the behaviour I'm seeing if the netmask width is ignored.

Setting the metric/priority to 100 makes no difference.
I actually patched the kernel code that processes the netlink
socket request rather than the application that generated the request.
Note that the application hasn't really been changed for 10 years.

> > But if I delete the default route (neither netstat -r nor ip route shows
> > it) then packets are still being sent to the deleted gateway.
> > If I delete the arp/neigh entry for the deleted default gateway an
> > outward connection recreates the entry - leaving the one for the actual
> > address 'STALE'.
> >
> > Something very odd is going on.
> 
> perf record -e fib:* -a -g -- <run tests>
> ctrl-c
> perf script
> 
> It should tell you the code paths and route lookup results, and should shed
> some light on why the gateway is chosen over the local route.
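
Where perf is not available on the target, a fresh FIB lookup can also be requested with 'ip route get <addr>'. The sketch below (Python, illustration only) builds that command and checks whether the reply names a gateway. The helper names and the two sample replies are assumptions written in the usual iproute2 output shape, not captures from the affected system, and 'eth0' is a placeholder interface name. Note that this is a new lookup and does not show a dst already cached on a socket.

```python
def route_get_command(dest):
    """Build the argv for a fresh route lookup (run it with subprocess.run)."""
    return ["ip", "route", "get", dest]


def gateway_from_route_get(output):
    """Return the gateway on the first reply line, or None if the route is on-link."""
    lines = output.strip().splitlines()
    fields = lines[0].split() if lines else []
    if "via" in fields:
        idx = fields.index("via")
        if idx + 1 < len(fields):
            return fields[idx + 1]
    return None


# Hypothetical replies, shaped like typical iproute2 output (not real captures).
ON_LINK = "192.168.1.53 dev eth0 src 192.168.1.99 uid 0\n    cache\n"
VIA_GW = "192.168.1.53 via 192.168.1.1 dev eth0 src 192.168.1.99 uid 0\n    cache\n"

if __name__ == "__main__":
    print(" ".join(route_get_command("192.168.1.53")))
    print("on-link reply ->", gateway_from_route_get(ON_LINK))
    print("gateway reply ->", gateway_from_route_get(VIA_GW))
```

If the reply for the local peer contains 'via', the FIB itself is choosing the gateway; if it does not, the wrong next hop is more likely coming from something cached further up the stack.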

How do I cross-compile 'perf'? There don't seem to be any obvious
hints in the Makefile.

But I'm not too sure that would help.
The response to an incoming TCP SYN seems to create a cached entry that
everything else then uses.
I've tried to untangle the code that caches a 'dst' entry on the socket
but it is all rather complicated.

I'm sure it has something to do with the 'fib_trie' data.
When it fails I get:
# cat /proc/net/fib_trie
Id 200:
  |-- 0.0.0.0
     /0 universe UNICAST
Main:
  +-- 0.0.0.0/0 3 0 6
     |-- 0.0.0.0
        /0 universe UNICAST
     |-- 192.168.1.0
        /24 link UNICAST
Local:
  +-- 0.0.0.0/0 2 0 2
     +-- 127.0.0.0/8 2 0 2
        +-- 127.0.0.0/31 1 0 0
           |-- 127.0.0.0
              /32 link BROADCAST
              /8 host LOCAL
           |-- 127.0.0.1
              /32 host LOCAL
        |-- 127.255.255.255
           /32 link BROADCAST
     +-- 192.168.1.0/24 2 0 1
        |-- 192.168.1.0
           /32 link BROADCAST
        |-- 192.168.1.99
           /32 host LOCAL
        |-- 192.168.1.255
           /32 link BROADCAST

1.99 is the local address, the gw is 1.1 and the only remote is 1.53.
Apart from the 'Id 200' bit (which I assume is something
to do with my bonds) it looks much like a working system.
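
For reference, this is the result a plain longest-prefix match should give for the two Main table entries. The Python sketch below is a simplified model only: it ignores policy rules, metrics and any cached dst, and the table contents and next-hop labels are assumptions taken from the addresses mentioned in this thread.

```python
import ipaddress

# Simplified copy of the Main table: (prefix, next-hop label).
MAIN_TABLE = [
    ("0.0.0.0/0", "via 192.168.1.1"),   # default route through the gateway
    ("192.168.1.0/24", "on-link"),      # directly connected subnet
]


def lookup(dest, table=MAIN_TABLE):
    """Return (network, next_hop) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best


if __name__ == "__main__":
    print(lookup("192.168.1.53"))  # expected to match the /24, i.e. on-link
    print(lookup("8.8.8.8"))       # only the default route matches
```

With these two entries, 192.168.1.53 matches the /24 and should be sent on-link, so a packet going to 192.168.1.1 instead means the next hop is not coming from a straightforward lookup of this table.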

I can't find anything that lists the rt/dst entries
that are cached by the socket.

I remember from looking up the rawip send path that the initial
lookup for outbound messages just finds the 'route' entry and
a second lookup (ref-counting another structure) is done to
get the rt/dst to save on the socket.
(The rawip send ended up creating one for every packet and then
deleting them in massive batches from an rcu timeout.)

I'm guessing that something got broken when that change to the
routing code was made.
It was the change that broke rawip sends where the ip address
in the IP-header didn't match that in the destaddr field.
That was a long time ago.
I wonder if I can test an older kernel.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
