netdev - Re: Unable to flush ICMP redirect routes in kernel 3.0+

Message-ID: <4EC439F2.3080809@icdsoft.com>
Date:	Thu, 17 Nov 2011 00:32:18 +0200
From:	Ivan Zahariev <famzah@...soft.com>
To:	netdev@...r.kernel.org
Subject: Re: Unable to flush ICMP redirect routes in kernel 3.0+

On 11/15/2011 11:09 PM, Eric Dumazet wrote:
> On Tuesday, 15 November 2011 at 22:23 +0200, Ivan Zahariev wrote:
>> Hello,
>>
>> We have changed nothing in our network infrastructure but only upgraded
>> from Linux kernel 2.6.36.2 to 3.0.3. Here is the problem we are
>> experiencing:
>>
>> ICMP redirected routes are cached forever, and they can be cleared only
>> by a reboot.
>>
>> Here is an example:
>>
>> root@...hine5:~# ip route get 1.1.1.1
>> 1.1.1.1 via 9.0.0.1 dev eth0  src 5.5.5.5
>>       cache <redirected>  ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>>
>> root@...hine5:~# ip route list cache match 1.1.1.1
>> 1.1.1.1 tos lowdelay via 9.0.0.1 dev eth0  src 5.5.5.5
>>       cache <redirected>  ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>> 1.1.1.1 via 9.0.0.1 dev eth0  src 5.5.5.5
>>       cache <redirected>  ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>> ...(two more entries, all go via 9.0.0.1)...
>>
>> 1.1.1.1 is the test destination address
>> 5.5.5.5 is the source IP address of "machine5" via dev eth0, the only
>> interface besides "lo"
>> 9.0.0.1 is the incorrect gateway which we were redirected to; we want to
>> change the route to 9.0.0.8
>>
>> I found no way to clear this route. What I tried:
>>
>> root@...hine5:~# ip route flush cache ### CACHE FLUSH ###
>> root@...hine5:~# ip route list cache match 1.1.1.1 # empty
>>
>> root@...hine5:~# ip route flush cache ### CACHE FLUSH ###
>> root@...hine5:~# echo 1 > /proc/sys/net/ipv4/route/flush
>> root@...hine5:~# ip route list cache match 1.1.1.1 # empty
>>
>> root@...hine5:~# ip route get 1.1.1.1 # magically re-inserts the <redirected> route, tcpdump sees NO ICMP traffic
>> 1.1.1.1 via 9.0.0.1 dev eth0  src 5.5.5.5
>>       cache <redirected>  ipid 0xfb5d rtt 1475ms rttvar 450ms cwnd 10
>>
>> I also tried to force a scheduled route flush:
>>
>> root@...hine5:~# echo 1 > /proc/sys/net/ipv4/route/gc_timeout
>> root@...hine5:~# echo 1 > /proc/sys/net/ipv4/route/gc_interval
>>
>> A reboot fixed it all.
>>
>> This may be related to the "Several major changes to our routing
>> infrastructure" (https://lkml.org/lkml/2011/3/16/384).
>> Other users are reporting the same problem:
>> * https://plus.google.com/u/0/117161704068825702652/posts/1UK1Rp4KA4J
>> * http://lists.debian.org/debian-kernel/2011/10/msg00633.html
>> Other similar issues:
>> * http://www.spinics.net/lists/netdev/msg176966.html
>> * http://forums.gentoo.org/viewtopic-t-901024-start-0.html
>>
>> This has been occurring on a few KVM guest machines and also on a
>> regular Linux machine, so it's not KVM related.
>>
>> Is this a bug, or am I missing something?
>>
> It is a bug, and as such could you provide needed information for us to
> reproduce it ?
>
> What is your network setup ?

The network setup is nothing fancy. We have the following machines on a
single /24 Ethernet segment:
* 192.168.0.244 (machine5) -- the machine on which we reproduce the kernel routing bug; kernel: 3.0.3-grsec
* 192.168.0.8   (router8)  -- the default gw for the whole 192.168.0.0/24 network; does SNAT; kernel: 2.6.32-5-686
* 192.168.0.120 -- another host with IP forwarding disabled; must be up and reachable

There are actually two bugs:
1. Basically, *any* ICMP redirect is cached forever.
2. The output of "ip route" is not consistent with the kernel's actual routing behavior.

Quick fix: disabling "net.ipv4.conf.*.accept_redirects" on all
interfaces works and prevents ICMP redirects from affecting the
internal route cache.
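
The quick fix above could be applied with a small shell loop. This is only a minimal sketch and not part of the original report: the function name "disable_accept_redirects" and its optional directory argument are hypothetical (the argument exists only so the loop can be tried against a scratch directory). On a real machine it has to run as root, and values written under /proc/sys do not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): write 0 to every
# accept_redirects entry (all, default, and each interface) found under
# the given conf directory. Defaults to the live IPv4 sysctl tree.
disable_accept_redirects() {
    conf_dir="${1:-/proc/sys/net/ipv4/conf}"
    for f in "$conf_dir"/*/accept_redirects; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        echo 0 > "$f"
    done
}
```

Calling "disable_accept_redirects" with no argument would act on the live settings; passing a directory limits it to that tree.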

Here is a sample test-case scenario:

### right after a clean machine reboot
root@...hine5:~# ip route list cache match 8.8.4.4

root@...hine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.8 dev eth0  src 192.168.0.244
     cache

### make a TCP request; the TCP packets go to the default gw 192.168.0.8;
### we see this with a tcpdump at 192.168.0.8
root@...hine5:~# telnet 8.8.4.4

### route is still OK and as expected
root@...hine5:~# ip route list cache match 8.8.4.4
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
     cache  ipid 0x303a
8.8.4.4 tos lowdelay via 192.168.0.8 dev eth0  src 192.168.0.244
     cache  ipid 0x303a
8.8.4.4 via 192.168.0.8 dev eth0  src 192.168.0.244
     cache

root@...hine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.8 dev eth0  src 192.168.0.244
     cache

### change route to a fake host on the same subnet, so that an ICMP
### redirect will follow later
### we also disable NAT for 192.168.0.244, so that an ICMP redirect is
### sent accordingly
root@...ter8:~# route add -host 8.8.4.4 gw 192.168.0.120

### first TCP packet goes to the default gw 192.168.0.8; we see this
### with a tcpdump at 192.168.0.8
root@...hine5:~# telnet 8.8.4.4

### at machine5: we got the ICMP redirect from the default gw, as expected
# tcpdump: IP 192.168.0.8 > 192.168.0.244: ICMP redirect 8.8.4.4 to host 192.168.0.120, length 68

### the TCP packets now start to use the <redirected> route
### 192.168.0.120; we see this with a tcpdump at 192.168.0.120
root@...hine5:~# telnet 8.8.4.4

### (bug #2) what "ip route" returns is inconsistent, because in reality
### we are using the <redirected> route via 192.168.0.120
### note that the number of route lines increased by one
root@...hine5:~# ip route list cache match 8.8.4.4
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
     cache  ipid 0x303a
8.8.4.4 tos lowdelay via 192.168.0.8 dev eth0  src 192.168.0.244
     cache  ipid 0x303a
8.8.4.4 via 192.168.0.8 dev eth0  src 192.168.0.244
     cache
8.8.4.4 from 192.168.0.244 tos lowdelay via 192.168.0.8 dev eth0
     cache  ipid 0x303a

root@...hine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.8 dev eth0  src 192.168.0.244
     cache

### restore the route on the default gw 192.168.0.8, so that it accepts
### 8.8.4.4 as destination again
### restore NAT for 192.168.0.244
root@...ter8:~# route del -host 8.8.4.4 gw 192.168.0.120

### (bug #1) even though we flushed the route cache, the <redirected>
### route reappears from somewhere, even without making any TCP requests
### this time what "ip" returns is consistent with the real (incorrect)
### routing behavior of machine5
root@...hine5:~# ip route flush cache
root@...hine5:~# ip route list cache match 8.8.4.4
root@...hine5:~# ip route get 8.8.4.4
8.8.4.4 via 192.168.0.120 dev eth0  src 192.168.0.244
     cache <redirected>  ipid 0x303a

### the TCP packets STILL use the <redirected> route 192.168.0.120; we
### see this with a tcpdump at 192.168.0.120
root@...hine5:~# telnet 8.8.4.4

### only a reboot clears the cached <redirected> routes
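
The manual checks in the test case above (looking for the <redirected> flag and reading the gateway from "ip route get") could be scripted roughly as follows. This is a hedged sketch: both helper names are hypothetical, and they only parse text in the format shown in this message. As bug #2 shows, what "ip" reports may not match where the packets actually go, so tcpdump remains the more reliable check.

```shell
#!/bin/sh
# Hypothetical helper: read "ip route get" output on stdin and return
# exit status 0 if any line carries the <redirected> flag.
has_redirected_flag() {
    grep -q '<redirected>'
}

# Hypothetical helper: read "ip route get" output on stdin and print the
# gateway, i.e. the word following the first "via" keyword.
route_gateway() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "via") { print $(i + 1); exit } }'
}
```

For example, "ip route get 8.8.4.4 | has_redirected_flag && echo redirected" would report whether the entry is flagged, and "ip route get 8.8.4.4 | route_gateway" would print the gateway currently listed for that destination.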


Cheers.
--Ivan