[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <YWOrt002RdCqkBeUL04N1MVxcsjRvmCb4iqMW67EmAQIG5erLlSntgQWmSYiHXAT8kgFTceURhTaP8dAp9nPD9q3lquhb0MTIRlP4Vy5k3Y=@protonmail.com>
Date: Sat, 26 Oct 2019 07:07:52 +0000
From: Ttttabcd <ttttabcd@...tonmail.com>
To: "alexander.h.duyck@...hat.com" <alexander.h.duyck@...hat.com>,
"davem@...emloft.net" <davem@...emloft.net>,
"idosch@...lanox.com" <idosch@...lanox.com>,
Netdev <netdev@...r.kernel.org>,
"lartc@...r.kernel.org" <lartc@...r.kernel.org>
Subject: Commit 0ddcf43d5d4a causes the local table to conflict with the main table
Recently I was implementing a transparent proxy project. Naturally, I started to research the Linux local routing table and the tproxy module.
But when I was studying the behavior of the local routing table, something strange happened, which caused me to spend a lot of time thinking about why.
ip route add local 0.0.0.0/0 dev lo
At the beginning, I tested a local routing table and added 0.0.0.0/0 to let the system think that the entire IPv4 space is a local address.
ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
I did not add any policy routing, everything stays normal
ip route show table main
default via 192.168.3.1 dev wlan0 proto dhcp metric 600
192.168.3.0/24 dev wlan0 proto kernel scope link src 192.168.3.62 metric 600
The above is my main routing table, I am now on a network of 192.168.3.0/24, the default route is 192.168.3.1.
ip route show table local
local default dev lo scope host
broadcast 127.0.0.0 dev lo proto kernel scope link src 127.0.0.1
local 127.0.0.0/8 dev lo proto kernel scope host src 127.0.0.1
local 127.0.0.1 dev lo proto kernel scope host src 127.0.0.1
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
broadcast 192.168.3.0 dev wlan0 proto kernel scope link src 192.168.3.62
local 192.168.3.62 dev wlan0 proto kernel scope host src 192.168.3.62
broadcast 192.168.3.255 dev wlan0 proto kernel scope link src 192.168.3.62
The above is my current local routing table, you can see that 0.0.0.0/0 (default) has been added, and 127.0.0.1, 127.0.0.0/8, 192.168.3.62 is also my local address.
According to the correct theory, all IPv4 addresses are now my local address, so whatever address I want to connect to should be routed to the loopback interface and sent back to myself.
ip route get 1.1.1.1
local 1.1.1.1 dev lo src 1.1.1.1 uid 0
cache <local>
ip route get 192.168.3.100
192.168.3.100 dev wlan0 src 192.168.3.62 uid 0
cache
But I did some testing but found that it was not like this. For example, I tested that 1.1.1.1 will be routed to lo, but 192.168.3.100 will be routed to wlan0. 192.168.3.100 is not routed according to the contents of the local routing table! 192.168.3.100 is normally routed in the main table.
This is the wrong handling. The correct way is that according to the priority of the policy routing, the local table has higher priority than the main table. It should match the local table first. As long as there is a corresponding routing table entry in the local table, the main table will not be used.
This problem has been a headache for me for a long time because it is inconsistent with the description of the ip rule by the Manual page. Until I found an amazing fact in /proc/net/fib_trie.
/proc/net/fib_trie
Main:
+-- 0.0.0.0/0 3 0 5
|-- 0.0.0.0
/0 host LOCAL
/0 universe UNICAST
+-- 127.0.0.0/8 2 0 2
+-- 127.0.0.0/31 1 0 0
|-- 127.0.0.0
/32 link BROADCAST
/8 host LOCAL
|-- 127.0.0.1
/32 host LOCAL
|-- 127.255.255.255
/32 link BROADCAST
+-- 192.168.3.0/24 2 1 2
+-- 192.168.3.0/26 2 0 2
|-- 192.168.3.0
/32 link BROADCAST
/24 link UNICAST
|-- 192.168.3.62
/32 host LOCAL
|-- 192.168.3.255
/32 link BROADCAST
Local:
+-- 0.0.0.0/0 3 0 5
|-- 0.0.0.0
/0 host LOCAL
/0 universe UNICAST
+-- 127.0.0.0/8 2 0 2
+-- 127.0.0.0/31 1 0 0
|-- 127.0.0.0
/32 link BROADCAST
/8 host LOCAL
|-- 127.0.0.1
/32 host LOCAL
|-- 127.255.255.255
/32 link BROADCAST
+-- 192.168.3.0/24 2 1 2
+-- 192.168.3.0/26 2 0 2
|-- 192.168.3.0
/32 link BROADCAST
/24 link UNICAST
|-- 192.168.3.62
/32 host LOCAL
|-- 192.168.3.255
/32 link BROADCAST
The contents of the main table and the local table are actually the same! What is this strange phenomenon?
The strange thing is not over yet, then I accidentally added a policy route.
ip rule add table 100 priority 10
ip rule
0: from all lookup local
10: from all lookup 100
32766: from all lookup main
32767: from all lookup default
I added table 100 with a priority of 10.So the RPDB has become like the above.
The miracle happened, the local table and the main table restored what I thought it should be, they were separated!
/proc/net/fib_trie
Main:
+-- 0.0.0.0/0 3 0 6
|-- 0.0.0.0
/0 universe UNICAST
|-- 192.168.3.0
/24 link UNICAST
Local:
+-- 0.0.0.0/0 3 0 5
|-- 0.0.0.0
/0 host LOCAL
+-- 127.0.0.0/8 2 0 2
+-- 127.0.0.0/31 1 0 0
|-- 127.0.0.0
/32 link BROADCAST
/8 host LOCAL
|-- 127.0.0.1
/32 host LOCAL
|-- 127.255.255.255
/32 link BROADCAST
+-- 192.168.3.0/24 2 1 2
+-- 192.168.3.0/26 2 0 2
|-- 192.168.3.0
/32 link BROADCAST
|-- 192.168.3.62
/32 host LOCAL
|-- 192.168.3.255
/32 link BROADCAST
So I immediately conducted the previous test.
ip route get 1.1.1.1
local 1.1.1.1 dev lo table local src 1.1.1.1 uid 0
cache <local>
ip route get 192.168.3.100
local 192.168.3.100 dev lo table local src 192.168.3.100 uid 0
cache <local>
The result is correct this time, both 1.1.1.1 and 192.168.3.100 are routed to the lo interface. And from the "table local" in the output, it seems that this time the local table is actually used.
Now I have a drink, which is really too ridiculous...
ip rule del table 100
Then I deleted the policy route I just added.
ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
The RPDB recovered from the beginning, but the magic is that the main and local tables are still separate and not merged as they did at the beginning.
Main:
+-- 0.0.0.0/0 3 0 6
|-- 0.0.0.0
/0 universe UNICAST
|-- 192.168.3.0
/24 link UNICAST
Local:
+-- 0.0.0.0/0 3 0 5
|-- 0.0.0.0
/0 host LOCAL
+-- 127.0.0.0/8 2 0 2
+-- 127.0.0.0/31 1 0 0
|-- 127.0.0.0
/32 link BROADCAST
/8 host LOCAL
|-- 127.0.0.1
/32 host LOCAL
|-- 127.255.255.255
/32 link BROADCAST
+-- 192.168.3.0/24 2 1 2
+-- 192.168.3.0/26 2 0 2
|-- 192.168.3.0
/32 link BROADCAST
|-- 192.168.3.62
/32 host LOCAL
|-- 192.168.3.255
/32 link BROADCAST
And the previous routing test is still correct, 1.1.1.1 and 192.168.3.100 are routed to lo.
ip route get 1.1.1.1
local 1.1.1.1 dev lo table local src 1.1.1.1 uid 0
cache <local>
ip route get 192.168.3.100
local 192.168.3.100 dev lo table local src 192.168.3.100 uid 0
cache <local>
The same RPDB will produce different results! This is even more ridiculous, so I went to have a drink again...
I began to understand that this is a man-made implementation error, not a mistake in my use.
So, I have been at Google for a long time, and in the end I found Commit 0ddcf43d5d4a, which is the root cause of the above problems.
In this patch, in order to improve efficiency, the local table and the main table are merged, and the local table is re-separated from the main table when the custom FIB rule is enabled.
This is in full compliance with the situation I described above, yes, it is caused by this patch.
Then we can discuss it now.
Obviously, merging the main table and the local table breaks the rules of the routing table priority. The local table should be searched independently before the main table.
Perhaps as described in this patch, this can have some performance improvements, but is it worthwhile to have a little bit of performance improvement that undermines the entire routing rule? This is worth thinking about.
Suppose now that a software engineer wants to add 192.168.0.0/16 to the local address, so he naturally executed the following command.
ip route add local 192.168.0.0/16 dev lo
But he does not know that there is a trap in the main table, another overlapping route!
ip route
192.168.3.0/24 dev wlan0 proto kernel scope link src 192.168.3.62 metric 600
ip route get 192.168.1.100
local 192.168.1.100 dev lo src 192.168.1.100 uid 0
cache <local>
ip route get 192.168.3.100
192.168.3.100 dev wlan0 src 192.168.3.62 uid 0
cache
This will cause the entire network of 192.168.3.0/24 not to be routed to the local lo interface as he thought!
This will lead to bugs that are very difficult to find! If I am not a programmer who knows a little about the kernel implementation, I can't find out what caused the problem (I checked a lot of source code and read a lot of patches and kernel mail to solve this problem).
Of course, you can say that no one will modify the local routing table under normal circumstances. But is the linux system also designed for geeks? Is it also designed for programmers who want to exploit the full potential of the system?
If you provide a mechanism to modify the local table, you must ensure that the mechanism is working correctly.
And it's absolutely impossible to make this mechanism a significant difference in the execution process after triggering some incredible conditions (custom FIB rules are enabled, even if they are later disabled).
In summary, I don't think this Commit 0ddcf43d5d4a is a good idea.
Welcome to discuss.
Powered by blists - more mailing lists