Message-ID: <480C889B.3060203@odu.neva.ru>
Date: Mon, 21 Apr 2008 16:29:15 +0400
From: Dmitry Butskoy <buc@...sz.so-cdu.ru>
To: netdev@...r.kernel.org
Subject: [ROUTE]: FIB_RES_PREFSRC() selects wrong source in some cases
Consider an interface with two (or more) IP addresses, connected to a
LAN segment with two (or more) networks. In such a case the source IP is
not unique; it should be chosen depending on the destination IP. Under
some circumstances this choice is incorrect.
Consider the example (an interface in two networks, trying to reach
"scope link" destinations):
> # ifdown eth0
> # ip link set up dev eth0
> # ip addr add 192.168.0.1/24 dev eth0
> # ip addr add 172.18.0.1/24 dev eth0
> # ip route show dev eth0
> 172.18.0.0/24 proto kernel scope link src 172.18.0.1
> 192.168.0.0/24 proto kernel scope link src 192.168.0.1
Now we have two routes, each with a preferred src specified:
> # ip route get 192.168.0.2
> 192.168.0.2 dev eth0 src 192.168.0.1
> cache mtu 1500 advmss 1460 hoplimit 64
> # ip route get 172.18.0.2
> 172.18.0.2 dev eth0 src 172.18.0.1
> cache mtu 1500 advmss 1460 hoplimit 64
Because of the preferred src, the actual source IP is chosen correctly.
Now let's flush all the routes, and then add them manually.
(Certainly such a usage is a corner case, but sometimes some admins
prefer to set all the routes explicitly, rather than implicitly by
"proto kernel" etc.)
> # ip route flush dev eth0
> #
> # ip route add 192.168.0.0/24 dev eth0
> # ip route add 172.18.0.0/24 dev eth0
> # ip route show dev eth0
> 172.18.0.0/24 scope link
> 192.168.0.0/24 scope link
Now the same as above, but no more preferred src...
> # ip route get 192.168.0.2
> 192.168.0.2 dev eth0 src 192.168.0.1
> cache mtu 1500 advmss 1460 hoplimit 64
right...
> # ip route get 172.18.0.2
> 172.18.0.2 dev eth0 src 192.168.0.1
> cache mtu 1500 advmss 1460 hoplimit 64
Oops. Now wrong.
...and restore the test interface:
> # ip addr flush dev eth0
> # ip link set down dev eth0
> # ifup eth0
In summary: we have an interface with two IP addresses, 192.168.0.1 and
172.18.0.1, and the routes:
172.18.0.0/24 scope link
192.168.0.0/24 scope link
For destinations in 172.18.0.0/24 the actual "src" is chosen incorrectly.
The corresponding kernel code fragment seems to be in
net/ipv4/route.c:ip_route_output_slow():
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.25.y.git;a=blob;f=net/ipv4/route.c;h=7b5e8e1d94be2eb19764880c99597854f34657f3;hb=HEAD#l2419
> if (!fl.fl4_src)
>         fl.fl4_src = FIB_RES_PREFSRC(res);
The FIB_RES_PREFSRC macro is:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.25.y.git;a=blob;f=include/net/ip_fib.h;h=8b12667f7a2bccad312fc0e63e679dfd02ec3509;hb=HEAD#l140
> #define FIB_RES_PREFSRC(res) ((res).fi->fib_prefsrc ? : __fib_res_prefsrc(&res))
Since the preferred source is not set in the example above,
"fib_prefsrc" is zero, hence __fib_res_prefsrc() is called, which is
actually:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.25.y.git;a=blob;f=net/ipv4/fib_semantics.c;h=a13c84763d4c10e503e418f2e6afef3e353193c3;hb=HEAD#l941
> __be32 __fib_res_prefsrc(struct fib_result *res)
> {
>         return inet_select_addr(FIB_RES_DEV(*res), FIB_RES_GW(*res), res->scope);
> }
and since we are working with "scope link" destinations and have not
specified any gateway for them, "FIB_RES_GW(*res)" is zero.
When inet_select_addr():
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.25.y.git;a=blob;f=net/ipv4/devinet.c;h=87490f7bb0f72a47db2a3e8a28ce3816cd236032;hb=HEAD#l877
> __be32 inet_select_addr(const struct net_device *dev, __be32 dst, int scope)
> {
is called with "dst == 0", it simply chooses the first IP found on the
interface (i.e. 192.168.0.1 in the example) ...
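
To make the effect of the zero "dst" concrete, here is a minimal
user-space sketch of the selection logic described above. It is a
simplified model, not the kernel code: struct model_ifaddr, IP4(),
prefix_mask(), model_select_addr() and model_prefsrc() are hypothetical
names invented for this illustration, addresses are kept in host byte
order, and address scope and primary/secondary handling are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one address on an interface (host byte order). */
struct model_ifaddr {
    uint32_t local;       /* address assigned to the interface */
    unsigned prefix_len;  /* prefix length, 0..32 */
};

/* Build an IPv4 address from its four octets. */
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Netmask for a given prefix length. */
uint32_t prefix_mask(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    return prefix_len == 0 ? 0 : ~(uint32_t)0 << (32 - prefix_len);
}

/*
 * Simplified model of the address choice: with dst == 0 the first
 * address wins; with a real dst the address whose subnet contains dst
 * wins, and the first address is only a fallback.
 */
uint32_t model_select_addr(const struct model_ifaddr *addrs, size_t n,
                           uint32_t dst)
{
    uint32_t fallback = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(addrs[i].prefix_len);

        if (dst == 0 || ((dst ^ addrs[i].local) & mask) == 0)
            return addrs[i].local;
        if (fallback == 0)
            fallback = addrs[i].local;
    }
    return fallback;
}

/* Model of the macro's fallback: prefsrc if set, else select by hint. */
uint32_t model_prefsrc(uint32_t prefsrc, const struct model_ifaddr *addrs,
                       size_t n, uint32_t hint)
{
    return prefsrc ? prefsrc : model_select_addr(addrs, n, hint);
}
```

With the two addresses from the example (192.168.0.1/24 first, then
172.18.0.1/24), model_prefsrc(0, addrs, 2, 0) returns 192.168.0.1, as
in the "Oops" case, while passing the real destination 172.18.0.2 as
the hint returns 172.18.0.1.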
Is it possible (and appropriate) to change the code somehow, so that
inet_select_addr() is called with the proper destination IP?
Actually, this is a long-standing issue (at least since 1999), and it is
probably even a "feature" by now :), but it seems strange that the kernel
has all the data to make the right choice, yet makes no attempt to do so...
Regards,
Dmitry Butskoy
http://www.fedoraproject.org/wiki/DmitryButskoy