[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aaddae1d-ad4e-425c-b88a-0830d839a3ce@6wind.com>
Date: Thu, 29 Sep 2022 16:27:49 +0200
From: Nicolas Dichtel <nicolas.dichtel@...nd.com>
To: Julian Anastasov <ja@....bg>
Cc: David Ahern <dsahern@...nel.org>, netdev@...r.kernel.org
Subject: Re: [PATCH net] ip: fix triggering of 'icmp redirect'
Le 27/09/2022 à 14:56, Julian Anastasov a écrit :
>
> Hello,
Hello,
thanks for the detailed explanation.
>
> On Mon, 29 Aug 2022, Nicolas Dichtel wrote:
>
>> __mkroute_input() uses fib_validate_source() to trigger an icmp redirect.
>> My understanding is that fib_validate_source() is used to know if the src
>> address and the gateway address are on the same link. For that,
>> fib_validate_source() returns 1 (same link) or 0 (not the same network).
>> __mkroute_input() is the only user of these positive values, all other
>> callers only look if the returned value is negative.
>>
>> Since the below patch, fib_validate_source() didn't return anymore 1 when
>> both addresses are on the same network, because the route lookup returns
>> RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right.
>> Let's adapat the test to return 1 again when both addresses are on the same
>> link.
>>
>> CC: stable@...r.kernel.org
>> Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop")
>> Reported-by: kernel test robot <yujie.liu@...el.com>
>> Reported-by: Heng Qi <hengqi@...ux.alibaba.com>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@...nd.com>
>> ---
>>
>> This code exists since more than two decades:
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=0c2c94df8133f
>>
>> Please, feel free to comment if my analysis seems wrong.
>>
>> Regards,
>> Nicolas
>>
>> net/ipv4/fib_frontend.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
>> index f361d3d56be2..943edf4ad4db 100644
>> --- a/net/ipv4/fib_frontend.c
>> +++ b/net/ipv4/fib_frontend.c
>> @@ -389,7 +389,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
>> dev_match = dev_match || (res.type == RTN_LOCAL &&
>> dev == net->loopback_dev);
>> if (dev_match) {
>> - ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_HOST;
>> + ret = FIB_RES_NHC(res)->nhc_scope >= RT_SCOPE_LINK;
>
> Looks like I'm late here. nhc_scope is related to the
> nhc_gw. nhc_scope = RT_SCOPE_LINK means rt_gw4 is a target behing a GW
> (nhc_gw). OTOH, RT_SCOPE_HOST means nhc_gw is 0 (missing) or a local IP
> and as result, rt_gw4 is directly connected. IIRC, this should look
> like this:
>
> nhc_gw nhc_scope rt_gw4 fib_scope (route)
> ---------------------------------------------------------------------------
> 0 RT_SCOPE_NOWHERE Host RT_SCOPE_HOST (local)
> 0 RT_SCOPE_HOST LAN_TARGET RT_SCOPE_LINK (link)
> LOCAL1 RT_SCOPE_HOST LAN_TARGET RT_SCOPE_LINK (link)
> REM_GW1 RT_SCOPE_LINK Universe RT_SCOPE_UNIVERSE (indirect)
>
> For the code above: we do not check res->scope,
> we are interested what is the nhc_gw's scope (LINK/HOST/NOWHERE).
> It means, reverse route points back to same device and sender is not
> reached via gateway REM_GW1.
iproute2 reject a gw which is not directly connected, am I wrong?
>
> By changing it to nhc_scope >= RT_SCOPE_LINK, ret always
> will be 1 because nhc_scope is not set below RT_SCOPE_LINK (253).
> Note that RT_SCOPE_HOST is 254.
Do you have a setup which shows the problem?
>
> Looks like calling fib_info_update_nhc_saddr() in
> nh_create_ipv4() with fib_nh->fib_nh_scope (nhc_scope) is a problem,
> it should be fib_nh->fib_nh_scope - 1 or something like that,
> see below. 127.0.0.1 is selected because it is
> ifa_scope = RT_SCOPE_HOST, lo is first device and fib_nh_scope is
> RT_SCOPE_HOST when GW is not provided while creating the nexthop.
>
> About commit 747c14307214:
>
> - if nexthop_create() assigns always nhc_scope = RT_SCOPE_LINK then
> the assumption is that all added gateways are forced to be used
> for routes to universe? If GW is not provided, we should use
> nhc_scope = RT_SCOPE_HOST to allow link routes, right?
> Now, the question is, can I use same nexthop in routes
> with different scope ? What if later I add local IP that matches
> the IP used in nexthop? This nexthop's GW becomes local one.
> But these are corner cases.
>
> What I see as things to change:
>
> - fib_check_nexthop(): "scope == RT_SCOPE_HOST",
> "Route with host scope can not have multiple nexthops".
> Why not? We may need to spread traffic to multiple
> local IPs. But this is old problem and needs more
> investigation.
>
> - as fib_check_nh() is called (and sets nhc_scope) in
> nh_create_ipv4(), i.e. when a nexthop is added, we should
> tune nhc_scope in nh_create_ipv4() by selecting fib_cfg.fc_scope
> based on the present GW and then to provide it to fib_check_nh()
> and fib_info_update_nhc_saddr. As result, this nexthop will get
> valid nhc_scope based on the current IPs and link routes and
> also valid nh_saddr (not 127.0.0.1). We can do it as follows:
>
> .fc_scope = cfg->gw.ipv4 ? RT_SCOPE_UNIVERSE :
> RT_SCOPE_LINK,
>
> As result, we will also fix the scope provided to
> fib_info_update_nhc_saddr which looks like the main
> problem here.
>
> - later, if created route refers to this nexthop,
> fib_check_nexthop() should make sure the provided
> route's scope is not >= the nhc_scope, may be a
> check in nexthop_check_scope() is needed. This
> will ensure that new link route is not using nexthop
> with provided gateway.
>
> - __fib_validate_source() should match for RT_SCOPE_HOST
> as before.
>
> - fib_check_nh_nongw: no GW => RT_SCOPE_HOST
After reverting the two commits (747c14307214 and eb55dc09b5dd) and putting the
below patch, the initial problem is fixed. But it's not clear what is broken
with the current code. Before sending these patches formally, it would be nice
to be able to add a selftest to demonstrate what is wrong.
@@ -2534,7 +2534,7 @@ static int nh_create_ipv4(struct net *net, struct nexthop *nh,
if (!err) {
nh->nh_flags = fib_nh->fib_nh_flags;
fib_info_update_nhc_saddr(net, &fib_nh->nh_common,
- fib_nh->fib_nh_scope);
+ !fib_nh->fib_nh_scope ? 0 : fib_nh->fib_nh_scope - 1);
} else {
fib_nh_release(net, fib_nh);
}
Powered by blists - more mailing lists