netdev - Re: [Questions] Some issues about IPv4/IPv6 nexthop route

Message-ID: <55dbb48d-dee3-e119-1bdf-edaa080c1c3d@kernel.org>
Date: Thu, 27 Jul 2023 09:35:30 -0600
From: David Ahern <dsahern@...nel.org>
To: Hangbin Liu <liuhangbin@...il.com>
Cc: Stephen Hemminger <stephen@...workplumber.org>,
 Ido Schimmel <idosch@...sch.org>, netdev@...r.kernel.org,
 "David S . Miller" <davem@...emloft.net>, Eric Dumazet
 <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>, Thomas Haller <thaller@...hat.com>
Subject: Re: [Questions] Some issues about IPv4/IPv6 nexthop route

On 7/26/23 10:19 PM, Hangbin Liu wrote:
> On Wed, Jul 26, 2023 at 09:57:59AM -0600, David Ahern wrote:
>>> So my questions are: should we show weight/scope for IPv4? How should we
>>> deal with the missing type/proto info for IPv6? How should we deal with the
>>> different merging policies for IPv4/IPv6?
>>> + ip route add 172.16.105.0/24 table 100 via 172.16.104.100 dev dummy1
>>> + ip route append 172.16.105.0/24 table 100 via 172.16.104.100 dev dummy2
>>
>>> + ip route add 172.16.106.0/24 table 100 nexthop via 172.16.104.100 dev dummy1 weight 1
>>> + ip route append 172.16.106.0/24 table 100 nexthop via 172.16.104.100 dev dummy1 weight 2
>>
>> Weight only has meaning with a multipath route. In both of these cases
>> these are 2 separate entries in the FIB
> 
> Yes, we know these are 2 separate entries. The NM developers know these
> are 2 separate entries. But users don't know, and the route daemon doesn't
> know. If a user adds these 2 entries and the kernel shows them as the same,
> the route daemon will store them as a single entry. But if the user deletes
> the entry, we actually delete one and leave one in the kernel. This will
> confuse both the route daemon and the user.
> 
> So my question is: should we export the weight/scope? Or stop users from
> adding the second entry? Or just leave it as is and ask route daemons/users
> to try the new nexthop API?
> 
>> with the second one only hit under certain conditions.
> 
> Just curious, under what conditions will we hit the second one?

Look at the checks in net/ipv4/fib_trie.c starting at line 1573 (the comment
before them is "/* Step 3: Process the leaf, if that fails fall back to
backtracing */").
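A minimal shell model of the scenario Hangbin describes: two IPv4 entries for 172.16.106.0/24 that differ only in weight. It is hypothetical throughout. The FIB and the daemon cache are plain strings, no `ip` commands are run, and the assumption that a dump of a single-nexthop IPv4 route omits the weight is taken from the thread, not from kernel code.

```shell
#!/bin/sh
# Hypothetical model only: no real kernel state or route daemon is involved.

# Kernel FIB: two distinct IPv4 entries that differ only in weight.
kernel_fib='172.16.106.0/24 table 100 via 172.16.104.100 dev dummy1 weight 1
172.16.106.0/24 table 100 via 172.16.104.100 dev dummy1 weight 2'

# Count the non-empty lines in a string.
count_lines() {
    printf '%s\n' "$1" | grep -c .
}

# Assumption from the thread: a dump of a single-nexthop IPv4 route
# omits the weight, so both entries print identically.
dump_routes() {
    printf '%s\n' "$kernel_fib" | sed 's/ weight [0-9]*$//'
}

# A daemon that keys its cache on the dumped text sees a single route.
daemon_cache=$(dump_routes | sort -u)
kernel_count=$(count_lines "$kernel_fib")
daemon_count=$(count_lines "$daemon_cache")
echo "before delete: kernel=$kernel_count daemon=$daemon_count"

# The user deletes "the" route: the kernel removes one matching entry
# (modeled here as the first), while the daemon drops its only cached copy.
kernel_fib=$(printf '%s\n' "$kernel_fib" | sed '1d')
daemon_cache=''
kernel_after=$(count_lines "$kernel_fib")
daemon_after=$(count_lines "$daemon_cache")
echo "after delete:  kernel=$kernel_after daemon=$daemon_after"
```

The counts end at one kernel entry and zero cached entries, which is the mismatch that confuses both the daemon and the user.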

> 
>>
>>> + ip route show table 200
>>> default dev dummy1 scope link
>>> local default dev dummy1 scope host
>>> 172.16.107.0/24 via 172.16.104.100 dev dummy1
>>> 172.16.107.0/24 via 172.16.104.100 dev dummy1
>>>
>>> + ip addr add 2001:db8:101::1/64 dev dummy1
>>> + ip addr add 2001:db8:101::2/64 dev dummy2
>>> + ip route add 2001:db8:102::/64 via 2001:db8:101::10 dev dummy1 table 100
>>> + ip route prepend 2001:db8:102::/64 via 2001:db8:101::10 dev dummy2 table 100
>>> + ip route add local 2001:db8:103::/64 via 2001:db8:101::10 dev dummy1 table 100
>>> + ip route prepend unicast 2001:db8:103::/64 via 2001:db8:101::10 dev dummy2 table 1
>> Unfortunately the original IPv6 multipath implementation did not follow
>> the same semantics as IPv4. Each leg in a MP route is a separate entry
>> and the append and prepend work differently for v6. :-(
>>
>> Resolving this difference is one of the many goals of the separate
>> nexthop objects: aligning IPv4 and IPv6 behavior, which can only be done
>> with a new API. There were many attempts to make the legacy route
>> infrastructure more closely aligned between v4 and v6, and inevitably
>> each was reverted because it broke an existing user.
> 
> Yes, I understand the difficulty and risk of aligning the v4/v6 behavior.
> On the other hand, switching to the new nexthop API is also a lot of work
> for the routing daemons. Here is a quote from the NM developers' reply to me:

It is some level of work, yes, but the netlink message formats of the old
and new APIs were kept as aligned and similar as possible to make it easier
to move from the old API to the new one.
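For comparison, a rough sketch of how the weighted 172.16.106.0/24 example might look with nexthop objects, using iproute2's `ip nexthop` command and the `nhid` route attribute. Assumptions: the ids 1, 2 and 10 are arbitrary; the second leg uses dummy2 (as in the first example) so that the two legs differ; and `run` is a hypothetical helper that only prints the commands unless DRY_RUN=0, in which case they need root and existing dummy1/dummy2 interfaces. Check ip-nexthop(8) for the exact group syntax of your iproute2 version.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: print the command by default (DRY_RUN=1),
# execute it only when DRY_RUN=0 (requires root and the dummy interfaces).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

nexthop_setup() {
    # Each gateway/device pair becomes a nexthop object with an explicit id.
    run ip nexthop add id 1 via 172.16.104.100 dev dummy1
    run ip nexthop add id 2 via 172.16.104.100 dev dummy2
    # The group carries the weights explicitly: id 1 weight 1, id 2 weight 2.
    run ip nexthop add id 10 group 1,1/2,2
    # The route only references the group; IPv6 routes use nhid the same way.
    run ip route add 172.16.106.0/24 table 100 nhid 10
}

nexthop_setup
```

The design point is that the weights live in the group object (id 10) rather than in separate route entries, so there is a single route for the prefix to add, dump and delete.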

> 
> "If the issues (this and others) of the netlink API for route objects can be
> fixed, then there seems less reason to change NetworkManager to nexthop
> objects. If it cannot (won't) be fixed, then would be another argument for using
> nexthop objects..."
> 
> I will check whether all the issues can be fixed with the new nexthop API.
> 
> Thanks
> Hangbin