[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5ffd6f4e-fe3e-0f47-1b7f-8efca07f74d3@nic.cz>
Date: Tue, 11 Dec 2018 13:52:20 +0100
From: Jan Maria Matejka <jan.matejka@....cz>
To: David Ahern <dsahern@...il.com>
Cc: linux-netdev <netdev@...r.kernel.org>,
Ondrej Zajicek <santiago@...reenet.org>,
David Miller <davem@...emloft.net>
Subject: Re: [PATCH RFC net-next 00/18] net: Improve route scalability via
support for nexthop objects
Hello!
On 9/4/18 5:57 PM, David Ahern wrote:
> On 9/2/18 11:34 AM, David Miller wrote:
>> From: dsahern@...nel.org
>> Date: Fri, 31 Aug 2018 17:49:35 -0700
>>
>>> Examples
>>> 1. Single path
>>> $ ip nexthop add id 1 via 10.99.1.2 dev veth1
>>> $ ip route add 10.1.1.0/24 nhid 1
>>>
>>> $ ip next ls
>>> id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link
>>>
>>> $ ip ro ls
>>> 10.1.1.0/24 nhid 1 scope link
>>> ...
>>
>> First of all, this whole idea is awesome! But, you knew that already. :)
>
> :-)
Joining the wow's. This seems that it may save valuable kernel table
synchronization time in BIRD.
>> However, I worry what happesn in a mixed environment where we have routing
>> daemons and tools inserting nexthop based routes, and some doing things
>> the old way using and expecting inline nexthop information in the routes.
>>
>> That mixed environment situation has to function correctly. Older
>> apps have to see the per-route nexthop info in the format and layout
>> they expect (gw/dev pairs). They cannot be expected to just studdenly
>> understand the nexthop ID etc.
>>
>> Otherwise the concept and ideas are fine, so as long as you can resolve
>> the mixed environment situation I fully support this work and look forward
>> to it being in a state where I can integrate it :-)
>>
>
> RTA_NH_ID is on par with other new attriwo parallel commands doing butes (RTA_ENCAP for example) --
> userspace apps get a route attribute and have no idea what it means
> until support is added (e.g., it took more than 2 years for libnl to get
> support for RTA_ENCAP). I take your comment to mean you prefer this new
> attribute to be treated differently -- assume apps are clueless unless
> they indicate otherwise. Given the number of ioctl based apps that might
> be the better option for this case.
>
> I can add an attribute for apps to specify 'hey, I understand nexthops'
> on dump and get requests (per-app flag), and then I can add a sysctl
> that controls whether the nexthop spec is included. The sysctl would be
> for notifications and a global option for dumps/gets. Users who know
> their OS is safe for the short form can set it and get the benefit of
> smaller messages. While the biggest win here is pushing routes to the
> kernel faster, there is also a gain with less data from the kernel in
> route dumps and notifications, especially with multipath environments.
Personally, I use mixed environments where iproute2 version doesn't match
kernel version most of the time and I'd like to use both new iproute2 and
old BIRD or whatever at the same machine. For myself, not knowing exactly
how to implement it, the best way how to interact with the nexthop layer,
would be a flag set while opening the netlink socket, or maybe another
RTMGRP_* like RTMGRP_IPV4_NEXTHOP and RTMGRP_IPV6_NEXTHOP. Who asks for
these, they will get nexthop-ed notifications; who asks for
RTMGRP_IPV[46]_ROUTE, gets them the old way. Anyway, I may live quite well
with a system-wide knob in sysctl setting this.
I'd also like to have documented that the multipath is forbidden to be
recursive and only one level of depth is allowed. It seems OK for me on
first sight how it works, just please write it down for future generations
for them to say "it is OK to fail badly when kernel sends us a recursive
nexthop group definition"
At last, what I consider important, is a namespace separation by
nh_protocol. Let's suppose that there is a routing daemon maintaining some
routes and also an operator who has her own routes. Then the operator does
a stupid typo while running ip nexthop replace and instead of replacing
her own nexthop, she replaces the routing daemon's nexthop.
I suggest that the key in the nexthop table should include nh_protocol
together with nh_id to split these records. It also seems that there is
only one nexthop table; is this OK wrt. multiple VRF's and so? There is
also the nh_table parameter in struct nh_config which only validates the
gateway in a given table. If I look correctly, this nexthop may be then
used anywhere which is insane from my point of view.
BTW, let's consider this:
# ip nexthop add id 1 via 10.99.0.1 dev veth1
# ip nexthop add id 2 via 10.99.0.2 dev veth1
# ip nexthop add id 3 via 10.99.0.3 dev veth1
and then both commands at the same time:
# ip nexthop replace id 1 group 2/3
# ip nexthop replace id 2 group 1/3
Is this synchronized and one of these commands fails? (I'm not familiar
with the code such deeply, sorry if I'm asking for something obvious.)
Thank you for considering my objections and sorry for reviving such an
old thread.
Maria
BIRD developer
Powered by blists - more mailing lists