Message-ID: <61520dad-939f-46ff-626b-dea91b845aa3@gmail.com>
Date: Mon, 25 Mar 2019 11:02:07 -0600
From: David Ahern <dsahern@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
edumazet@...gle.com
Subject: Re: [PATCH net-next] ipv6: Move ipv6 stubs to a separate header file
On 3/24/19 9:26 PM, Alexei Starovoitov wrote:
> On Sun, Mar 24, 2019 at 06:56:42AM -0600, David Ahern wrote:
>>
>> This change also enables many other key features:
>> 1. IPv4 multipath routes are not evicted just because one hop goes down.
>> 2. IPv6 multipath routes with device-only nexthops (e.g., tunnels).
>> 3. IPv6 nexthops with IPv4 routes (aka RFC 5549), which enables a more
>> natural BGP unnumbered.
>> 4. Lower memory consumption for IPv6 FIB entries, which have none of
>> the sharing that IPv4 entries do.
>> 5. Atomic update of a nexthop definition with a single replace command,
>> as opposed to replacing the N routes that use it.
>
> Does the kernel work as the data plane or the control plane in any of
> the above features?
> Sadly the patches allow it to do both, but Cumulus doesn't use it for
> the data path. The kernel on the control plane CPU is merely a
> database, and today it doesn't scale when used as a database.
> The kernel has to be fast as a dataplane, but these extra features
> will slow down routing by making kernel-as-database scale a bit better.
> Hence my suggestion in the previous email: use a proper database to
> store routes, nexthops and whatever else is necessary to program the
> ASIC. The kernel doesn't need to hold this information.
>
The first 40 patches align fib_nh and fib6_nh, providing more
consistency between IPv4 and IPv6 and allowing more code re-use between
the protocols. The end result is the ability to have IPv6 gateways with
IPv4 routes, a much needed control plane feature that other companies
have been harassing me about, as well as an internal need for Cumulus.
Throughout the refactoring I have been very careful about changes to
data structure layout and cacheline hits, as well as adverse changes to
memory use. I believe at the end of this change set there is no impact
to existing performance - control plane or data plane.
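For anyone skimming the series, the core of that alignment is pulling
the fields shared by fib_nh and fib6_nh into a common struct. Roughly
(paraphrased shape, not the exact layout from the patches):

    /* Paraphrased sketch of the shared nexthop fields; flags, scope,
     * lwtunnel state and per-cpu caches are elided. */
    struct fib_nh_common {
            struct net_device *nhc_dev;
            int                nhc_oif;
            u8                 nhc_family;
            u8                 nhc_gw_family;  /* can differ from nhc_family:
                                                * an IPv6 gateway (RFC 5549)
                                                * on an IPv4 route */
            union {
                    __be32          ipv4;
                    struct in6_addr ipv6;
            } nhc_gw;
    };

The nhc_gw_family/nhc_gw split is what lets an IPv4 route carry an IPv6
gateway without either protocol's code growing special cases.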
That is followed by more IPv6 refactoring, in a direction that makes
IPv4 and IPv6 more consistent and enables changes (outside of the
nexthop sets) that will improve IPv6 in a number of cases by removing
the need to always generate a dst_entry.
After that come a few patches exporting functions for use by the
nexthop code, followed by the refactoring that enables separate nexthop
objects. Again, impacts to performance have been top of mind, and I
have done what I can to minimize any overhead in the datapath - to the
point of a few 'if (nh)' checks wrapped in an unlikely. And with the
nexthop code in place, users get, as one example, an alternative to the
broken IPv6 multipath API.
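To make the nexthop-object model concrete, here is the intended
workflow with the iproute2 syntax that goes along with this work (IDs,
addresses and device names below are illustrative):

    # standalone nexthops, then a multipath group built from them
    ip nexthop add id 1 via 10.99.1.2 dev eth1
    ip nexthop add id 2 via 10.99.2.2 dev eth2
    ip nexthop add id 101 group 1/2

    # routes reference the group by id instead of embedding the hops
    ip route add 192.168.0.0/16 nhid 101

    # atomic update: one replace rewires every route using nexthop 1,
    # rather than replacing the N routes that reference it
    ip nexthop replace id 1 via 10.99.1.7 dev eth1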
As far as scalability goes, I can already inject a million routes into
the kernel FIB. This work allows me to do that more efficiently, and to
manage the FIBs more efficiently in the face of changes such as a link
going down, as we move to higher end systems such as Spectrum-2.
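For reference, that kind of scale test can be reproduced with stock
iproute2 by pre-generating a batch file and feeding it to ip(8) in a
single process and netlink session (prefixes and device names here are
made up):

    # 16 * 256 * 256 = 1,048,576 host routes
    for a in $(seq 1 16); do
      for b in $(seq 0 255); do
        for c in $(seq 0 255); do
          echo "route add 10.$a.$b.$c/32 via 172.16.1.2 dev eth1"
        done
      done
    done > routes.batch

    ip -batch routes.batch

The interesting part is less the insert time and more how efficiently
the FIB can be updated when a link carrying a large share of those
nexthops goes down.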
As for routes in the kernel, they need to be there for any control
plane processes to function properly. One example is ping and
traceroute to troubleshoot data path problems; another is for bgp (or
any other service) to connect to a peer through the data plane (do not
assume a peer is on a directly connected route). Further, the routes
need to go through the kernel to get to the switchdev driver, and they
need to be there for XDP forwarding and for routing on the host. Pawel
has already expressed interest in using XDP for fast path forwarding
with FRR managing the route table.
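Since XDP came up: the building block for kernel-FIB-driven XDP
forwarding already exists in the bpf_fib_lookup() helper. A minimal
sketch, in the spirit of samples/bpf/xdp_fwd_kern.c (IPv4-only, no VLAN
handling, no TTL decrement, minimal validation - not production code):

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/ip.h>
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_endian.h>

    #ifndef AF_INET
    #define AF_INET 2               /* not exported via uapi headers */
    #endif

    SEC("xdp")
    int xdp_fib_fwd(struct xdp_md *ctx)
    {
            void *data_end = (void *)(long)ctx->data_end;
            void *data = (void *)(long)ctx->data;
            struct bpf_fib_lookup fib = {};
            struct ethhdr *eth = data;
            struct iphdr *iph;

            if (data + sizeof(*eth) + sizeof(*iph) > data_end)
                    return XDP_PASS;
            if (eth->h_proto != bpf_htons(ETH_P_IP))
                    return XDP_PASS;

            iph = data + sizeof(*eth);
            fib.family   = AF_INET;
            fib.tot_len  = bpf_ntohs(iph->tot_len);
            fib.ipv4_src = iph->saddr;
            fib.ipv4_dst = iph->daddr;
            fib.ifindex  = ctx->ingress_ifindex;

            /* full lookup in the kernel FIB - the same tables FRR (or
             * any routing daemon) programs via netlink */
            if (bpf_fib_lookup(ctx, &fib, sizeof(fib), 0) !=
                BPF_FIB_LKUP_RET_SUCCESS)
                    return XDP_PASS;        /* punt to the kernel stack */

            __builtin_memcpy(eth->h_dest, fib.dmac, ETH_ALEN);
            __builtin_memcpy(eth->h_source, fib.smac, ETH_ALEN);
            return bpf_redirect(fib.ifindex, 0);
    }

    char _license[] SEC("license") = "GPL";

The point being: FRR keeps managing the route table through the normal
netlink interfaces, and the XDP program just consumes the kernel's FIB
state for the fast path.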
You keep trying to make this about Cumulus. This is about bringing next
level features to Linux, and in the process bringing more consistency
and code sharing to IPv4 and IPv6. This is about one API for the data
center, be it servers, hosts, switches or routers, regardless of
datapath (hardware offload, XDP, or kernel forwarding), and about
maintaining consistency in configuring, monitoring and troubleshooting
across those systems. That is the common theme of both the netdev talk
last summer and the talk at LPC in November.
Again, I have tried to be very careful with the intrusion of checks
into the datapath, with the goal of no measurable impact to
performance. I am invested in seeing that through and will continue
looking for ways to improve it for all use cases.