Message-ID: <e0047c07-11a0-423c-9560-3806328a0d76@gmail.com>
Date: Tue, 6 Aug 2019 20:33:47 -0600
From: David Ahern <dsahern@...il.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org,
davem@...emloft.net, mlxsw@...lanox.com,
jakub.kicinski@...ronome.com, f.fainelli@...il.com,
vivien.didelot@...il.com, mkubecek@...e.cz,
stephen@...workplumber.org, daniel@...earbox.net,
brouer@...hat.com, eric.dumazet@...il.com,
Jakub Kicinski <jakub.kicinski@...ronome.com>
Subject: Re: [RFC] implicit per-namespace devlink instance to set kernel
resource limitations
Some time back, support was added for devlink 'resources'. The idea is
that hardware (mlxsw) has limited resources (e.g., memory) that can be
allocated in certain ways (e.g., kvd for mlxsw), which in turn restricts
the number of entries (e.g., routes, neighbors) that userspace can
program.
I contend:
1. The kernel is analogous to the hardware: it is programmed by
userspace, it has limited resources (e.g., memory), and users want to
control (e.g., limit) the number of networking entities that can be
programmed - routes, rules, nexthop objects, etc. - and by address
family (ipv4, ipv6).
2. A consistent operational model across use cases - s/w forwarding, XDP
forwarding and hardware forwarding - is good for users deploying systems
based on the Linux networking stack. This aligns with my basic point at
LPC last November about better integration of XDP and kernel tables.
The existing devlink API is the right one for all use cases. Most
notably, the kernel can mimic the hardware from a resource management
perspective. Trying to say 'use cgroups for s/w forwarding and devlink
for h/w forwarding' complicates the lives of users. It is just a model,
and models can apply to more than some rigid definition.
As for the namespace piece of this, the kernel's tables for networking
are *per namespace*, and so the resource controller must be per
namespace. This aligns with another consistent theme I have promoted
over the years - the ability to divide up a single ASIC into multiple,
virtual switches which are managed per namespace. This is a very popular
feature from a certain legacy vendor and one that would be good for open
networking to achieve. This is the basis of my response last week about
the devlink instance per namespace, and I thought Jiri was moving in
that direction until our chat today. Jiri's intention is something
different; we can discuss that on the next version of his patches.
###
As for the current controller put into netdevsim...
When I started down this road 18-20 months ago, I was copying a lot of
netdevsim code to create a fake device from which I could have a devlink
instance to implement the devlink resources. At some point it was silly
to keep duplicating the code - just make it part of netdevsim. After
all, it really mirrors mlxsw and its resource limits for fib notifier
handling, and it allows testing of the userspace APIs and the in-kernel
notifier APIs which allow an entity to veto a change. This is all
consistent with the intent of netdevsim - a s/w based implementation
for testing APIs that otherwise require hardware.
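To make the model concrete, here is a minimal userspace sketch of a per-namespace resource controller that can veto a change. It is a toy in Python, not kernel code and not the netdevsim implementation: the class, the function names, and the "/IPv4/fib" path string are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ResourceExceeded(Exception):
    """Raised by the controller to veto a change."""


class NamespaceResourceController:
    """Toy per-namespace resource limits (hypothetical, not a kernel API)."""

    def __init__(self):
        # limits[netns][path] -> max entries; a missing path means unlimited
        self.limits = defaultdict(dict)
        # occupancy[netns][path] -> entries currently programmed
        self.occupancy = defaultdict(lambda: defaultdict(int))

    def set_limit(self, netns, path, size):
        self.limits[netns][path] = size

    def notify_add(self, netns, path):
        """Called before an entry is added; raising vetoes the change."""
        limit = self.limits[netns].get(path)
        if limit is not None and self.occupancy[netns][path] >= limit:
            raise ResourceExceeded(f"{netns}: {path} limit {limit} reached")
        self.occupancy[netns][path] += 1

    def notify_del(self, netns, path):
        """Called after an entry is removed; frees one unit of occupancy."""
        if self.occupancy[netns][path] > 0:
            self.occupancy[netns][path] -= 1


def add_route(ctrl, table, netns, prefix):
    """Add an IPv4 route to a namespace unless the controller vetoes it."""
    routes = table.setdefault(netns, set())
    if prefix in routes:
        return True  # already programmed, no extra occupancy
    try:
        ctrl.notify_add(netns, "/IPv4/fib")
    except ResourceExceeded:
        return False  # change vetoed, table left untouched
    routes.add(prefix)
    return True
```

With a limit of two IPv4 fib entries set on one namespace, a third add_route() call in that namespace is rejected, while the same prefix can still be added in another namespace because limits and occupancy are both tracked per namespace.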