netdev - Re: [RFC] implicit per-namespace devlink instance to set kernel resource limitations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20190807114918.15e10047@cakuba.netronome.com>
Date:   Wed, 7 Aug 2019 11:49:18 -0700
From:   Jakub Kicinski <jakub.kicinski@...ronome.com>
To:     David Ahern <dsahern@...il.com>
Cc:     Andrew Lunn <andrew@...n.ch>, Jiri Pirko <jiri@...nulli.us>,
        netdev@...r.kernel.org, davem@...emloft.net, mlxsw@...lanox.com,
        f.fainelli@...il.com, vivien.didelot@...il.com, mkubecek@...e.cz,
        stephen@...workplumber.org, daniel@...earbox.net,
        brouer@...hat.com, eric.dumazet@...il.com
Subject: Re: [RFC] implicit per-namespace devlink instance to set kernel
 resource limitations

On Tue, 6 Aug 2019 20:33:47 -0600, David Ahern wrote:
> Some time back supported was added for devlink 'resources'. The idea is
> that hardware (mlxsw) has limited resources (e.g., memory) that can be
> allocated in certain ways (e.g., kvd for mlxsw) thus implementing
> restrictions on the number of programmable entries (e.g., routes,
> neighbors) by userspace.
> 
> I contend:
> 
> 1. The kernel is an analogy to the hardware: it is programmed by
> userspace, has limited resources (e.g., memory), and that users want to
> control (e.g., limit) the number of networking entities that can be
> programmed - routes, rules, nexthop objects etc and by address family
> (ipv4, ipv6).

Memory hierarchy for ASIC is more complex and changes more often than
we want to change the model and kernel ABIs. The API in devlink is
intended for TCAM partitioning.

> 2. A consistent operational model across use cases - s/w forwarding, XDP
> forwarding and hardware forwarding - is good for users deploying systems
> based on the Linux networking stack. This aligns with my basic point at
> LPC last November about better integration of XDP and kernel tables.
> 
> The existing devlink API is the right one for all use cases. Most
> notably that the kernel can mimic the hardware from a resource
> management. Trying to say 'use cgroups for s/w forwarding and devlink
> for h/w forwarding' is complicating the lives of users. It is just a
> model and models can apply to more than some rigid definition.

This argument holds no water. Only a tiny fraction of Linux networking
users will have an high performance forwarding ASIC attached to their
CPUs. So we'll make 99.9% of users who never seen devlink learn the
tool for device control to control kernel resource?

Perhaps I'm misinterpreting your point there.

> As for the namespace piece of this, the kernel's tables for networking
> are *per namespace*, and so the resource controller must be per
> namespace. This aligns with another consistent theme I have promoted
> over the years - the ability to divide up a single ASIC into multiple,
> virtual switches which are managed per namespace. This is a very popular
> feature from a certain legacy vendor and one that would be good for open
> networking to achieve. This is the basis of my response last week about
> the devlink instance per namespace, and I thought Jiri was moving in
> that direction until our chat today. Jiri's intention is something
> different; we can discuss that on the next version of his patches.

Resource limits per namespace make perfect sense. Just not configured
via devlink..