Message-ID: <20150226133326.GC23050@casper.infradead.org>
Date:	Thu, 26 Feb 2015 13:33:26 +0000
From:	Thomas Graf <tgraf@...g.ch>
To:	Jiri Pirko <jiri@...nulli.us>
Cc:	Simon Horman <simon.horman@...ronome.com>, netdev@...r.kernel.org,
	davem@...emloft.net, nhorman@...driver.com, andy@...yhouse.net,
	dborkman@...hat.com, ogerlitz@...lanox.com, jesse@...ira.com,
	jpettit@...ira.com, joestringer@...ira.com,
	john.r.fastabend@...el.com, jhs@...atatu.com, sfeldma@...il.com,
	f.fainelli@...il.com, roopa@...ulusnetworks.com,
	linville@...driver.com, shrijeet@...il.com,
	gospo@...ulusnetworks.com, bcrl@...ck.org
Subject: Re: Flows! Offload them.

On 02/26/15 at 10:16am, Jiri Pirko wrote:
> Well, on netdev01, I believe that a consensus was reached that for every
> switch offloaded functionality there has to be an implementation in
> kernel.

Agreed. This should not prevent the policy being driven from user
space though.

> What John's Flow API originally did was to provide a way to
> configure hardware independently of kernel. So the right way is to
> configure kernel and, if hw allows it, to offload the configuration to hw.
> 
> In this case, seems to me logical to offload from one place, that being
> TC. The reason is, as I stated above, the possible conversion from OVS
> datapath to TC.

Offloading of TC definitely makes a lot of sense. I think that even in
that case you will already encounter independent configuration of
hardware and kernel. Example: The hardware provides a fixed, generic
function to push up to n bytes onto a packet. This hardware function
could be used to implement TC actions "push_vlan", "push_vxlan",
"push_mpls". You would likely agree that TC should make use
of such a function even if the hardware version is different from the
software version. So I don't think we'll have a 1:1 mapping for all
configurations, regardless of whether the "how" is decided in the kernel
or in user space.
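A minimal C sketch of that idea, assuming a hypothetical hardware primitive that inserts up to HW_PUSH_MAX bytes at a given packet offset. The names struct hw_push, lower_push_vlan(), lower_push_encap() and hw_apply() are invented for illustration and are not an existing driver or TC interface. Two different actions are lowered onto the same descriptor, and the encapsulation case marks where a hardware version would diverge from the software one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HW_PUSH_MAX 64   /* assumed "n": max bytes the hypothetical hw can push */

/* Hypothetical descriptor for the fixed, generic hardware function:
 * insert `len` bytes at byte `offset` of the packet. */
struct hw_push {
    size_t offset;
    size_t len;
    uint8_t bytes[HW_PUSH_MAX];
};

/* Lower a push_vlan action: insert an 802.1Q tag (TPID + TCI) right after
 * the destination and source MAC addresses, i.e. at byte offset 12. */
static struct hw_push lower_push_vlan(uint16_t tci)
{
    struct hw_push op = { .offset = 12, .len = 4 };
    op.bytes[0] = 0x81;                 /* TPID 0x8100 */
    op.bytes[1] = 0x00;
    op.bytes[2] = (uint8_t)(tci >> 8);  /* TCI, network byte order */
    op.bytes[3] = (uint8_t)(tci & 0xff);
    return op;
}

/* Lower a tunnel push: prepend a precomputed outer header at offset 0.
 * Real encapsulation (e.g. VXLAN) must also fix up per-packet length and
 * checksum fields; this sketch deliberately omits that, which is one way
 * the hardware and software versions of an action can differ.
 * Returns -1 if the header does not fit, i.e. the action stays in software. */
static int lower_push_encap(const uint8_t *hdr, size_t len, struct hw_push *op)
{
    if (len > HW_PUSH_MAX)
        return -1;
    op->offset = 0;
    op->len = len;
    memcpy(op->bytes, hdr, len);
    return 0;
}

/* Software model of what the hardware does with a descriptor.
 * `pkt` must have room for pkt_len + op->len bytes; returns the new length. */
static size_t hw_apply(const struct hw_push *op, uint8_t *pkt, size_t pkt_len)
{
    assert(op->offset <= pkt_len && op->len <= HW_PUSH_MAX);
    memmove(pkt + op->offset + op->len, pkt + op->offset, pkt_len - op->offset);
    memcpy(pkt + op->offset, op->bytes, op->len);
    return pkt_len + op->len;
}
```

The driver only has to expose one primitive; deciding which TC actions may be lowered onto it, and which fix-ups it is acceptable to skip, remains a policy decision.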

My primary concern with deciding how to program the hardware *only*
in the kernel is the lack of context: a given L3/L4 software
pipeline in the Linux kernel is made up of various subsystems (tc
ingress, the Linux bridge, various iptables chains, routing rules,
routing tables, tc egress, etc.). All of them can be stacked in almost
unlimited combinations using virtual software devices and segmented
using net namespaces.

Given this complexity, we'll most likely have to solve some of it with
a flag to control offloading (as already introduced for bridging) and
allow the user to shoot himself in the foot (as Jamal and others
pointed out a couple of times). I currently don't see how the kernel
could *always* get it right automatically. We need some additional
input from the user (see also Patrick's comments regarding iptables
offload).
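To illustrate what such a flag buys, here is a minimal C sketch assuming a hypothetical three-valued per-rule policy; the enum and function names are invented and do not correspond to an existing TC or bridge attribute. The driver can only answer whether it is able to express a rule; whether offloading it is correct given the rest of the pipeline is the part the user's flag supplies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-rule offload policy supplied by the user. */
enum offload_policy {
    OFFLOAD_NONE,    /* software only: never touch the hardware */
    OFFLOAD_PREFER,  /* install in software, and also in hardware if possible */
    OFFLOAD_ONLY,    /* hardware only: user asserts hw semantics are enough */
};

enum { IN_SW = 1 << 0, IN_HW = 1 << 1 };

/* Decide where a rule is installed. `hw_capable` is the driver's answer to
 * "can you express this rule?"; it says nothing about whether offloading it
 * is correct in the context of the whole software pipeline.
 * Returns a mask of IN_SW/IN_HW, or -1 if the request cannot be honored. */
static int place_rule(enum offload_policy p, bool hw_capable)
{
    assert(p <= OFFLOAD_ONLY);
    switch (p) {
    case OFFLOAD_NONE:
        return IN_SW;
    case OFFLOAD_PREFER:
        return hw_capable ? (IN_SW | IN_HW) : IN_SW;
    case OFFLOAD_ONLY:
        return hw_capable ? IN_HW : -1;
    }
    return -1;
}
```

OFFLOAD_ONLY is the "shoot yourself in the foot" case: the rule bypasses software entirely, so any semantic gap between hardware and software becomes the user's responsibility.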

However, for certain datacenter server use cases we actually have the
full user intent in user space as we configure all of the kernel
subsystems from a single central management agent running locally
on the server (OpenStack, Kubernetes, Mesos, ...), i.e. we do know
exactly what the user wants on the system as a whole. This intent is
then split into small configuration pieces for iptables, tc and
routes in multiple net namespaces (for example, to implement a VRF).

For example, a VRF in software would make use of net namespaces which
hold tenant-specific ACLs, routes and QoS settings. A separate tc action
would forward packets to the namespace. Easy and straightforward in
software. OTOH, hardware capable of implementing the ACLs would also
need to know about the tc action which selected the namespace when
attempting to offload the ACLs, as it would otherwise apply the ACLs
to the wrong packets.
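The VRF case can be modelled in a few lines of C. This is a toy sketch under stated assumptions: the tc action is reduced to a hypothetical port-to-VRF table, each ACL rule simply drops traffic to one destination IP, and the hypothetical hardware table matches on (ingress port, destination IP) with a wildcard port. Compiling the ACL without knowledge of the tc action forces a wildcard entry, which drops another tenant's packets; folding the port-to-VRF mapping into the hardware key restores the software semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NPORTS   4
#define ANY_PORT 0xffff  /* wildcard port in the hypothetical hw table */
#define MAX_HW   8

/* Stand-in for the tc action: ingress port -> tenant VRF (net namespace). */
static const int port_to_vrf[NPORTS] = { 1, 1, 2, 2 };

struct pkt      { uint16_t in_port; uint32_t dst_ip; };
struct acl_rule { int vrf; uint32_t dst_ip; };   /* "drop to dst_ip" in vrf */
struct hw_entry { uint16_t in_port; uint32_t dst_ip; };

/* Reference software behaviour: the ACL only sees packets that the tc
 * action forwarded into its namespace. */
static bool sw_drops(const struct acl_rule *r, const struct pkt *p)
{
    assert(p->in_port < NPORTS);
    return port_to_vrf[p->in_port] == r->vrf && p->dst_ip == r->dst_ip;
}

/* Compile the ACL into hw entries. Without knowledge of the tc action the
 * only option is a wildcard port; with it, one entry per port of the VRF. */
static size_t compile_acl(const struct acl_rule *r, bool know_tc_action,
                          struct hw_entry out[MAX_HW])
{
    size_t n = 0;
    if (!know_tc_action) {
        out[n++] = (struct hw_entry){ ANY_PORT, r->dst_ip };
        return n;
    }
    for (uint16_t port = 0; port < NPORTS; port++) {
        if (port_to_vrf[port] == r->vrf) {
            assert(n < MAX_HW);
            out[n++] = (struct hw_entry){ port, r->dst_ip };
        }
    }
    return n;
}

/* Model of the hardware lookup: drop if any entry matches. */
static bool hw_drops(const struct hw_entry *tbl, size_t n, const struct pkt *p)
{
    for (size_t i = 0; i < n; i++)
        if ((tbl[i].in_port == ANY_PORT || tbl[i].in_port == p->in_port) &&
            tbl[i].dst_ip == p->dst_ip)
            return true;
    return false;
}
```

With the wildcard entry, a packet arriving on a port that belongs to VRF 2 is dropped by an ACL that was only ever configured inside VRF 1's namespace.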

I would love to have the possibility to make use of that rich intent
available in user space to program the hardware in combination with
configuring the kernel.

Would love to hear your thoughts on this. I think we all share the same
goal, which is to have in-kernel drivers for chips which can perform
advanced switching and support it natively with Linux and have it
become the de-facto standard for both hardware switch management and
compute servers.