netdev - Re: Flows! Offload them.

Message-ID: <CA+mtBx_E5D-ryU=etdKf+7bnYwqHKJbqDc22A5=ypWCL5Z_E1g@mail.gmail.com>
Date:	Thu, 26 Feb 2015 17:52:16 -0800
From:	Tom Herbert <therbert@...gle.com>
To:	Neil Horman <nhorman@...driver.com>
Cc:	Simon Horman <simon.horman@...ronome.com>,
	John Fastabend <john.r.fastabend@...el.com>,
	Thomas Graf <tgraf@...g.ch>, Jiri Pirko <jiri@...nulli.us>,
	Linux Netdev List <netdev@...r.kernel.org>,
	David Miller <davem@...emloft.net>,
	Andy Gospodarek <andy@...yhouse.net>,
	Daniel Borkmann <dborkman@...hat.com>,
	Or Gerlitz <ogerlitz@...lanox.com>,
	Jesse Gross <jesse@...ira.com>, jpettit@...ira.com,
	Joe Stringer <joestringer@...ira.com>,
	Jamal Hadi Salim <jhs@...atatu.com>,
	Scott Feldman <sfeldma@...il.com>,
	Florian Fainelli <f.fainelli@...il.com>,
	Roopa Prabhu <roopa@...ulusnetworks.com>,
	John Linville <linville@...driver.com>,
	Shrijeet Mukherjee <shrijeet@...il.com>,
	Andy Gospodarek <gospo@...ulusnetworks.com>, bcrl@...ck.org
Subject: Re: Flows! Offload them.

On Thu, Feb 26, 2015 at 5:22 PM, Neil Horman <nhorman@...driver.com> wrote:
> On Fri, Feb 27, 2015 at 06:52:58AM +0900, Simon Horman wrote:
>> On Thu, Feb 26, 2015 at 03:16:35PM -0500, Neil Horman wrote:
>> > On Thu, Feb 26, 2015 at 07:23:36AM -0800, John Fastabend wrote:
>> > > On 02/26/2015 05:33 AM, Thomas Graf wrote:
>> > > > On 02/26/15 at 10:16am, Jiri Pirko wrote:
>> > > >> Well, on netdev01, I believe that a consensus was reached that for every
>> > > >> switch offloaded functionality there has to be an implementation in
>> > > >> kernel.
>> > > >
>> > > > Agreed. This should not prevent the policy being driven from user
>> > > > space though.
>> > > >
>> > > >> What John's Flow API originally did was to provide a way to
>> > > >> configure hardware independently of kernel. So the right way is to
>> > > >> configure kernel and, if hw allows it, to offload the configuration to hw.
>> > > >>
>> > > >> In this case, seems to me logical to offload from one place, that being
>> > > >> TC. The reason is, as I stated above, the possible conversion from OVS
>> > > >> datapath to TC.
>> > > >
>> > > > Offloading of TC definitely makes a lot of sense. I think that even in
>> > > > that case you will already encounter independent configuration of
>> > > > hardware and kernel. Example: The hardware provides a fixed, generic
>> > > > function to push up to n bytes onto a packet. This hardware function
>> > > > could be used to implement TC actions "push_vlan", "push_vxlan",
>> > > > "push_mpls". You would likely agree that TC should make use
>> > > > of such a function even if the hardware version is different from the
>> > > > software version. So I don't think we'll have a 1:1 mapping for all
>> > > > configurations, regardless of whether the how is decided in kernel or
>> > > > user space.
>> > >
>> > > Just to expand slightly on this. I don't think you can get to a 1:1
>> > > mapping here. One reason is hardware typically has a TCAM and limited
>> > > size. So you need a _policy_ to determine when to push rules into the
>> > > hardware. The kernel doesn't know when to do this and I don't believe
>> > > it's the kernel's place to start enforcing policy like this. One thing I likely
>> > > need to do is get some more "worlds" in rocker so we aren't stuck only
>> > > thinking about the infinite size OF_DPA world. The OF_DPA world is only
>> > > one world and not a terribly flexible one at that when compared with the
>> > > NPU folk. So minimally you need a flag to indicate rules go into hardware
>> > > vs software.
>> > >
>> > > That said I think the bigger mismatch between software and hardware is
>> > > you program it differently because the data structures are different. Maybe
>> > > a u32 example would help. For parsing with u32 you might build a parse
>> > > graph with a root and some leaf nodes. In hardware you want to collapse
>> > > this down onto the hardware. I argue this is not a kernel task because
>> > > there are lots of ways to do this and there are trade-offs made with
>> > > respect to space and performance and which table to use when it could be
>> > > handled by a set of tables. Another example is a virtual switch possibly
>> > > OVS but we have others. The software does some "unmasking" (their term)
>> > > before sending the rules into the software dataplane cache. Basically this
>> > > means we can ignore priority in the hash lookup. However this is not how you
>> > > would optimally use hardware. Maybe I should do another write up with
>> > > some more concrete examples.
>> > >
>> > > There are also lots of use cases to _not_ have hardware and software in
>> > > sync. A flag allows this.
>> > >
>> > > My only point is I think we need to allow users to optimally use their
>> > > hardware either via 'tc' or my previous 'flow' tool. Actually in my
>> > > opinion I still think it's best to have both interfaces.
>> > >
>> > > I'll go get some coffee now and hopefully that is somewhat clear.
>> >
>> >
>> > I've been thinking about the policy aspect of this, and the more I think
>> > about it, the more I wonder if not allowing some sort of common policy in
>> > the kernel is really the right thing to do here.  I know that's somewhat
>> > blasphemous, but this isn't really administrative policy that we're
>> > talking about, at least not 100%.  It's more of a behavioral profile that
>> > we're trying to enforce.  That may be splitting hairs, but I think there's
>> > precedent for the latter.  That is to say, we configure qdiscs to limit
>> > traffic flow to certain rates, and configure policies which drop traffic
>> > that violates it (which includes random discard, which is the antithesis
>> > of deterministic policy).  I'm not sure I see this as any different,
>> > especially if we limit its scope.  That is to say, why couldn't we allow
>> > the kernel to program a predetermined set of policies that the admin can
>> > set (i.e. offload routing to a hardware cache of X size with an lru
>> > victimization).  If other well defined policies make sense, we can add
>> > them and expose options via iproute2 or some such to set them.  For the
>> > use case where such pre-packaged policies don't make sense, we have
>> > things like the flow api to offer users who want to be able to control
>> > their hardware in a more fine grained approach.
>>
>> In general I agree that it makes sense to have sane offload policy
>> in the kernel and provide a mechanism to override that. Things that already
>> work should continue to work: just faster or with fewer CPU cycles consumed.
>>
> Yes, exactly that, for the general traditional networking use case, that is
> exactly what we want, to opportunistically move traffic faster with less load on
> the cpu.  We don't nominally care what traffic is offloaded, as long as the
> hardware does a better job than just software alone.  If we get an occasional
> miss and have to do stuff in software, so be it.
>
+1 on an in-kernel "Network Resource Manager". This also came up in
Sunil's plan to configure RPS affinities from a driver so I'm taking
the liberty of generalizing the concept :-).
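
To make the kind of pre-packaged policy discussed above more concrete
(offloading routes to a hardware cache of X entries with LRU
victimization), here is a minimal sketch in Python. It is only a model:
the OffloadCache class, its counters, and the example route table are
hypothetical names for illustration, not an existing kernel or driver
interface. The OrderedDict stands in for the hardware table.

```python
from collections import OrderedDict


class OffloadCache:
    """Hypothetical model of a fixed-size hardware table with LRU victimization."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.hw = OrderedDict()  # "offloaded" entries, least recently used first
        self.hw_hits = 0         # lookups answered by the modelled hardware table
        self.sw_misses = 0       # lookups that fell back to the software path

    def lookup(self, dst, software_lookup):
        if dst in self.hw:
            # Hit: mark the entry as most recently used.
            self.hw.move_to_end(dst)
            self.hw_hits += 1
            return self.hw[dst]
        # Miss: resolve in software, then opportunistically offload the result.
        self.sw_misses += 1
        nexthop = software_lookup(dst)
        if len(self.hw) >= self.capacity:
            # Table full: evict the least recently used entry.
            self.hw.popitem(last=False)
        self.hw[dst] = nexthop
        return nexthop


# Assumed example data: a software route table and a two-entry hardware cache.
routes = {"10.0.0.1": "eth0", "10.0.0.2": "eth1", "10.0.0.3": "eth2"}
cache = OffloadCache(capacity=2)
for dst in ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2"]:
    cache.lookup(dst, routes.__getitem__)
```

The point of the sketch is that a miss is never an error: the software
path always has the answer, and the table size and eviction rule are the
only knobs an admin would need to set.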

>> I am, however, not entirely convinced that it is always possible to
>> implement such a sane default policy that is worth the code complexity -
>> I'm thinking in particular of Open vSwitch where management of flows is
>> already in user-space.
> So, this is a case in which I think John F.'s low level flow API is better
> suited.  OVS has implemented a user space dataplane that circumvents a lot of the
> kernel mechanisms for traffic forwarding.  For that sort of application, the
> traditional kernel offload "objects" aren't really appropriate.  Instead, OVS
> can use the low level flow API to construct its own custom offload pipeline
> using whatever rules and policies that it wants.
>
> Of course, using the low level flow API is incompatible with the in-kernel
> object offload idea that I'm proposing, but I see the two as able to co-exist,
> much like firewalld co-exists with iptables.  You can use both, but you have to
> be aware that using the lower layer interface might break the other's higher
> level operations.  And if that happens, it's on you to manage it.
>
> Best
> Neil
>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html