Message-ID: <54EF8BFB.5050608@intel.com>
Date:	Thu, 26 Feb 2015 13:11:23 -0800
From:	John Fastabend <john.r.fastabend@...el.com>
To:	Neil Horman <nhorman@...driver.com>
CC:	Thomas Graf <tgraf@...g.ch>, Jiri Pirko <jiri@...nulli.us>,
	Simon Horman <simon.horman@...ronome.com>,
	netdev@...r.kernel.org, davem@...emloft.net, andy@...yhouse.net,
	dborkman@...hat.com, ogerlitz@...lanox.com, jesse@...ira.com,
	jpettit@...ira.com, joestringer@...ira.com, jhs@...atatu.com,
	sfeldma@...il.com, f.fainelli@...il.com, roopa@...ulusnetworks.com,
	linville@...driver.com, shrijeet@...il.com,
	gospo@...ulusnetworks.com, bcrl@...ck.org
Subject: Re: Flows! Offload them.

On 02/26/2015 12:16 PM, Neil Horman wrote:
> On Thu, Feb 26, 2015 at 07:23:36AM -0800, John Fastabend wrote:
>> On 02/26/2015 05:33 AM, Thomas Graf wrote:
>>> On 02/26/15 at 10:16am, Jiri Pirko wrote:
>>>> Well, on netdev01, I believe that a consensus was reached that for every
>>>> switch-offloaded functionality there has to be an implementation in
>>>> the kernel.
>>>
>>> Agreed. This should not prevent the policy being driven from user
>>> space though.
>>>
>>>> What John's Flow API originally did was to provide a way to
>>>> configure hardware independently of the kernel. So the right way is to
>>>> configure the kernel and, if hw allows it, to offload the configuration to hw.
>>>>
>>>> In this case, it seems to me logical to offload from one place, that being
>>>> TC. The reason is, as I stated above, the possible conversion from OVS
>>>> datapath to TC.
>>>
>>> Offloading of TC definitely makes a lot of sense. I think that even in
>>> that case you will already encounter independent configuration of
>>> hardware and kernel. Example: The hardware provides a fixed, generic
>>> function to push up to n bytes onto a packet. This hardware function
>>> could be used to implement TC actions "push_vlan", "push_vxlan",
>>> "push_mpls". You would you would likely agree that TC should make use
>>> of such a function even if the hardware version is different from the
>>> software version. So I don't think we'll have a 1:1 mapping for all
>>> configurations, regardless of whether the "how" is decided in kernel or
>>> user space.
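
To make Thomas' example a bit more concrete, here is a quick sketch (everything
below is made up for illustration, not any real driver interface) of one generic
"push up to n bytes" hardware primitive backing several distinct TC push actions:

#include <stdint.h>

/* Hypothetical: a single generic hardware primitive that prepends up
 * to HW_PUSH_MAX bytes of header to a packet. */
#define HW_PUSH_MAX 64

struct hw_push_op {
        uint8_t len;                    /* number of bytes to push */
        uint8_t data[HW_PUSH_MAX];      /* header bytes to prepend */
};

/* TC "push_vlan" expressed in terms of that one primitive. */
static void build_push_vlan(struct hw_push_op *op, uint16_t tpid, uint16_t tci)
{
        op->len = 4;
        op->data[0] = tpid >> 8;
        op->data[1] = tpid & 0xff;
        op->data[2] = tci >> 8;
        op->data[3] = tci & 0xff;
}

/* "push_mpls" is just a different 4 bytes through the same primitive. */
static void build_push_mpls(struct hw_push_op *op, uint32_t label_entry)
{
        op->len = 4;
        op->data[0] = label_entry >> 24;
        op->data[1] = (label_entry >> 16) & 0xff;
        op->data[2] = (label_entry >> 8) & 0xff;
        op->data[3] = label_entry & 0xff;
}

The point being that neither builder needs the hardware to know anything about
VLAN or MPLS specifically, so the hardware op and the TC action don't line up 1:1.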
>>
>> Just to expand slightly on this. I don't think you can get to a 1:1
>> mapping here. One reason is hardware typically has a TCAM and limited
>> size. So you need a _policy_ to determine when to push rules into the
>> hardware. The kernel doesn't know when to do this and I don't believe
>> it's the kernel's place to start enforcing policy like this. One thing I likely
>> need to do is get some more "worlds" in rocker so we aren't stuck only
>> thinking about the infinite size OF_DPA world. The OF_DPA world is only
>> one world and not a terribly flexible one at that when compared with the
>> NPU folk. So minimally you need a flag to indicate rules go into hardware
>> vs software.
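
To show the shape of the flag I mean (purely illustrative, not an existing uAPI),
a rule could carry a placement hint and the driver simply reports whether the
TCAM had room, leaving the policy with whoever set the hint:

#include <errno.h>
#include <stdbool.h>

/* Hypothetical placement hint carried on a rule; the names are made up. */
enum rule_placement {
        RULE_PLACE_SW_ONLY,             /* never touch the TCAM */
        RULE_PLACE_HW_ONLY,             /* fail if the TCAM has no room */
        RULE_PLACE_HW_PREFERRED,        /* offload if room, else software */
};

struct flow_rule {
        enum rule_placement place;
        /* match and action fields elided */
};

/* The kernel/driver only reports whether the hardware table had room;
 * deciding what goes there stays with the user who set the flag. */
static int install_rule(const struct flow_rule *r, bool tcam_has_room)
{
        switch (r->place) {
        case RULE_PLACE_SW_ONLY:
                return 0;                       /* software datapath only */
        case RULE_PLACE_HW_ONLY:
                return tcam_has_room ? 0 : -ENOSPC;
        case RULE_PLACE_HW_PREFERRED:
                return 0;                       /* offload opportunistically */
        }
        return -EINVAL;
}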
>>
>> That said I think the bigger mismatch between software and hardware is
>> you program it differently because the data structures are different. Maybe
>> a u32 example would help. For parsing with u32 you might build a parse
>> graph with a root and some leaf nodes. In hardware you want to collapse
>> this down onto the hardware. I argue this is not a kernel task because
>> there are lots of ways to do this and there are trade-offs made with
>> respect to space and performance and which table to use when it could be
>> handled by a set of tables. Another example is a virtual switch, possibly
>> OVS, but we have others. The software does some "unmasking" (their term)
>> before sending the rules into the software dataplane cache. Basically this
>> means we can ignore priority in the hash lookup. However this is not how you
>> would optimally use hardware. Maybe I should do another write up with
>> some more concrete examples.
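
A tiny sketch of what I mean by collapsing the parse graph (hypothetical
structures, just to show why the flattening choice is awkward to make
generically in the kernel):

#include <stddef.h>

/* Hypothetical software-side parse graph, roughly the root-plus-leaves
 * shape a u32-style classifier builds up. */
struct parse_node {
        unsigned int offset;            /* byte offset this node matches at */
        unsigned int nkeys;             /* keys hanging off this node */
        struct parse_node **next;       /* one child per key, NULL child = leaf */
};

/* Collapsing the graph into flat hardware tables means enumerating every
 * root-to-leaf path; how those paths are split across the available
 * tables is exactly the space/performance trade-off in question. */
static size_t count_flat_entries(const struct parse_node *n)
{
        size_t total = 0;
        unsigned int i;

        if (!n)
                return 1;               /* walked off a leaf: one flat entry */

        for (i = 0; i < n->nkeys; i++)
                total += count_flat_entries(n->next[i]);

        return total ? total : 1;       /* a node with no keys is itself a leaf */
}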
>>
>> There are also lots of use cases to _not_ have hardware and software in
>> sync. A flag allows this.
>>
>> My only point is I think we need to allow users to optimally use their
>> hardware either via 'tc' or my previous 'flow' tool. Actually, in my
>> opinion I still think it's best to have both interfaces.
>>
>> I'll go get some coffee now and hopefully that is somewhat clear.
> 
> 
> I've been thinking about the policy aspect of this, and the more I think about
> it, the more I wonder if not allowing some sort of common policy in the kernel
> is really the right thing to do here.  I know that's somewhat blasphemous, but
> this isn't really administrative policy that we're talking about, at least not
> 100%.  It's more of a behavioral profile that we're trying to enforce.  That may
> be splitting hairs, but I think there's precedent for the latter.  That is to
> say, we configure qdiscs to limit traffic flow to certain rates, and configure
> policies which drop traffic that violates it (which includes random discard,
> which is the antithesis of deterministic policy).  I'm not sure I see this as
> any different, especially if we limit its scope.  That is to say, why couldn't we
> allow the kernel to program a predetermined set of policies that the admin can
> set (e.g. offload routing to a hardware cache of X size with LRU
> victimization).  If other well-defined policies make sense, we can add them and
> expose options via iproute2 or some such to set them.  For the use case where
> such pre-packaged policies don't make sense, we have things like the flow API to
> offer users who want to be able to control their hardware in a more fine-grained
> way.
> 
> Neil
> 

Hi Neil,

I actually like this idea a lot. I might tweak it a bit in that we could have
feature bits, or something like feature bits, that expose how to split up the
hardware cache and give sizes.
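
Just to pin down your routing example first (everything here is hypothetical and
only meant to show the shape of a pre-packaged policy, not a proposed interface):

#include <stdint.h>

/* Hypothetical pre-packaged policies, in the spirit of "offload routing
 * to a hardware cache of X size with LRU victimization".  None of these
 * names exist anywhere today. */
enum offload_policy_id {
        OFFLOAD_POLICY_NONE,
        OFFLOAD_POLICY_ROUTE_CACHE,     /* cache FIB entries in hardware */
        OFFLOAD_POLICY_L2_FDB,          /* cache bridge FDB entries */
};

enum victimization {
        VICTIM_LRU,
        VICTIM_RANDOM,
};

/* What an iproute2-style knob (whatever the spelling ends up being) would
 * hand the kernel: which canned policy, how big a slice of the hardware
 * cache, and how to evict. */
struct offload_policy {
        enum offload_policy_id  id;
        uint32_t                cache_entries;  /* the "X size" above */
        enum victimization      victim;
};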

So the hypervisor (see, I think of end hosts) or administrators could come in and
say I want a route table and an nft table. This creates a "flavor" for how the
hardware is going to be used. Another use case may not be doing routing at all
but have an application that wants to manage the hardware at a more fine-grained
level, with the exception of some nft commands, so it could have an "nft"+"flow"
flavor. Insert your favorite use case here.
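
As a strawman for those feature bits and flavors (again, nothing real, names
invented purely for illustration):

#include <stdint.h>

/* Hypothetical feature bits describing how the shared hardware
 * match/action resources get carved up. */
#define HW_SLICE_ROUTE  (1u << 0)       /* route/FIB table */
#define HW_SLICE_NFT    (1u << 1)       /* nftables offload */
#define HW_SLICE_FLOW   (1u << 2)       /* raw flow API, application managed */

struct hw_slice {
        uint32_t feature;       /* one of the HW_SLICE_* bits */
        uint32_t entries;       /* how much of the cache it gets */
};

/* A "flavor" is just the set of slices the hypervisor or admin asked
 * for: { ROUTE, NFT } for an end host that routes and firewalls, or
 * { NFT, FLOW } when an application owns most of the hardware. */
struct hw_flavor {
        unsigned int    nslices;
        struct hw_slice slices[4];
};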

.John
