Message-ID: <20180705121202.GA27625@Nover>
Date: Thu, 5 Jul 2018 14:12:03 +0200
From: Paul Chaignon <paul.chaignon@...il.com>
To: William Tu <u9012063@...il.com>
Cc: Alexei Starovoitov <alexei.starovoitov@...il.com>,
"<dev@...nvswitch.org>" <dev@...nvswitch.org>,
Tom Herbert via iovisor-dev <iovisor-dev@...ts.iovisor.org>,
Linux Kernel Network Developers <netdev@...r.kernel.org>
Subject: Re: [iovisor-dev] [RFC PATCH 00/11] OVS eBPF datapath.
On Wed, Jul 04, 2018 at 07:25:50PM -0700, William Tu wrote:
> On Tue, Jul 3, 2018 at 10:56 AM, Alexei Starovoitov
> <alexei.starovoitov@...il.com> wrote:
> > On Thu, Jun 28, 2018 at 07:19:35AM -0700, William Tu wrote:
> >> Hi Alexei,
> >>
> >> Thanks a lot for the feedback!
> >>
> >> On Wed, Jun 27, 2018 at 8:00 PM, Alexei Starovoitov
> >> <alexei.starovoitov@...il.com> wrote:
> >> > On Sat, Jun 23, 2018 at 05:16:32AM -0700, William Tu wrote:
> >> >>
> >> >> Discussion
> >> >> ==========
> >> >> We are still actively working on finishing the feature; currently
> >> >> the basic forwarding and tunnel features work, but they are still
> >> >> under heavy debugging and development. The purpose of this RFC is
> >> >> to get some early feedback and direction for completing the feature
> >> >> set of the existing kernel OVS datapath (net/openvswitch/*).
> >> >
> >> > Thank you for sharing the patches.
> >> >
> >> >> Three major issues we are worried about:
> >> >> a. Megaflow support in BPF.
> >> >> b. Connection Tracking support in BPF.
> >> >
> >> > my opinion on the above two didn't change.
> >> > To recap:
> >> > A. A non-scalable megaflow map is a no-go. I'd like to see a packet
> >> > classification algorithm like HiCuts or EffiCuts implemented instead,
> >> > since it can be shared by generic bpf, bpfilter, ovs and likely others.
> >>
> >> We did try the decision-tree approach using DPDK's ACL library. The
> >> lookup speed is 6 times faster than megaflow's tuple space search.
> >> However, update/insertion requires rebuilding/re-balancing the decision
> >> tree, so it's way too slow. I think HiCuts and EffiCuts suffer from the
> >> same issue. So decision-tree algorithms are scalable only for lookup
> >> operations, thanks to their optimization of tree depth, but not under
> >> update/insert/delete operations.
> >>
> >> On a customer's system we see a megaflow update/insert rate of around
> >> 10 rules/sec; this makes the decision tree unusable, unless we invent
> >> something to optimize the update/insert time or add incremental updates
> >> to these decision-tree algorithms.
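
For reference, the rebuild cost described above is inherent to the DPDK
ACL API: rules are staged with rte_acl_add_rules() and only take effect
once rte_acl_build() recompiles the runtime structures for the whole
rule set. A minimal sketch of the insert path (a single one-byte field;
error handling and the real OVS key layout elided; the struct and
variable names are made up):

#include <string.h>
#include <rte_acl.h>

/* One classification field; a real flow key would define many.
 * DPDK requires the first input field to be one byte long. */
static struct rte_acl_field_def fields[1] = {
	{
		.type = RTE_ACL_FIELD_TYPE_BITMASK,
		.size = sizeof(uint8_t),
		.field_index = 0,
		.input_index = 0,
		.offset = 0,
	},
};

RTE_ACL_RULE_DEF(acl_rule, RTE_DIM(fields));

static struct rte_acl_ctx *ctx; /* created once with rte_acl_create() */

/* Inserting a single new rule: rte_acl_build() recompiles the
 * trie over *all* previously added rules -- the ~2 s rebuild
 * with 100k rules mentioned further down in this thread. */
static int insert_rule(const struct acl_rule *rule)
{
	struct rte_acl_config cfg = {
		.num_categories = 1,
		.num_fields = RTE_DIM(fields),
	};

	memcpy(cfg.defs, fields, sizeof(fields));

	if (rte_acl_add_rules(ctx, (const struct rte_acl_rule *)rule, 1) != 0)
		return -1;

	return rte_acl_build(ctx, &cfg); /* full rebuild on every insert */
}
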
> >
> > is this a typo? you probably meant 10K rule updates a second?
> I mean "new" rules being added at 10 rules/sec.
> Update rate might be much higher.
>
> > Last time I dealt with these algorithms we had 100K ACL updates a second.
> > It was an important metric that we were optimizing for.
> > I'm pretty sure the '*cuts' algos do many thousands per second unoptimized.
>
> When adding a new rule, do these algorithms require rebuilding the
> entire tree?
>
> > In our evaluation, updating an existing entry in the decision tree
> > performs OK, because it amounts to a lookup and a replace: lookup
> > is fast, and the update itself is just an atomic swap. But inserting
> > a new rule is slow, because it requires rebuilding the tree from all
> > existing rules. And since we see new rules being added at a rate of
> > 10 rules per second, we are constantly rebuilding the entire tree.
>
> > If the entire tree has 100k rules, it takes around 2 seconds to rebuild
> > with the DPDK ACL library. And without an incremental algorithm,
> > adding 1 new rule triggers a rebuild over all 100k rules, which is too slow.
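
(For a sense of scale: at ~2 seconds per rebuild and ~10 new rules per
second, roughly 20 rules arrive while a single rebuild is in flight, so
rebuilds run back-to-back and a newly added rule can wait several
seconds before it takes effect.)
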
>
> > Reading through the HyperCuts and EffiCuts papers, I'm not sure how they
> > support incrementally adding a new rule without rebuilding the entire tree.
> http://ccr.sigcomm.org/online/files/p207.pdf
> http://cseweb.ucsd.edu/~susingh/papers/hyp-sigcomm03.pdf
>
> > The HyperCuts paper says
> "A fast update algorithm can also be implemented; however we do not
> go into the details of incremental update in this paper"
>
> >
> >> >> c. Verifier limitation.
> >> >
> >> > Not sure what limitations you're concerned about.
> >> >
> >>
> >> Mostly related to the stack. The flow key OVS uses (struct sw_flow_key)
> >> is 464 bytes. We trimmed it a lot, down to around 300 bytes, but that is
> >> still huge considering BPF's stack limit of 512 bytes.
> >
> > have you tried using a per-cpu array of one element with a large value
> > instead of the stack?
>
> Yes, we now store the flow key in a per-cpu array with 1 element.
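
For readers following along, that pattern looks roughly like this (a
minimal sketch in the samples/bpf style of the day; the struct, map,
and section names are made up, and the real key layout is elided):

#include <linux/bpf.h>
#include "bpf_helpers.h"

/* Trimmed flow key, still too large for the 512-byte BPF stack. */
struct flow_key {
	__u8 data[300];
};

/* One-element per-cpu scratch buffer holding the flow key. */
struct bpf_map_def SEC("maps") key_scratch = {
	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
	.key_size = sizeof(__u32),
	.value_size = sizeof(struct flow_key),
	.max_entries = 1,
};

SEC("parser")
int parse(struct __sk_buff *skb)
{
	__u32 zero = 0;
	struct flow_key *key = bpf_map_lookup_elem(&key_scratch, &zero);

	if (!key)
		return 0;
	/* Fill *key from the packet instead of burning stack space. */
	return 0;
}
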
>
> > In the latest verifier, most of the operations that can be done with a
> > stack pointer can be done with a pointer to a map value too.
> >
> Once the flow key is stored in a map, another eBPF program
> needs to use that key to look up the flow table (another map).
> So we have to store the flow key on the stack first, in order
> to use it as the key for the flow table lookup.
>
> Is there a way to work around it?
d71962f ("bpf: allow map helpers access to map values directly") removes
that limitation from the verifier and should allow you to use map values
as map keys directly. 4.18-rc1 has it.
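
Concretely, continuing the sketch above, the verifier now accepts
passing the pointer returned by one lookup straight into the next (the
flow_table map and struct flow_actions below are again hypothetical):

/* Hypothetical flow table keyed directly by struct flow_key. */
struct flow_actions {
	__u32 output_port;
};

struct bpf_map_def SEC("maps") flow_table = {
	.type = BPF_MAP_TYPE_HASH,
	.key_size = sizeof(struct flow_key),
	.value_size = sizeof(struct flow_actions),
	.max_entries = 65536,
};

SEC("lookup")
int flow_lookup(struct __sk_buff *skb)
{
	__u32 zero = 0;
	struct flow_key *key = bpf_map_lookup_elem(&key_scratch, &zero);
	struct flow_actions *acts;

	if (!key)
		return 0;
	/* Before d71962f the verifier rejected this lookup because 'key'
	 * points into a map value rather than the stack; on 4.18-rc1 it
	 * is accepted as-is. */
	acts = bpf_map_lookup_elem(&flow_table, key);
	if (!acts)
		return 0; /* flow miss */
	/* ... apply actions ... */
	return 0;
}
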
> Thanks
> William