[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170830041146.q2b52d77x2gwyhco@ast-mbp>
Date: Tue, 29 Aug 2017 21:11:48 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: David Ahern <dsahern@...il.com>
Cc: netdev@...r.kernel.org, daniel@...earbox.net, ast@...nel.org,
tj@...nel.org, davem@...emloft.net, luto@...capital.net
Subject: Re: [PATCH v2 net-next 1/8] bpf: Add support for recursively running
cgroup sock filters
On Tue, Aug 29, 2017 at 09:38:16PM -0600, David Ahern wrote:
> On 8/29/17 8:58 PM, Alexei Starovoitov wrote:
> > On Tue, Aug 29, 2017 at 07:03:43PM -0600, David Ahern wrote:
> >> On 8/28/17 10:11 PM, Alexei Starovoitov wrote:
> >>>
> >>> Agree on the above, but you're mixing semantics of the new recurse
> >>> flag and implementation of it. Ex: we don't have to copy this flag
> >>> from prog->attr into cgroup. So this reset or non-reset discussion
> >>> only makes sense in the context of your current implementation.
> >>> We can implement the logic differently. Like don't copy that flag
> >>> at all and at attach time walk parent->parent->parent and see
> >>> what programs are attached. All of them should have prog->attr & recurse_bit set
> >>> In such implementation detach from 'b' is a nop from reset/non-reset
> >>> point of view. When socket creation in 'c' is invoked the program
> >>> 'c' is called first then the code keeps walking parents until root
> >>> invoking 'a' along the way.
> >>
> >> So you are suggesting there is no recursive flag per cgroup? How do you
> >> know you need to walk cgroups? How do you know when to stop running
> >> programs?
> >
> > you're talking about implementation, right?
> > My 'proposed' implemenation of walking from cgroup all the way to the root
> > is just an example. It's not efficient. More below...
> >
> >>> I'm not saying it will be an efficient implementation. The point
> >>> is to discuss UAPI independent of implementation.
> >>>
> >>>> ###
> >>>>
> >>>> Also, let's agree on this intention. Based on the new ground rule, I
> >>>> want to point out this example:
> >>>>
> >>>> If 'a' gets a program installed with no recurse flag set, ONLY processes
> >>>> in 'a' have the 'a' program run. Processes in groups 'b', 'c' and 'd'
> >>>> all stop at cgroup 'b' program.
> >>>
> >>> I'm proposing that such situation should not be allowed to happen.
> >>> In a->b->c->d cgroup scenario if override+recurse prog attached to 'b'
> >>> then only the same override+recurse can be attached to c, d, a.
> >>> So at detach time there can be gaps (like only 'b' and 'd' have
> >>> override+recurse progs), but walking up until root from any point
> >>> will guarantee that only override+recurse programs are seen.
> >>>
> >>
> >> That seems very limiting to me. Seems like you are suggesting the entire
> >> cgroup tree is recursive or non-recursive, but never a mix.
> >
> > Entire cgroup subtree. Yes. It's the simplest uapi I could think of.
>
> So 10 email exchanges later you agree on the UAPI I implemented in this
> patch: user opts in to recursive behavior via a new flag at attach time,
> and once a recursive program is installed at some point in the cgroup
> tree it applies to all descendant cgroups.
>
> So all of these exchanges weren't about the UAPI, but your disagreement
> in my implementation. The only user visible change here is only programs
> marked recursive are run versus going back to the first cgroup marked
> non-recursive.
you cannot be serious. Your implemention is not at all what i'm proposing
above as a simplest uapi. Should we all go back and re-read from the beginning?
> > Easy to understand and argue about and I think it's solving your use case.
> > It's also easily extendable. New combination and features won't break
> > the users. It feels you're in rush to get this stuff for this merge
> > window, therefore I want to agree on something that is simple,
> > non-controversial and extensible.
>
> I am in no-rush, but this does not to fall by the wayside like the net
> namespace specification.
It's more than that! I think you only looking at it from vrf perspective
whereas cgroup-bpf became a corner stone feature and enabler for tcp-bpf
which in turn became a stepping stone for bpf_sk_redirect.
So no, I'm not going to let something half baked that touches
the core idea of cgroup-bpf slide in.
Tejun and Andy also need to take a look, so yes it will take long
time for everyone to agree on this core uapi.
> Given the pending release of 4.13 net-next will close which gives a 2+
> week window to work on v3 before the next merge window. Plenty of time
> for me to work it into the many other things on my plate.
As I proposed several emails ago, please repost patches 2 and 3 that
I already acked and we can continue discussing this patch without
the agitation.
Powered by blists - more mailing lists