[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoMmfDoZ9_ZdK-ZjHjFAjuNN8fVK+R57_UaFqAm=wA0AWVA@mail.gmail.com>
Date: Fri, 26 Apr 2024 13:12:14 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, deb.chatterjee@...el.com, anjali.singhai@...el.com,
namrata.limaye@...el.com, tom@...anda.io, mleitner@...hat.com,
Mahesh.Shirshyad@....com, tomasz.osinski@...el.com, jiri@...nulli.us,
xiyou.wangcong@...il.com, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, vladbu@...dia.com, horms@...nel.org, khalidm@...dia.com,
toke@...hat.com, victor@...atatu.com, pctammela@...atatu.com,
Vipin.Jain@....com, dan.daly@...el.com, andy.fingerhut@...il.com,
chris.sommers@...sight.com, mattyk@...dia.com, bpf@...r.kernel.org
Subject: Re: [PATCH net-next v16 00/15] Introducing P4TC (series 1)
On Fri, Apr 19, 2024 at 2:01 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> On Fri, Apr 19, 2024 at 1:20 PM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > On Fri, 2024-04-19 at 08:08 -0400, Jamal Hadi Salim wrote:
> > > On Thu, Apr 11, 2024 at 12:24 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> > > >
> > > > On Thu, Apr 11, 2024 at 10:07 AM Paolo Abeni <pabeni@...hat.com> wrote:
> > > > >
> > > > > On Wed, 2024-04-10 at 10:01 -0400, Jamal Hadi Salim wrote:
> > > > > > The only change that v16 makes is to add a nack to patch 14 on kfuncs
> > > > > > from Daniel and John. We strongly disagree with the nack; unfortunately I
> > > > > > have to rehash whats already in the cover letter and has been discussed over
> > > > > > and over and over again:
> > > > >
> > > > > I feel bad asking, but I have to, since all options I have here are
> > > > > IMHO quite sub-optimal.
> > > > >
> > > > > How bad would be dropping patch 14 and reworking the rest with
> > > > > alternative s/w datapath? (I guess restoring it from oldest revision of
> > > > > this series).
> > > >
> > > >
> > > > We want to keep using ebpf for the s/w datapath if that is not clear by now.
> > > > I do not understand the obstructionism tbh. Are users allowed to use
> > > > kfuncs as part of infra or not? My understanding is yes.
> > > > This community is getting too political and my worry is that we have
> > > > corporatism creeping in like it is in standards bodies.
> > > > We started by not using ebpf. The same people who are objecting now
> > > > went up in arms and insisted we use ebpf. As a member of this
> > > > community, my motivation was to meet them in the middle by
> > > > compromising. We invested another year to move to that middle ground.
> > > > Now they are insisting we do not use ebpf because they dont like our
> > > > design or how we are using ebpf or maybe it's not a use case they have
> > > > any need for or some other politics. I lost track of the moving goal
> > > > posts. Open source is about solving your itch. This code is entirely
> > > > on TC, zero code changed in ebpf core. The new goalpost is based on
> > > > emotional outrage over use of functions. The whole thing is getting
> > > > extremely toxic.
> > > >
> > >
> > > Paolo,
> > > Following up since no movement for a week now;->
> > > I am going to give benefit of doubt that there was miscommunication or
> > > misunderstanding for all the back and forth that has happened so far
> > > with the nackers. I will provide a summary below on the main points
> > > raised and then provide responses:
> > >
> > > 1) "Use maps"
> > >
> > > It doesnt make sense for our requirement. The reason we are using TC
> > > is because a) P4 has an excellent fit with TC match action paradigm b)
> > > we are targeting both s/w and h/w and the TC model caters well for
> > > this. The objects belong to TC, shared between s/w, h/w and control
> > > plane (and netlink is the API). Maybe this diagram would help:
> > > https://github.com/p4tc-dev/docs/blob/main/images/why-p4tc/p4tc-runtime-pipeline.png
> > >
> > > While the s/w part stands on its own accord (as elaborated many
> > > times), for TC which has offloads, the s/w twin is introduced before
> > > the h/w equivalent. This is what this series is doing.
> > >
> > > 2) "but ... it is not performant"
> > > This has been brought up in regards to netlink and kfuncs. Performance
> > > is a lower priority to P4 correctness and expressibility.
> > > Netlink provides us the abstractions we need, it works with TC for
> > > both s/w and h/w offload and has a lot of knowledge base for
> > > expressing control plane APIs. We dont believe reinventing all that
> > > makes sense.
> > > Kfuncs are a means to an end - they provide us the gluing we need to
> > > have an ebpf s/w datapath to the TC objects. Getting an extra
> > > 10-100Kpps is not a driving factor.
> > >
> > > 3) "but you did it wrong, here's how you do it..."
> > >
> > > I gave up on responding to this - but do note this sentiment is a big
> > > theme in the exchanges and consumed most of the electrons. We are
> > > _never_ going to get any consensus with statements like "tc actions
> > > are a mistake" or "use tcx".
> > >
> > > 4) "... drop the kfunc patch"
> > >
> > > kfuncs essentially boil down to function calls. They don't require any
> > > special handling by the eBPF verifier nor introduce new semantics to
> > > eBPF. They are similar in nature to the already existing kfuncs
> > > interacting with other kernel objects such as nf_conntrack.
> > > The precedence (repeated in conferences and email threads multiple
> > > times) is: kfuncs dont have to be sent to ebpf list or reviewed by
> > > folks in the ebpf world. And We believe that rule applies to us as
> > > well. Either kfuncs (and frankly ebpf) is infrastructure glue or it's
> > > not.
> > >
> > > Now for a little rant:
> > >
> > > Open source is not a zero-sum game. Ebpf already coexists with
> > > netfilter, tc, etc and various subsystems happily.
> > > I hope our requirement is clear and i dont have to keep justifying why
> > > P4 or relitigate over and over again why we need TC. Open source is
> > > about scratching your itch and our itch is totally contained within
> > > TC. I cant help but feel that this community is getting way too
> > > pervasive with politics and obscure agendas. I understand agendas, I
> > > just dont understand the zero-sum thinking.
> > > My view is this series should still be applied with the nacks since it
> > > sits entirely on its own silo within networking/TC (and has nothing to
> > > do with ebpf).
> >
> > It's really hard for me - meaning I'll not do that - applying a series
> > that has been so fiercely nacked, especially given that the other
> > maintainers are not supporting it.
> >
> > I really understand this is very bad for you.
> >
> > Let me try to do an extreme attempt to find some middle ground between
> > this series and the bpf folks.
> >
> > My understanding is that the most disliked item is the lifecycle for
> > the objects allocated via the kfunc(s).
> >
> > If I understand correctly, the hard requirement on bpf side is that any
> > kernel object allocated by kfunc must be released at program unload
> > time. p4tc postpone such allocation to recycle the structure.
> >
> > While there are other arguments, my reading of the past few iterations
> > is that solving the above node should lift the nack, am I correct?
> >
> > Could p4tc pre-allocate all the p4tc_table_entry_act_bpf_kern entries
> > and let p4a_runt_create_bpf() fail if the pool is empty? would that
> > satisfy the bpf requirement?
>
> Let me think about it and weigh the consequences.
>
Sorry, was busy evaluating. Yes, we can enforce the memory allocation
constraints such that when the ebpf program is removed any entries
added by said ebpf program can be removed from the datapath.
> > Otherwise could p4tc force free the p4tc_table_entry_act_bpf_kern at
> > unload time?
>
> This one wont work for us unfortunately. If we have entries added by
> the control plane with skip_sw just because the ebpf program is gone
> doesnt mean they disappear.
Just to clarify (the figure
https://github.com/p4tc-dev/docs/blob/main/images/why-p4tc/p4tc-runtime-pipeline.png
should help) :
For P4 table objects, there are 3 types of entries: 1) created by
control path for s/w datapath with skip_hw 2) created by control path
for h/w datapath with skip_sw and 3) dynamically created by s/w
datapath (ebpf) not far off from conntrack.
The only ones we can remove when the ebpf program goes away are from #3.
cheers,
jamal
Powered by blists - more mailing lists