netdev - Re: [PATCH RFC net-next 0/2] net:sched: Introduce tc flower2 classifier based on PANDA parser in kernel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <614d4c2954b0_dc7fd208aa@john-XPS-13-9370.notmuch>
Date:   Thu, 23 Sep 2021 20:55:21 -0700
From:   John Fastabend <john.fastabend@...il.com>
To:     Jamal Hadi Salim <jhs@...atatu.com>,
        John Fastabend <john.fastabend@...il.com>,
        Tom Herbert <tom@...anda.io>
Cc:     Simon Horman <simon.horman@...igine.com>,
        Jiri Pirko <jiri@...nulli.us>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Felipe Magno de Almeida <felipe@...anda.io>,
        Linux Kernel Network Developers <netdev@...r.kernel.org>,
        Boris Sukholitko <boris.sukholitko@...adcom.com>,
        Vadym Kochan <vadym.kochan@...ision.eu>,
        Ilya Lifshits <ilya.lifshits@...adcom.com>,
        Vlad Buslov <vladbu@...dia.com>,
        Ido Schimmel <idosch@...sch.org>, paulb@...dia.com,
        Davide Caratti <dcaratti@...hat.com>,
        Marcelo Ricardo Leitner <marcelo.leitner@...il.com>,
        Amritha Nambiar <amritha.nambiar@...el.com>,
        "Samudrala, Sridhar" <sridhar.samudrala@...el.com>,
        Pedro Tammela <pctammela@...atatu.com>,
        Eric Dumazet <eric.dumazet@...il.com>
Subject: Re: [PATCH RFC net-next 0/2] net:sched: Introduce tc flower2
 classifier based on PANDA parser in kernel

Jamal Hadi Salim wrote:
> Geez, I missed all the fun ;->
> 
> On 2021-09-22 11:25 p.m., John Fastabend wrote:
> > Tom Herbert wrote:
> >> On Wed, Sep 22, 2021, 6:29 PM John Fastabend <john.fastabend@...il.com>
> >> wrote:
> 
> [..]
> 
> >> John,
> >>
> >> Please look at patch log, there are number of problems that have come up
> >> flow dissector over the years. Most of this is related to inherent
> >> inflexibility, limitations, missing support for fairly basic protocols, and
> >> there's a lot of information loss because of the fixed monolithic data
> >> structures. I've said it many times: skb_flow_dissect is the function we
> >> love to hate. Maybe it's arguable, bit I claim it's 2000 lines of spaghetti
> >> code. I don't think there's anyone to blame for that, this was a
> >> consequence of evolving very useful feature that isn't really amenable to
> >> being written in sequence of imperative instructions (if you recall it used
> >> to be even worse with something like 20 goto's scattered about that defied
> >> any semblance of logical program flow :-) ).
> > 
> > OK, but if thats the goal then shouldn't this series target replacing the
> > flow_dissector code directly? I don't see any edits to ./net/core.
> > 
> 
> Agreed, replacement of flow dissector should be a focus. Jiri's
> suggestion of a followup patch which shows how the rest of the consumers
> of flow dissector could be made to use PANDA is a good idea.

I'de almost propose starting with flow_dissector.c first so we see that the
./net/core user for the SW only case looks good. Although I like the idea
of doing it all in BPF directly so could take a crack at that as well. Then
compare them.

> 
> IMO (correct me if i am wrong Tom), flower2 was merely intended to
> illustrate how one would use PANDA i.e there are already two patches
> of which the first one is essentially PANDA...
> IOW,  it is just flower but with flow dissector replaced by PANDA.
> 
> >>
> >> The equivalent code in PANDA is far simpler, extensible, and maintainable
> >> and there are opportunities for context aware optimizations that achieve
> >> higher performance (we'll post performance numbers showing that shortly).
> >> It's also portable to different environments both SW and HW.
> > 
> > If so replace flow_dissector then I think and lets debate that.
> > 
> > My first question as a flow dissector replacement would be the BPF
> > flow dissector was intended to solve the generic parsing problem.
> 
> > Why would Panda be better? My assumption here is that BPF should
> > solve the generic parsing problem, but as we noted isn't very
> > friendly to HW offload. So we jumped immediately into HW offload
> > space. If the problem is tc_flower is not flexible enough
> > couldn't we make tc_flower use the BPF dissector? That should
> > still allow tc flower to do its offload above the sw BPF dissector
> > to hardware just fine.
> > 
> > I guess my first level question is why did BPF flow dissector
> > program not solve the SW generic parsing problem. I read the commit
> > messages and didn't find the answer.
> > 
> 
> Sorry, you cant replace/flowdissector/BPF such that flower can
> consume it;-> You are going to face a huge path explosion with the 
> verifier due to the required branching and then resort to all
> kinds of speacial-cased acrobatics.
> See some samples of XDP code going from trying to parse basic TCP 
> options to resorting to tricking the verifier.
> For shits and giggles, as they say in Eastern Canada, try to do
> IPV6 full parsing with BPF (and handle all the variable length
> fields).

We parse TLVs already and it works just fine. It requires some
careful consideration and clang does some dumb things here and
there, but it is doable. Sure verifier could maybe be improved
around a few cases and C frontend gets in the way sometimes,
but PANDA or P4 or other DSL could rewrite in LLVM-IR directly
to get the correct output.

> Generally:
> BPF is good for specific smaller parsing tasks; the ebpf flow dissector
> hook should be trivial to add to PANDA. And despite PANDA being able
> to generate EBPF - I would still say it depends on the depth of the
> parse tree to be sensible to use eBPF.

Going to disagree. I'm fairly confident we could write a BPF
program to do the flow disection. Anyways we can always improve
the verifier as needed and this helps lots of things not
just this bit. Also flow dissector will be loaded once at early
boot most likely so we can allow it to take a bit longer or
pre-verify it. Just ideas.

> 
> Earlier in the thread you said a couple of things that caught my
> attention:
> 
>  > I don't think P4 or Panda should be in-kernel. The kernel has a BPF
>  > parser that can do arbitrary protocol parsing today. I don't see
>  > a reason to add another thing on the chance a hardware offload
>  > might come around. Anyways P4/Panda can compile to the BPF parser
>  > or flower if they want and do their DSL magic on top. And sure
>  > we might want to improve the clang backends, the existing flower
>  > classifier, and BPF verifier.
>  >
>  >
>  > Vendors have the ability to code up arbitrary hints today. They just
>  > haven't open sourced it or made it widely available. I don't see how
>  > a 'tc' interface would help with this. I suspect most hardware could
>  > prepend hints or put other arbitrary data in the descriptor or elsewhere.
>  > The compelling reason to open source it is missing.
> 
> Please, please _lets not_ encourage vendors to continue
> keep things proprietary!

Fair enough. Some frustration leaking in from my side knowing
the hardware has been around for years and we've seen multiple
proposals but only limited hardware backing. Tom mentioned
he was working on the hardware angle so perhaps its close.

> Statements like "I don't think P4 or Panda should be in-kernel..."
> are just too strong.

Where I wanted to go with this is P4 and Panda are DSLs in my
mind. I think we should keep the kernel non-specific to any
one DSL. We should have a low level generic way to add them
to the kernel, I think this is BPF. Then we let users pick
whatever DSL they like and/or make up their own DSL.

Is the counter-argument that Panda is not a DSL, but rather
a low-level parser builder pattern.

> Instead lets focus on how we can make P4 and other hardware offloads
> work in conjunction with the kernel (instead of totally bypassing
> it which is what vendors are doing enmasse already). There are
> billions of $ invested in these ASICs and lets welcome them into
> our world. It serves and helps grow the Linux community better.
> The efforts of switchdev and tc offloading have proven it is possible.
> Vendors (and i am going to call out Broadcom on the switching side here)
> are not partaking because they see it as an economical advantage not to
> partake.
> 
> We have learnt a lot technically since switchdev/tc offloads happened.
> So it is doable.
> The first rule is: In order to get h/w offload to work lets also have
> digitally equivalent implementation in s/w.

But there is a cost to this its yet another bit of software to
maintain and review and so on. I'm arguing we already have a generic
way to implement h/w equivalence and its BPF. So instead of inventing
another method to do software we can improve the BPF layer. If we need
to build semantics over it that look consumable to hw so be it.

Also the other core question I still don't understand is how a
piece of hardware could consume a parse graph piece-meal through
an interface like proposed in flower2 and generate an arbitrary
parse graph? On the fly none the less. That feels like some very
powerful firmware to me.
And I would prefer open source userspace code (non-kernel) to
deep magic in firmware. At least then I can see it, patch it,
fix it, etc.

Last thing, I'll point out I got back deep into the hardware debate.
I'm still not convinced its the right thing to rip out the flow
dissector piece and replace it with Panda.

Thanks!
John