netdev - Re: Let's do P4

Message-ID: <20161029093905.GA1810@pox.localdomain>
Date:   Sat, 29 Oct 2016 11:39:05 +0200
From:   Thomas Graf <tgraf@...g.ch>
To:     Jiri Pirko <jiri@...nulli.us>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, jhs@...atatu.com,
        roopa@...ulusnetworks.com, john.fastabend@...il.com,
        jakub.kicinski@...ronome.com, simon.horman@...ronome.com,
        ast@...nel.org, daniel@...earbox.net, prem@...efootnetworks.com,
        hannes@...essinduktion.org, jbenc@...hat.com, tom@...bertland.com,
        mattyk@...lanox.com, idosch@...lanox.com, eladr@...lanox.com,
        yotamg@...lanox.com, nogahf@...lanox.com, ogerlitz@...lanox.com,
        linville@...driver.com, andy@...yhouse.net, f.fainelli@...il.com,
        dsa@...ulusnetworks.com, vivien.didelot@...oirfairelinux.com,
        andrew@...n.ch, ivecera@...hat.com
Subject: Re: Let's do P4

On 10/29/16 at 09:53am, Jiri Pirko wrote:
> Hi all.
> 
> The network world is divided into 2 general types of hw:
> 1) network ASICs - network specific silicon, containing things like TCAM
>    These ASICs are suitable to be programmed by P4.
> 2) network processors - basically general purpose CPUs
>    These processors are suitable to be programmed by eBPF.
> 
> I believe that by now, most people have come to the conclusion that it is
> very difficult to handle both types by either P4 or eBPF. And since
> eBPF is part of the kernel, I would like to introduce P4 into kernel
> as well. Here's a plan:

For reference, last time I remember we discussed this in the BPF
offload context:
http://www.spinics.net/lists/netdev/msg356178.html

> 1) Define P4 intermediate representation
>    I cannot imagine loading P4 program (c-like syntax text file) into
>    kernel as is. That means that as the first step, we need to find some
>    intermediate representation. I can imagine something in the form of an AST,
>    call it "p4ast". I don't really know how to do this exactly though,
>    it's just an idea.
> 
>    In the end there would be a userspace precompiler for this:
>    $ makep4ast example.p4 example.ast
> 
> 2) Implement p4ast in-kernel interpreter 
>    A kernel module which takes a p4ast and emulates the pipeline.
>    This can be implemented from scratch. Or, p4ast could be compiled
>    to eBPF. I know there are already a couple of p4>eBPF compilers.
>    Not sure how feasible it would be to put this compiler in kernel.

+1 to using eBPF for emulation. Maybe the compiler doesn't need to be
in the kernel and user space can compile and provide the emulated
pipeline in eBPF directly. See next paragraph for an example where
this could be useful.

> 3) Expose the p4ast in-kernel interpreter to userspace
>    The easiest way I see is to introduce a new TC classifier cls_p4.
> 
>    This can work in a very similar way to cls_bpf:
>    $ tc filter add dev eth0 ingress p4 da ast example.ast
> 
>    The TC cls_p4 will be also used for runtime table manipulation.

I think this is a great model for the case where HW can provide all
of the required capabilities. In the case where HW provides a subset
and SW provides an extended version, i.e. the reality we live in for
hosts with ASIC NICs ;-), the hand-off point requires some
understanding between p4ast and eBPF.

Therefore another idea would be to use cls_bpf directly for this. The
p4ast IR could be stored in a separate ELF section in the same object
file with an existing eBPF program. The p4ast IR will match the
eBPF prog if capabilities of HW and SW match. If HW is limited, the
p4ast IR represents what the HW can do plus how to pass it to SW. The
eBPF prog contains whatever logic is required to take over if the HW
either bailed out or handed over deliberately, plus, on top, all the
missing pieces of functionality which can only be performed in SW.
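
To make the HW/SW split a bit more concrete, here is a rough sketch in
Python. It is only a model of the idea, not a proposed interface: the
stage names, the capability set and split_pipeline() are all made up
for illustration, and it assumes the simplest possible hand-off, where
HW runs a supported prefix of the pipeline and SW takes over from the
first stage HW cannot do.

```python
def split_pipeline(stages, hw_caps):
    """Split a pipeline into the part HW can run and the part left to SW.

    stages  -- ordered list of stage names (hypothetical, for illustration)
    hw_caps -- set of stage names the HW claims to support (hypothetical)

    Returns (hw_stages, sw_stages). hw_stages models what the p4ast IR
    would describe; sw_stages models what the eBPF prog has to cover
    after the hand-off.
    """
    for i, stage in enumerate(stages):
        if stage not in hw_caps:
            # First unsupported stage: this is the hand-off point.
            return list(stages[:i]), list(stages[i:])
    # HW and SW capabilities match: nothing is left over for SW.
    return list(stages), []


if __name__ == "__main__":
    pipeline = ["parse", "l2_match", "acl", "nat"]
    hw, sw = split_pipeline(pipeline, {"parse", "l2_match", "acl"})
    print("HW (p4ast IR):", hw)
    print("SW (eBPF prog):", sw)
```

In the real thing the eBPF prog would also have to handle the case
where HW bails out at runtime, which this static split does not model.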

tc then loads 1) eBPF maps and prog through bpf() syscall
              2) cls_bpf filter with p4ast IR plus ref to prog and
                 maps
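
The ordering matters because the filter needs references to the
already loaded prog and maps. A small Python model of that sequence,
assuming the object file is given as a dict of section name to bytes;
the section names ("maps", "classifier", "p4ast") and the step labels
are assumptions for illustration, not an existing tc or libbpf format:

```python
def plan_load(sections):
    """Return the ordered load steps for an object file.

    sections -- dict mapping section name to raw bytes. The names used
                here are hypothetical.
    """
    if "classifier" not in sections:
        raise ValueError("object file has no eBPF program section")

    steps = []
    refs = []

    # Step 1: maps and prog go through the bpf() syscall first.
    if "maps" in sections:
        steps.append(("bpf: create maps", "maps"))
        refs.append("maps")
    steps.append(("bpf: load prog", "classifier"))
    refs.append("classifier")

    # Step 2: the cls_bpf filter carries the p4ast IR (if present)
    # together with references to what step 1 loaded.
    ir = "p4ast" if "p4ast" in sections else None
    steps.append(("tc: add cls_bpf filter", {"ir": ir, "refs": refs}))
    return steps


if __name__ == "__main__":
    obj = {"maps": b"\x00", "classifier": b"\x01", "p4ast": b"\x02"}
    for step in plan_load(obj):
        print(step)
```

An object file without a p4ast section would then load as a plain
cls_bpf filter, with no IR attached.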

> 4) Offload p4ast programs into hardware
>    The same p4ast program representation will be passed down
>    to drivers via existing TC offloading way - ndo_setup_tc.
>    Drivers will then parse it and set up the hardware
>    accordingly. Drivers will also have the possibility to error out
>    in case it does not support some requested feature.