netdev - Re: Let's do P4

Message-ID: <20161030184443.21b8a3d4@laptop>
Date:   Sun, 30 Oct 2016 18:44:43 +0000
From:   Jakub Kicinski <kubakici@...pl>
To:     Jiri Pirko <jiri@...nulli.us>
Cc:     Thomas Graf <tgraf@...g.ch>,
        John Fastabend <john.fastabend@...il.com>,
        netdev@...r.kernel.org, davem@...emloft.net, jhs@...atatu.com,
        roopa@...ulusnetworks.com, simon.horman@...ronome.com,
        ast@...nel.org, daniel@...earbox.net, prem@...efootnetworks.com,
        hannes@...essinduktion.org, jbenc@...hat.com, tom@...bertland.com,
        mattyk@...lanox.com, idosch@...lanox.com, eladr@...lanox.com,
        yotamg@...lanox.com, nogahf@...lanox.com, ogerlitz@...lanox.com,
        linville@...driver.com, andy@...yhouse.net, f.fainelli@...il.com,
        dsa@...ulusnetworks.com, vivien.didelot@...oirfairelinux.com,
        andrew@...n.ch, ivecera@...hat.com,
        Maciej Żenczykowski <zenczykowski@...il.com>
Subject: Re: Let's do P4

On Sun, 30 Oct 2016 19:01:03 +0100, Jiri Pirko wrote:
> Sun, Oct 30, 2016 at 06:45:26PM CET, kubakici@...pl wrote:
> >On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:  
> >> Sun, Oct 30, 2016 at 11:26:49AM CET, tgraf@...g.ch wrote:  
>  [...]  
>  [...]  
> >>  [...]  
> >>  [...]  
> >>  [...]  
> >>  [...]    
>  [...]  
> >> 
> >> Agreed.  
> >
> >Just to clarify my intention here was not to suggest the use of eBPF as
> >the IR.  I was merely cautioning against bundling the new API with P4,
> >for multiple reasons.  As John mentioned P4 spec was evolving in the
> >past.  The spec is designed for HW more capable than the switch ASICs we
> >have today.  As vendors move to provide more configurability we may need
> >to extend the API beyond P4.  We may want to extend this API for SW
> >hand-offs (as suggested by Thomas) which are not part of P4 spec.  Also
> >John showed examples of matchd software which already uses P4 at the
> >frontend today and translates it to different targets (eBPF, u32, HW).
> >It may just be about the naming but I feel like calling the new API
> >more generically, switch AST or some such may help to avoid unnecessary
> >ties and confusion.  
> 
> Well, that basically means to create "something" that could be used
> to translate p4 source to. Not sure how exactly this "something" should
> look like and how different would it be from p4. I thought it might
> be good to benefit from the p4 definition and use it directly. Not sure.

We have to translate the P4 into "something" already; that something
is the AST we will load into the kernel.  Or were you planning to use
some official P4 AST?  I'm not suggesting we add our own high-level
language.  I agree that P4 is a good starting point, and perhaps a good
high-level language.  I'm just cautious of creating an equivalence
between a high-level language (P4) and the kernel ABI.

Perhaps I'm just wasting everyone's time with this.

> >> 
> >> Exactly. Following drawing shows p4 pipeline setup for SW and Hw:
> >> 
> >>                                  |
> >>                                  |               +--> ebpf engine
> >>                                  |               |
> >>                                  |               |
> >>                                  |           compilerB
> >>                                  |               ^
> >>                                  |               |
> >> p4src --> compilerA --> p4ast --TCNL--> cls_p4 --+-> driver -> compilerC -> HW
> >>                                  |
> >>                        userspace | kernel
> >>                                  |
> >>
> >> Now please consider runtime API for rule insertion/removal/stats/etc.
> >> Also, the single API is cls_p4 here:
> >> 
> >>                         |
> >>                         |            
> >>                         |            
> >>                         |               
> >>                         |            ebpf map fillup
> >>                         |               ^
> >>                         |               |
> >>              p4 rule --TCNL--> cls_p4 --+-> driver -> HW table fillup
> >>                         |
> >>               userspace | kernel
> >>                           
> >
> >My understanding was that the main purpose of SW eBPF translation would
> >be to piggy back on eBPF userspace map API.  This seems not to be the
> >case here?  Is "P4 rule" being added via some new API?  From performance  
> 
> cls_p4 TC classifier.

Oh, so cls_p4 is just a proxy forwarding the requests to drivers
or the eBPF backend.  Got it.  Sorry for being slow.  And do the
requests come down via the change() op or something new?  I wonder how
such a scheme compares to eBPF maps performance-wise (updates/sec).
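
To make the proxy reading concrete, here is a toy userspace model of
it, not kernel code.  Every name in it (TableBackend, ClsP4Proxy,
insert_rule, remove_rule) is hypothetical and only illustrates the
fan-out: one rule request comes in through a single entry point and is
forwarded to both the SW backend (standing in for the eBPF map fill-up)
and the driver (standing in for the HW table fill-up).

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[int, ...]


class TableBackend:
    """Hypothetical stand-in for one datapath: the eBPF map fill-up
    or a driver's HW table fill-up."""

    def __init__(self, name: str) -> None:
        self.name = name
        # table name -> {match key -> action}
        self.tables: Dict[str, Dict[Key, str]] = {}

    def insert(self, table: str, key: Key, action: str) -> None:
        self.tables.setdefault(table, {})[key] = action

    def remove(self, table: str, key: Key) -> None:
        # Removing a missing key is a no-op in this toy model.
        self.tables.get(table, {}).pop(key, None)


class ClsP4Proxy:
    """Hypothetical model of cls_p4 as a pure proxy: it keeps no rule
    state of its own and forwards each request to every backend."""

    def __init__(self, backends: Iterable[TableBackend]) -> None:
        self.backends = list(backends)

    def insert_rule(self, table: str, key: Key, action: str) -> None:
        for backend in self.backends:
            backend.insert(table, key, action)

    def remove_rule(self, table: str, key: Key) -> None:
        for backend in self.backends:
            backend.remove(table, key)
```

In this sketch a caller would build ClsP4Proxy([sw, hw]) and call
insert_rule() once; both backends then hold the same entry.  The model
says nothing about updates/sec, which is the open question above.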

> >perspective the SW AST implementation would probably not be any slower
> >than u32, so I don't think we need eBPF for performance.  I must be
> >misreading this, if we want eBPF fallback we must extend eBPF with all
> >the map types anyway... so we could just use eBPF map API?  I believe
> >John has already done some work in this space (see his GitHub :))  
> 
> I don't think you can use existing BPF maps kernel API. You would still
> have to have another API just for the offloaded datapath. And that is
> a bypass. I strongly believe we need a single kernel API for both
> SW and HW datapath setup and runtime configuration.

Agreed, a single API is a must.  Which HW characteristic doesn't fit
the eBPF map API, though?  For eBPF offload I was planning on adding
offload hooks on the eBPF map lookup/update paths and a way of
associating the map with a netdev.  This should be enough to forward
updates to the driver and intercept reads to return the right
statistics.
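
As a rough illustration of that hook idea, here is a toy userspace
model in Python, not the kernel eBPF map implementation.  All names
(ToyDriver, OffloadableMap, bind, map_update, map_lookup, simulate_hit)
are hypothetical, and map values are assumed to be simple per-key
packet counters.  It shows a map associated with a device, updates
forwarded to the driver, and reads intercepted so the caller sees the
counter the HW maintains.

```python
from typing import Dict, Hashable, Optional


class ToyDriver:
    """Hypothetical driver holding the HW copy of the map."""

    def __init__(self) -> None:
        self.hw_counters: Dict[Hashable, int] = {}

    def map_update(self, key: Hashable, value: int) -> None:
        # Update hook: program the entry into the (simulated) HW table.
        self.hw_counters[key] = value

    def map_lookup(self, key: Hashable) -> Optional[int]:
        # Lookup hook: return the HW view, or None if not offloaded.
        return self.hw_counters.get(key)

    def simulate_hit(self, key: Hashable) -> None:
        # Pretend a packet matched this entry in HW.
        self.hw_counters[key] += 1


class OffloadableMap:
    """Hypothetical map with offload hooks on update and lookup."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, int] = {}
        self._dev: Optional[ToyDriver] = None

    def bind(self, dev: ToyDriver) -> None:
        # Associate the map with a device and push existing entries.
        self._dev = dev
        for key, value in self._data.items():
            dev.map_update(key, value)

    def update(self, key: Hashable, value: int) -> None:
        self._data[key] = value
        if self._dev is not None:
            self._dev.map_update(key, value)

    def lookup(self, key: Hashable) -> Optional[int]:
        if self._dev is not None:
            hw_value = self._dev.map_lookup(key)
            if hw_value is not None:
                return hw_value  # read intercepted, HW statistics win
        return self._data.get(key)
```

An unbound map behaves like a plain software map; once bound, lookup()
returns whatever the driver reports, which is how the right statistics
would reach the user through the unchanged map interface.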