netdev - Re: Let's do P4

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161030195651.GA21149@nanopsycho.orion>
Date:   Sun, 30 Oct 2016 20:56:51 +0100
From:   Jiri Pirko <jiri@...nulli.us>
To:     Jakub Kicinski <kubakici@...pl>
Cc:     Thomas Graf <tgraf@...g.ch>,
        John Fastabend <john.fastabend@...il.com>,
        netdev@...r.kernel.org, davem@...emloft.net, jhs@...atatu.com,
        roopa@...ulusnetworks.com, simon.horman@...ronome.com,
        ast@...nel.org, daniel@...earbox.net, prem@...efootnetworks.com,
        hannes@...essinduktion.org, jbenc@...hat.com, tom@...bertland.com,
        mattyk@...lanox.com, idosch@...lanox.com, eladr@...lanox.com,
        yotamg@...lanox.com, nogahf@...lanox.com, ogerlitz@...lanox.com,
        linville@...driver.com, andy@...yhouse.net, f.fainelli@...il.com,
        dsa@...ulusnetworks.com, vivien.didelot@...oirfairelinux.com,
        andrew@...n.ch, ivecera@...hat.com,
        Maciej Żenczykowski <zenczykowski@...il.com>
Subject: Re: Let's do P4

Sun, Oct 30, 2016 at 07:44:43PM CET, kubakici@...pl wrote:
>On Sun, 30 Oct 2016 19:01:03 +0100, Jiri Pirko wrote:
>> Sun, Oct 30, 2016 at 06:45:26PM CET, kubakici@...pl wrote:
>> >On Sun, 30 Oct 2016 17:38:36 +0100, Jiri Pirko wrote:  
>> >> Sun, Oct 30, 2016 at 11:26:49AM CET, tgraf@...g.ch wrote:  
>>  [...]  
>>  [...]  
>> >>  [...]  
>> >>  [...]  
>> >>  [...]  
>> >>  [...]    
>>  [...]  
>> >> 
>> >> Agreed.  
>> >
>> >Just to clarify my intention here was not to suggest the use of eBPF as
>> >the IR.  I was merely cautioning against bundling the new API with P4,
>> >for multiple reasons.  As John mentioned P4 spec was evolving in the
>> >past.  The spec is designed for HW more capable than the switch ASICs we
>> >have today.  As vendors move to provide more configurability we may need
>> >to extend the API beyond P4.  We may want to extend this API to for SW
>> >hand-offs (as suggested by Thomas) which are not part of P4 spec.  Also
>> >John showed examples of matchd software which already uses P4 at the
>> >frontend today and translates it to different targets (eBPF, u32, HW).
>> >It may just be about the naming but I feel like calling the new API
>> >more generically, switch AST or some such may help to avoid unnecessary
>> >ties and confusion.  
>> 
>> Well, that basically means to create "something" that could be be used
>> to translate p4 source to. Not sure how exactly this "something" should
>> look like and how different would it be from p4. I thought it might
>> be good to benefit from the p4 definition and use it directly. Not sure.
>
>We have to translate the P4 into "something" already, that something
>is the AST we will load into the kernel.  Or were you planning to use
>some official P4 AST?  I'm not suggesting we add our own high level

I'm not aware of existence of some official P4 AST. We have to figure it
out.


>language.  I agree that P4 is a good starting point, and perhaps a good
>high level language.  I'm just cautious of creating an equivalency
>between high level language (P4) and the kernel ABI.

Understood. Definitelly good to be very cautious when defining a kernel
API.


>
>Perhaps I'm just wasting everyone's time with this.
>
>> >> 
>> >> Exactly. Following drawing shows p4 pipeline setup for SW and Hw:
>> >> 
>> >>                                  |
>> >>                                  |               +--> ebpf engine
>> >>                                  |               |
>> >>                                  |               |
>> >>                                  |           compilerB
>> >>                                  |               ^
>> >>                                  |               |
>> >> p4src --> compilerA --> p4ast --TCNL--> cls_p4 --+-> driver -> compilerC -> HW
>> >>                                  |
>> >>                        userspace | kernel
>> >>                                  |
>> >>
>> >> Now please consider runtime API for rule insertion/removal/stats/etc.
>> >> Also, the single API is cls_p4 here:
>> >> 
>> >>                         |
>> >>                         |            
>> >>                         |            
>> >>                         |               
>> >>                         |            ebpf map fillup
>> >>                         |               ^
>> >>                         |               |
>> >>              p4 rule --TCNL--> cls_p4 --+-> driver -> HW table fillup
>> >>                         |
>> >>               userspace | kernel
>> >>                           
>> >
>> >My understanding was that the main purpose of SW eBPF translation would
>> >be to piggy back on eBPF userspace map API.  This seems not to be the
>> >case here?  Is "P4 rule" being added via some new API?  From performance  
>> 
>> cls_p4 TC classifier.
>
>Oh, so the cls_p4 is just a proxy forwarding the requests to drivers
>or eBPF backend.  Got it.  Sorry for being slow.  And the requests
>come down via change() op or something new?  I wonder how such scheme
>compares to eBPF maps performance-wise (updates/sec).

I have no numbers at this time. I guess Jamal and Alexei did some
measurements in this are in the past.


>
>> >perspective the SW AST implementation would probably not be any slower
>> >than u32, so I don't think we need eBPF for performance.  I must be
>> >misreading this, if we want eBPF fallback we must extend eBPF with all
>> >the map types anyway... so we could just use eBPF map API?  I believe
>> >John has already done some work in this space (see his GitHub :))  
>> 
>> I don't think you can use existing BPF maps kernel API. You would still
>> have to have another API just for the offloaded datapath. And that is
>> a bypass. I strongly believe we need a single kernel API for both
>> SW and HW datapath setup and runtime configuration.
>
>Agreed, single API is a must.  What is the HW characteristic which
>doesn't fit with eBPF map API, though?  For eBPF offload I was planning
>on adding offload hooks on eBPF map lookup/update paths and a way of
>associating the map with a netdev.  This should be enough to forward
>updates to the driver and intercept reads to return the right
>statistics.