netdev - Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <E0D909EE5BB15A4699798539EA149D7F077E53D6@ORSMSX103.amr.corp.intel.com>
Date:   Thu, 28 Sep 2017 05:59:38 +0000
From:   "Waskiewicz Jr, Peter" <peter.waskiewicz.jr@...el.com>
To:     Andy Gospodarek <andy@...yhouse.net>,
        Daniel Borkmann <daniel@...earbox.net>
CC:     "davem@...emloft.net" <davem@...emloft.net>,
        "alexei.starovoitov@...il.com" <alexei.starovoitov@...il.com>,
        "john.fastabend@...il.com" <john.fastabend@...il.com>,
        "jakub.kicinski@...ronome.com" <jakub.kicinski@...ronome.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "mchan@...adcom.com" <mchan@...adcom.com>
Subject: Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access

On 9/26/17 10:21 AM, Andy Gospodarek wrote:
> On Mon, Sep 25, 2017 at 08:50:28PM +0200, Daniel Borkmann wrote:
>> On 09/25/2017 08:10 PM, Andy Gospodarek wrote:
>> [...]
>>> First, thanks for this detailed description.  It was helpful to read
>>> along with the patches.
>>>
>>> My only concern about this area being generic is that you are now in a
>>> state where any bpf program must know about all the bpf programs in the
>>> receive pipeline before it can properly parse what is stored in the
>>> meta-data and add it to an skb (or perform any other action).
>>> Especially if each program adds it's own meta-data along the way.
>>>
>>> Maybe this isn't a big concern based on the number of users of this
>>> today, but it just starts to seem like a concern as there are these
>>> hints being passed between layers that are challenging to track due to a
>>> lack of a standard format for passing data between.
>>
>> Btw, we do have similar kind of programmable scratch buffer also today
>> wrt skb cb[] that you can program from tc side, the perf ring buffer,
>> which doesn't have any fixed layout for the slots, or a per-cpu map
>> where you can transfer data between tail calls for example, then tail
>> calls themselves that need to coordinate, or simply mangling of packets
>> itself if you will, but more below to your use case ...
>>
>>> The main reason I bring this up is that Michael and I had discussed and
>>> designed a way for drivers to communicate between each other that rx
>>> resources could be freed after a tx completion on an XDP_REDIRECT
>>> action.  Much like this code, it involved adding an new element to
>>> struct xdp_md that could point to the important information.  Now that
>>> there is a generic way to handle this, it would seem nice to be able to
>>> leverage it, but I'm not sure how reliable this meta-data area would be
>>> without the ability to mark it in some manner.
>>>
>>> For additional background, the minimum amount of data needed in the case
>>> Michael and I were discussing was really 2 words.  One to serve as a
>>> pointer to an rx_ring structure and one to have a counter to the rx
>>> producer entry.  This data could be acessed by the driver processing the
>>> tx completions and callback to the driver that received the frame off the wire
>>> to perform any needed processing.  (For those curious this would also require a
>>> new callback/netdev op to act on this data stored in the XDP buffer.)
>>
>> What you describe above doesn't seem to be fitting to the use-case of
>> this set, meaning the area here is fully programmable out of the BPF
>> program, the infrastructure you're describing is some sort of means of
>> communication between drivers for the XDP_REDIRECT, and should be
>> outside of the control of the BPF program to mangle.
> 
> OK, I understand that perspective.  I think saying this is really meant
> as a BPF<->BPF communication channel for now is fine.
> 
>> You could probably reuse the base infra here and make a part of that
>> inaccessible for the program with some sort of a fixed layout, but I
>> haven't seen your code yet to be able to fully judge. Intention here
>> is to allow for programmability within the BPF prog in a generic way,
>> such that based on the use-case it can be populated in specific ways
>> and propagated to the skb w/o having to define a fixed layout and
>> bloat xdp_buff all the way to an skb while still retaining all the
>> flexibility.
> 
> Some level of reuse might be proper, but I'd rather it be explicit for
> my use since it's not exclusively something that will need to be used by
> a BPF prog, but rather the driver.  I'll produce some patches this week
> for reference.

Sorry for chiming in late, I've been offline.

We're looking to add some functionality from driver to XDP inside this 
xdp_buff->data_meta region.  We want to assign it to an opaque 
structure, that would be specific per driver (think of a flex descriptor 
coming out of the hardware).  We'd like to pass these offloaded 
computations into XDP programs to help accelerate them, such as packet 
type, where headers are located, etc.  It's similar to Jesper's RFC 
patches back in May when passing through the mlx Rx descriptor to XDP.

This is actually what a few of us are planning to present at NetDev 2.2 
in November.  If you're hoping to restrict this headroom in the xdp_buff 
for an exclusive use case with XDP_REDIRECT, then I'd like to discuss 
that further.

-PJ