netdev - Re: [PATCHv11 bpf-next 2/5] xdp: add a new helper for dev map multicast support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <3b498f73-0a0c-bd25-d1a1-b1098ed403e2@gmail.com>
Date:   Tue, 15 Sep 2020 10:12:09 -0600
From:   David Ahern <dsahern@...il.com>
To:     Jesper Dangaard Brouer <brouer@...hat.com>
Cc:     Toke Høiland-Jørgensen <toke@...hat.com>,
        Alexei Starovoitov <alexei.starovoitov@...il.com>,
        Hangbin Liu <liuhangbin@...il.com>, bpf <bpf@...r.kernel.org>,
        Network Development <netdev@...r.kernel.org>,
        Jiri Benc <jbenc@...hat.com>,
        Eelco Chaudron <echaudro@...hat.com>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
        Andrii Nakryiko <andrii.nakryiko@...il.com>
Subject: Re: [PATCHv11 bpf-next 2/5] xdp: add a new helper for dev map
 multicast support

On 9/11/20 1:58 AM, Jesper Dangaard Brouer wrote:
> On Thu, 10 Sep 2020 12:35:33 -0600
> David Ahern <dsahern@...il.com> wrote:
> 
>> On 9/10/20 11:50 AM, Jesper Dangaard Brouer wrote:
>>> Maybe we should change the devmap-prog approach, and run this on the
>>> xdp_frame's (in bq_xmit_all() to be precise) .  Hangbin's patchset
>>> clearly shows that we need this "layer" between running the xdp_prog and
>>> the devmap-prog.   
>>
>> I would prefer to leave it in dev_map_enqueue.
>>
>> The main premise at the moment is that the program attached to the
>> DEVMAP entry is an ACL specific to that dev. If the program is going to
>> drop the packet, then no sense queueing it.
>>
>> I also expect a follow on feature will be useful to allow the DEVMAP
>> program to do another REDIRECT (e.g., potentially after modifying). It
>> is not handled at the moment as it needs thought - e.g., limiting the
>> number of iterative redirects. If such a feature does happen, then no
>> sense queueing it to the current device.
> 
> It makes a lot of sense to do queuing before redirecting again.  The
> (hidden) bulking we do at XDP redirect is the primary reason for the
> performance boost. We all remember performance difference between
> non-map version of redirect (which Toke fixed via always having the
> bulking available in net_device->xdp_bulkq).
> 
> In a simple micro-benchmark I bet it will look better running the
> devmap-prog right after the xdp_prog (which is what we have today). But
> I claim this is the wrong approach, as soon as (1) traffic is more
> intermixed, and (2) devmap-prog gets bigger and becomes more specific
> to the egress-device (e.g. BPF update constants per egress-device).
> When this happens performance suffers, as I-cache and data-access to
> each egress-device gets pushed out of cache. (Hint VPP/fd.io approach)
> 
> Queuing xdp_frames up for your devmap-prog makes sense, as these share
> common properties.  With intermix traffic the first xdp_prog will sort
> packets into egress-devices, and then the devmap-prog can operate on
> these.  The best illustration[1] of this sorting I saw in a Netflix
> blogpost[2] about FreeBSD, section "RSS Assisted LRO" (not directly
> related, but illustration was good).
> 
> 
> [1] https://miro.medium.com/max/700/1%2alTGL1_D6hTMEMa7EDV8yZA.png
> [2] https://netflixtechblog.com/serving-100-gbps-from-an-open-connect-appliance-cdb51dda3b99
> 

I understand the theory and testing will need to bear that out. There is
a bit of distance (code wise) between where the program is run now and
where you want to put it - the conversion from xdp_buff
to xdp_frame, the enqueue, and what it means to do a redirect to another
device in bq_xmit_all.

More importantly though for a redirect is the current xdp_ok_fwd_dev
check in __xdp_enqueue which for a redirect could be doing the wrong
checks for the wrong device.