netdev - Re: [PATCHv4 bpf-next 0/2] xdp: add dev map multicast support


Message-ID: <28D58684-578C-4DDF-B18D-70280B923590@redhat.com>
Date:   Wed, 27 May 2020 12:32:47 +0200
From:   "Eelco Chaudron" <echaudro@...hat.com>
To:     "Toke Høiland-Jørgensen" <toke@...hat.com>
Cc:     "Hangbin Liu" <liuhangbin@...il.com>, bpf@...r.kernel.org,
        netdev@...r.kernel.org, "Jiri Benc" <jbenc@...hat.com>,
        "Jesper Dangaard Brouer" <brouer@...hat.com>, ast@...nel.org,
        "Daniel Borkmann" <daniel@...earbox.net>,
        "Lorenzo Bianconi" <lorenzo.bianconi@...hat.com>
Subject: Re: [PATCHv4 bpf-next 0/2] xdp: add dev map multicast support



On 27 May 2020, at 12:21, Toke Høiland-Jørgensen wrote:

> Hangbin Liu <liuhangbin@...il.com> writes:
>
>> Hi all,
>>
>> This patchset is for xdp multicast support, which has been discussed
>> before[0]. The goal is to be able to implement an OVS-like data plane
>> in XDP, i.e., a software switch that can forward XDP frames to
>> multiple ports.
>>
>> To achieve this, an application needs to specify a group of
>> interfaces to forward a packet to. It is also common to want to
>> exclude one or more physical interfaces from the forwarding
>> operation - e.g., to forward a packet to all interfaces in the
>> multicast group except the interface it arrived on. While this could
>> be done simply by adding more groups, this quickly leads to a
>> combinatorial explosion in the number of groups an application has
>> to maintain.
>>
>> To avoid the combinatorial explosion, we propose to include the
>> ability to specify an "exclude group" as part of the forwarding
>> operation. This needs to be a group (instead of just a single port
>> index), because a physical interface can be part of a logical
>> grouping, such as a bond device.
>>
>> Thus, the logical forwarding operation becomes a "set difference"
>> operation, i.e. "forward to all ports in group A that are not also in
>> group B". This series implements such an operation using device maps
>> to represent the groups. This means that the XDP program specifies
>> two device maps, one containing the list of netdevs to redirect to,
>> and the other containing the exclude list.
>>
>> To achieve this, I re-implemented the new helper
>> bpf_redirect_map_multi() to accept two maps: the forwarding map and
>> the exclude map. If users don't want to use an exclude map and simply
>> want to stop redirecting back to the ingress device, they can use the
>> flag BPF_F_EXCLUDE_INGRESS.
>>
>> The example in patch 2 is functional, but not a lot of effort
>> has been made on performance optimisation. I did a simple test (pkt
>> size 64) with pktgen. Here are the test results with
>> BPF_MAP_TYPE_DEVMAP_HASH arrays:
>>
>> bpf_redirect_map() with 1 ingress, 1 egress:
>> generic path: ~1600k pps
>> native path: ~980k pps
>>
>> bpf_redirect_map_multi() with 1 ingress, 3 egress:
>> generic path: ~600k pps
>> native path: ~480k pps
>>
>> bpf_redirect_map_multi() with 1 ingress, 9 egress:
>> generic path: ~125k pps
>> native path: ~100k pps
>>
>> bpf_redirect_map_multi() is slower than bpf_redirect_map() because we
>> loop over the arrays and clone the skb/xdpf. The native path is
>> slower than the generic path because we send skbs with pktgen. So the
>> results look reasonable.
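
The "set difference" forwarding described in the quoted cover letter can be illustrated with a small user-space sketch. This is not the kernel implementation from the series: `struct port_group`, `group_contains()` and `select_egress()` are hypothetical names, plain arrays stand in for the device maps, and the `exclude_ingress` argument only mimics the effect the cover letter attributes to BPF_F_EXCLUDE_INGRESS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PORTS 8

/* Hypothetical stand-in for a device map: a plain list of ifindexes. */
struct port_group {
    int ifindex[MAX_PORTS];
    size_t count;
};

/* Return true if ifindex is a member of group g. */
static bool group_contains(const struct port_group *g, int ifindex)
{
    for (size_t i = 0; i < g->count; i++)
        if (g->ifindex[i] == ifindex)
            return true;
    return false;
}

/*
 * Write to 'out' every port in 'fwd' that is not in 'excl' (which may be
 * NULL) and, when exclude_ingress is set, is not the ingress port.
 * 'out' must have room for MAX_PORTS entries.
 * Returns the number of ports selected.
 */
static size_t select_egress(const struct port_group *fwd,
                            const struct port_group *excl,
                            int ingress_ifindex, bool exclude_ingress,
                            int *out)
{
    size_t n = 0;

    assert(fwd->count <= MAX_PORTS);
    for (size_t i = 0; i < fwd->count; i++) {
        int port = fwd->ifindex[i];

        if (excl && group_contains(excl, port))
            continue;   /* port is in the exclude group */
        if (exclude_ingress && port == ingress_ifindex)
            continue;   /* do not send back to the ingress port */
        out[n++] = port;
    }
    return n;
}
```

With a forwarding group of ifindexes {1, 2, 3, 4}, an exclude group {2, 3} (for example the members of a bond) and ingress on ifindex 1 with `exclude_ingress` set, only ifindex 4 is selected. A single forwarding group plus a small exclude group covers this case without a dedicated group per ingress port.
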
>
> How are you running these tests? Still on virtual devices? We really
> need results from a physical setup in native mode to assess the impact
> on the native-XDP fast path. The numbers above don't tell much in this
> regard. I'd also like to see a before/after patch for straight
> bpf_redirect_map(), since you're messing with the fast path, and we
> want to make sure it's not causing a performance regression for
> regular redirect.
>
> Finally, since the overhead seems to be quite substantial: A
> comparison with a regular network stack bridge might make sense?
> After all we also want to make sure it's a performance win over
> that :)

What about adding a test with only one egress port, so that it compares
directly with bpf_redirect_map(), i.e. “bpf_redirect_map_multi() with 1
ingress, 1 egress”?
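
Until such a 1-egress result exists, the quoted figures can only be put on a rough common footing. The sketch below does this under an explicit assumption that the thread does not confirm: each reported pps value counts ingress packets, and every packet is copied exactly once per egress port. `struct bench_result`, `results` and `aggregate_kpps()` are illustrative names; the numbers are copied from the quoted cover letter.

```c
#include <assert.h>

/* One row of the quoted pktgen results (values in kpps). */
struct bench_result {
    const char *helper;
    unsigned int egress_ports;
    unsigned int generic_kpps;
    unsigned int native_kpps;
};

/* Figures as reported in the quoted cover letter. */
static const struct bench_result results[] = {
    { "bpf_redirect_map()",       1, 1600, 980 },
    { "bpf_redirect_map_multi()", 3,  600, 480 },
    { "bpf_redirect_map_multi()", 9,  125, 100 },
};

/*
 * Aggregate transmitted frames in kpps, ASSUMING the reported rate is
 * per ingress packet and each packet is copied once per egress port.
 */
static unsigned int aggregate_kpps(unsigned int kpps,
                                   unsigned int egress_ports)
{
    assert(egress_ports > 0);
    return kpps * egress_ports;
}
```

Under that assumption the generic path transmits about 1600k, 1800k and 1125k frames per second for 1, 3 and 9 egress ports, and the native path about 980k, 1440k and 900k. This normalisation does not replace the direct 1-egress measurement, which would isolate the fixed overhead of the new helper.
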