Date:   Wed, 27 May 2020 12:21:54 +0200
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     Hangbin Liu <liuhangbin@...il.com>, bpf@...r.kernel.org
Cc:     netdev@...r.kernel.org, Jiri Benc <jbenc@...hat.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Eelco Chaudron <echaudro@...hat.com>, ast@...nel.org,
        Daniel Borkmann <daniel@...earbox.net>,
        Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
        Hangbin Liu <liuhangbin@...il.com>
Subject: Re: [PATCHv4 bpf-next 0/2] xdp: add dev map multicast support

Hangbin Liu <liuhangbin@...il.com> writes:

> Hi all,
>
> This patchset is for xdp multicast support, which has been discussed
> before[0]. The goal is to be able to implement an OVS-like data plane in
> XDP, i.e., a software switch that can forward XDP frames to multiple
> ports.
>
> To achieve this, an application needs to specify a group of interfaces
> to forward a packet to. It is also common to want to exclude one or more
> physical interfaces from the forwarding operation - e.g., to forward a
> packet to all interfaces in the multicast group except the interface it
> arrived on. While this could be done simply by adding more groups, this
> quickly leads to a combinatorial explosion in the number of groups an
> application has to maintain.
>
> To avoid the combinatorial explosion, we propose to include the ability
> to specify an "exclude group" as part of the forwarding operation. This
> needs to be a group (instead of just a single port index), because a
> physical interface can be part of a logical grouping, such as a bond
> device.
>
> Thus, the logical forwarding operation becomes a "set difference"
> operation, i.e. "forward to all ports in group A that are not also in
> group B". This series implements such an operation using device maps to
> represent the groups. This means that the XDP program specifies two
> device maps, one containing the list of netdevs to redirect to, and the
> other containing the exclude list.
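
For illustration, a minimal user-space sketch of that set-difference
setup could look like the snippet below, assuming two
BPF_MAP_TYPE_DEVMAP_HASH maps. The map fds, the interface names and the
choice of the ifindex as the key are made up for the example, not taken
from the series:

#include <bpf/bpf.h>
#include <linux/types.h>
#include <net/if.h>

/* Add one interface to a devmap, keyed by its own ifindex. */
static int add_iface(int map_fd, const char *name)
{
        __u32 ifindex = if_nametoindex(name);

        if (!ifindex)
                return -1;
        return bpf_map_update_elem(map_fd, &ifindex, &ifindex, 0);
}

static int setup_groups(int forward_map_fd, int exclude_map_fd)
{
        /* Group A: all ports we may forward to. */
        add_iface(forward_map_fd, "eth1");
        add_iface(forward_map_fd, "eth2");
        add_iface(forward_map_fd, "eth3");

        /* Group B: ports to exclude, e.g. the two legs of a bond. */
        add_iface(exclude_map_fd, "eth2");
        add_iface(exclude_map_fd, "eth3");
        return 0;
}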
>
> To achieve this, I implemented a new helper, bpf_redirect_map_multi(),
> which accepts two maps: the forwarding map and the exclude map. If users
> don't want an exclude map and simply want to stop redirecting back to
> the ingress device, they can use the flag BPF_F_EXCLUDE_INGRESS.
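
For illustration, an XDP program using the proposed helper might look
roughly like the sketch below. Note that bpf_redirect_map_multi() and
BPF_F_EXCLUDE_INGRESS exist only in this series, the
(map, ex_map, flags) signature is assumed from the description above,
and the map names are made up:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
        __uint(max_entries, 32);
        __type(key, __u32);
        __type(value, __u32);
} forward_map SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
        __uint(max_entries, 32);
        __type(key, __u32);
        __type(value, __u32);
} exclude_map SEC(".maps");

SEC("xdp")
int xdp_multicast(struct xdp_md *ctx)
{
        /* Forward to every port in forward_map that is not also in
         * exclude_map. Per the cover letter, passing a NULL exclude map
         * together with BPF_F_EXCLUDE_INGRESS would instead just skip
         * the ingress device. */
        return bpf_redirect_map_multi(&forward_map, &exclude_map, 0);
}

char _license[] SEC("license") = "GPL";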
>
> The example in patch 2 is functional, but not a lot of effort has been
> put into performance optimisation. I did a simple test (pkt size 64)
> with pktgen. Here are the results using BPF_MAP_TYPE_DEVMAP_HASH maps:
>
> bpf_redirect_map() with 1 ingress, 1 egress:
> generic path: ~1600k pps
> native path: ~980k pps
>
> bpf_redirect_map_multi() with 1 ingress, 3 egress:
> generic path: ~600k pps
> native path: ~480k pps
>
> bpf_redirect_map_multi() with 1 ingress, 9 egress:
> generic path: ~125k pps
> native path: ~100k pps
>
> bpf_redirect_map_multi() is slower than bpf_redirect_map() because we
> loop over the maps and clone the skb/xdpf for each target device. The
> native path is slower than the generic path because we send skbs with
> pktgen. So the results look reasonable.

How are you running these tests? Still on virtual devices? We really
need results from a physical setup in native mode to assess the impact
on the native-XDP fast path; the numbers above don't tell us much in
this regard. I'd also like to see a before/after comparison of this
series for straight bpf_redirect_map(), since you're messing with the
fast path, and we want
to make sure it's not causing a performance regression for regular
redirect.
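
For reference, a "straight bpf_redirect_map()" baseline could be as
simple as the sketch below, run on the same hardware before and after
the series (the map name and the fixed key 0 are illustrative):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u32);
} tx_port SEC(".maps");

SEC("xdp")
int xdp_redirect_single(struct xdp_md *ctx)
{
        /* Redirect every frame to the devmap entry stored at key 0. */
        return bpf_redirect_map(&tx_port, 0, 0);
}

char _license[] SEC("license") = "GPL";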

Finally, since the overhead seems to be quite substantial: a comparison
with a regular network stack bridge might make sense? After all, we also
want to make sure it's a performance win over that :)

-Toke
