Message-ID: <28D58684-578C-4DDF-B18D-70280B923590@redhat.com>
Date: Wed, 27 May 2020 12:32:47 +0200
From: "Eelco Chaudron" <echaudro@...hat.com>
To: "Toke Høiland-Jørgensen" <toke@...hat.com>
Cc: "Hangbin Liu" <liuhangbin@...il.com>, bpf@...r.kernel.org,
netdev@...r.kernel.org, "Jiri Benc" <jbenc@...hat.com>,
"Jesper Dangaard Brouer" <brouer@...hat.com>, ast@...nel.org,
"Daniel Borkmann" <daniel@...earbox.net>,
"Lorenzo Bianconi" <lorenzo.bianconi@...hat.com>
Subject: Re: [PATCHv4 bpf-next 0/2] xdp: add dev map multicast support
On 27 May 2020, at 12:21, Toke Høiland-Jørgensen wrote:
> Hangbin Liu <liuhangbin@...il.com> writes:
>
>> Hi all,
>>
>> This patchset is for xdp multicast support, which has been discussed
>> before[0]. The goal is to be able to implement an OVS-like data plane in
>> XDP, i.e., a software switch that can forward XDP frames to multiple
>> ports.
>>
>> To achieve this, an application needs to specify a group of interfaces
>> to forward a packet to. It is also common to want to exclude one or more
>> physical interfaces from the forwarding operation - e.g., to forward a
>> packet to all interfaces in the multicast group except the interface it
>> arrived on. While this could be done simply by adding more groups, this
>> quickly leads to a combinatorial explosion in the number of groups an
>> application has to maintain.
>>
>> To avoid the combinatorial explosion, we propose to include the ability
>> to specify an "exclude group" as part of the forwarding operation. This
>> needs to be a group (instead of just a single port index), because a
>> physical interface can be part of a logical grouping, such as a bond
>> device.
>>
>> Thus, the logical forwarding operation becomes a "set difference"
>> operation, i.e. "forward to all ports in group A that are not also in
>> group B". This series implements such an operation using device maps to
>> represent the groups. This means that the XDP program specifies two
>> device maps, one containing the list of netdevs to redirect to, and the
>> other containing the exclude list.
>>
>> To achieve this, I re-implement a new helper bpf_redirect_map_multi()
>> to accept two maps, the forwarding map and the exclude map. If users
>> don't want to use an exclude map and simply want to stop redirecting
>> back to the ingress device, they can use the flag
>> BPF_F_EXCLUDE_INGRESS.
>>
>> The example in patch 2 is functional, but not a lot of effort has been
>> made on performance optimisation. I did a simple test (pkt size 64)
>> with pktgen. Here are the test results with BPF_MAP_TYPE_DEVMAP_HASH
>> arrays:
>>
>> bpf_redirect_map() with 1 ingress, 1 egress:
>> generic path: ~1600k pps
>> native path: ~980k pps
>>
>> bpf_redirect_map_multi() with 1 ingress, 3 egress:
>> generic path: ~600k pps
>> native path: ~480k pps
>>
>> bpf_redirect_map_multi() with 1 ingress, 9 egress:
>> generic path: ~125k pps
>> native path: ~100k pps
>>
>> bpf_redirect_map_multi() is slower than bpf_redirect_map() because we
>> loop over the arrays and clone the skb/xdpf. The native path is slower
>> than the generic path because pktgen sends skbs. So the results look
>> reasonable.
>
> How are you running these tests? Still on virtual devices? We really
> need results from a physical setup in native mode to assess the impact
> on the native-XDP fast path. The numbers above don't tell much in this
> regard. I'd also like to see a before/after patch for straight
> bpf_redirect_map(), since you're messing with the fast path, and we want
> to make sure it's not causing a performance regression for regular
> redirect.
>
> Finally, since the overhead seems to be quite substantial: A comparison
> with a regular network stack bridge might make sense? After all we also
> want to make sure it's a performance win over that :)

What about adding a test with only one egress port, i.e.
“bpf_redirect_map_multi() with 1 ingress, 1 egress”? That would give a
more direct comparison with bpf_redirect_map().
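
For reference, the "set difference" forwarding described in the quoted
cover letter can be sketched in plain userspace C. This is only an
illustration of the semantics, not the code from the patch: the function
names, the flat int arrays standing in for device maps, and the
SKETCH_EXCLUDE_INGRESS flag are all assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag for this sketch only. It mirrors the idea behind
 * BPF_F_EXCLUDE_INGRESS but is not the kernel definition. */
#define SKETCH_EXCLUDE_INGRESS 0x1u

/* Return true if ifindex is a member of the group. */
bool group_contains(const int *group, size_t n, int ifindex)
{
    for (size_t i = 0; i < n; i++)
        if (group[i] == ifindex)
            return true;
    return false;
}

/* Compute "forward group minus exclude group", optionally also dropping
 * the ingress interface. Writes the selected ifindexes to out, which
 * must have room for n_fwd entries, and returns how many were written.
 * excl may be NULL when no exclude group is wanted. */
size_t select_egress(const int *fwd, size_t n_fwd,
                     const int *excl, size_t n_excl,
                     int ingress, unsigned int flags,
                     int *out)
{
    size_t n_out = 0;

    assert(out != NULL);

    for (size_t i = 0; i < n_fwd; i++) {
        /* Skip ports that are also in the exclude group. */
        if (excl != NULL && group_contains(excl, n_excl, fwd[i]))
            continue;
        /* Skip the port the packet arrived on, if requested. */
        if ((flags & SKETCH_EXCLUDE_INGRESS) && fwd[i] == ingress)
            continue;
        out[n_out++] = fwd[i];
    }
    return n_out;
}
```

With a forward group of {2, 3, 4, 5}, an exclude group of {4}, ingress
ifindex 2 and the flag set, this selects ports 3 and 5. The linear scan
of the exclude group is chosen for brevity only and says nothing about
how the real helper looks up entries.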