Message-ID: <a1bcd5e8-89dd-0eca-f779-ac345b24661e@gmail.com>
Date: Wed, 9 Sep 2020 21:30:40 -0600
From: David Ahern <dsahern@...il.com>
To: Hangbin Liu <liuhangbin@...il.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org,
Toke Høiland-Jørgensen <toke@...hat.com>,
Jiri Benc <jbenc@...hat.com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Eelco Chaudron <echaudro@...hat.com>, ast@...nel.org,
Daniel Borkmann <daniel@...earbox.net>,
Lorenzo Bianconi <lorenzo.bianconi@...hat.com>,
Andrii Nakryiko <andrii.nakryiko@...il.com>
Subject: Re: [PATCHv11 bpf-next 2/5] xdp: add a new helper for dev map
multicast support

On 9/9/20 8:35 PM, Hangbin Liu wrote:
> Hi Alexei,
>
> On Wed, Sep 09, 2020 at 02:52:06PM -0700, Alexei Starovoitov wrote:
>> On Mon, Sep 07, 2020 at 04:27:21PM +0800, Hangbin Liu wrote:
>>> This patch is for XDP multicast support, which has been discussed
>>> before[0]. The goal is to be able to implement an OVS-like data plane in
>>> XDP, i.e., a software switch that can forward XDP frames to multiple ports.
>>>
>>> To achieve this, an application needs to specify a group of interfaces
>>> to forward a packet to. It is also common to want to exclude one or more
>>> physical interfaces from the forwarding operation - e.g., to forward a
>>> packet to all interfaces in the multicast group except the interface it
>>> arrived on. While this could be done simply by adding more groups, this
>>> quickly leads to a combinatorial explosion in the number of groups an
>>> application has to maintain.
>>>
>>> To avoid the combinatorial explosion, we propose to include the ability
>>> to specify an "exclude group" as part of the forwarding operation. This
>>> needs to be a group (instead of just a single port index), because a
>>> physical interface can be part of a logical grouping, such as a bond
>>> device.
>>>
>>> Thus, the logical forwarding operation becomes a "set difference"
>>> operation, i.e. "forward to all ports in group A that are not also in
>>> group B". This series implements such an operation using device maps to
>>> represent the groups. This means that the XDP program specifies two
>>> device maps, one containing the list of netdevs to redirect to, and the
>>> other containing the exclude list.
>>
>> "set difference" and BPF_F_EXCLUDE_INGRESS makes sense to me as high level api,
>> but I don't see how program or helper is going to modify the packet
>> before multicasting it.
>> Even to implement a basic switch the program would need to modify destination
>> mac addresses before xmiting it on the device.
>> In case of XDP_TX the bpf program is doing it manually.
>> With this api the program is out of the loop.
>> It can prepare a packet for one target netdev, but sending the same
>> packet as-is to other netdevs isn't going to work correctly.
>
> Yes, we can't modify the packets on ingress, as there are multiple egress
> ports and each one may have different requirements. So this helper will only
> forward the packets to the other devices in the group (which looks like a
> multicast group).
>
> I think the packet modifications (edit dst mac, add vlan tag, etc.) should be
> done on egress, which relies on David's XDP egress support.

Agreed. The DEVMAP used for redirect can have programs attached that
update the packet headers - assuming you want to update them.
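
To make the per-egress rewrite concrete, here is a minimal user-space sketch in plain C. It is a simulation, not kernel or BPF code: struct frame, struct egress_port, egress_rewrite() and fan_out() are hypothetical names invented for illustration. The assumption is only that each egress device gets its own copy of the frame and that a per-device hook can fix up the headers of that copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical stand-in for an Ethernet header; not the kernel's struct ethhdr. */
struct frame {
    uint8_t dst[ETH_ALEN];
    uint8_t src[ETH_ALEN];
};

/* Hypothetical per-port state: the port's own MAC and its next hop's MAC. */
struct egress_port {
    uint8_t port_mac[ETH_ALEN];
    uint8_t next_hop_mac[ETH_ALEN];
};

/* Plays the role of a program attached to one devmap entry: it runs once
 * per egress device, on that device's own copy of the frame. */
static void egress_rewrite(const struct egress_port *p, struct frame *f)
{
    memcpy(f->dst, p->next_hop_mac, ETH_ALEN);
    memcpy(f->src, p->port_mac, ETH_ALEN);
}

/* Copy the ingress frame once per port and let each port fix up its copy.
 * The ingress frame itself is left untouched; out needs room for nports frames. */
static void fan_out(const struct frame *in, const struct egress_port *ports,
                    int nports, struct frame *out)
{
    assert(nports >= 0);
    for (int i = 0; i < nports; i++) {
        out[i] = *in;
        egress_rewrite(&ports[i], &out[i]);
    }
}
```

The point of the sketch is that the ingress side never has to pick one set of headers for all targets; each copy is rewritten where its requirements are known.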

This is tagged as "multicast" support but it really is redirecting a
packet to multiple devices. One use case I see that evolves from this
set is the ability to both forward packets (e.g., host ingress to VM)
and grab a copy tcpdump style by redirecting packets to a virtual device
(similar to a patch set for dropwatch), i.e., no need for a perf-events
style copy to push to userspace.
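
For reference, the "forward to all ports in group A that are not also in group B" selection discussed in the quoted text can be sketched as a plain C simulation. This is not the helper from the patch: select_targets() and contains() are hypothetical names, the groups are modeled as arrays of ifindex values rather than device maps, and the exclude_ingress flag only mimics what BPF_F_EXCLUDE_INGRESS is described as doing in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Linear membership test over a small array of ifindex values. */
static bool contains(const int *set, size_t n, int ifindex)
{
    for (size_t i = 0; i < n; i++)
        if (set[i] == ifindex)
            return true;
    return false;
}

/* Write to out every ifindex in fwd that is not in excl and, when
 * exclude_ingress is set, is not the ingress device. Returns the number
 * of targets written; out needs room for nfwd entries. */
static size_t select_targets(const int *fwd, size_t nfwd,
                             const int *excl, size_t nexcl,
                             int ingress, bool exclude_ingress,
                             int *out)
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t i = 0; i < nfwd; i++) {
        if (contains(excl, nexcl, fwd[i]))
            continue;
        if (exclude_ingress && fwd[i] == ingress)
            continue;
        out[n++] = fwd[i];
    }
    return n;
}
```

With a forward group of {2, 3, 4, 5}, an exclude group of {4}, and a packet arriving on ifindex 2 with ingress exclusion enabled, the targets are 3 and 5. Under the same model, the tcpdump-style copy amounts to adding the ifindex of a capture device to the forward group.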