Message-ID: <53e9dd2f-f40a-b43b-99c9-62f5ce3a665c@fb.com>
Date: Thu, 27 Apr 2017 16:31:14 -0700
From: Alexei Starovoitov <ast@...com>
To: Jesper Dangaard Brouer <brouer@...hat.com>,
Andy Gospodarek <andy@...yhouse.net>
CC: John Fastabend <john.fastabend@...il.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Daniel Borkmann <borkmann@...earbox.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"xdp-newbies@...r.kernel.org" <xdp-newbies@...r.kernel.org>
Subject: Re: xdp_redirect ifindex vs port. Was: best API for returning/setting
egress port?
On 4/27/17 1:41 AM, Jesper Dangaard Brouer wrote:
> When registering/attaching an XDP/bpf program, we would just send the
> file-descriptor for this port-map along (like we do with the bpf_prog
> FD). Plus, its own ingress-port number within the port-map.
>
> It is not clear to me, in-which-data-structure on the kernel-side we
> store this reference to the port-map and ingress-port. As today we only
> have the "raw" struct bpf_prog pointer. I see several options:
>
> 1. Create a new xdp_prog struct that contains existing bpf_prog,
> a port-map pointer and ingress-port. (IMHO easiest solution)
>
> 2. Just create a new pointer to port-map and store it in driver rx-ring
> struct (like existing bpf_prog), but this creates a race-challenge
> when replacing (cmpxchg) the program (or perhaps it's not a problem,
> as it runs under rcu and the RTNL-lock).
>
> 3. Extend bpf_prog to store this port-map and ingress-port, and have a
> fast-way to access it. I assume it will be accessible via
> bpf_prog->bpf_prog_aux->used_maps[X] but it will be too slow for XDP.
I'm not sure I completely follow the 3 proposals.
Are you suggesting to have only one netdev_array per program?
Why not allow any number, like we do for tail calls + prog_array, etc.?
We can teach the verifier to allow a new helper
  bpf_tx_port(netdev_array, port_num);
to only be used with the netdev_array map type.
It will fetch the netdevice pointer from netdev_array[port_num]
and will tx the packet into it.
We can make it similar to bpf_tail_call(), so that the program
finishes on a successful bpf_tx_port(), or
make it a 'delayed' tx which is executed when the program finishes.
Not sure which approach is better.
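For illustration, a pseudocode sketch of what a program using the
proposed helper might look like. Everything here is hypothetical:
bpf_tx_port() and a BPF_MAP_TYPE_NETDEV_ARRAY map type are assumptions
drawn from this proposal, not an existing kernel API, so this will not
compile against any current tree.

```c
/* Hypothetical sketch only: bpf_tx_port() and
 * BPF_MAP_TYPE_NETDEV_ARRAY are proposed, not merged. */
struct bpf_map_def SEC("maps") tx_ports = {
	.type        = BPF_MAP_TYPE_NETDEV_ARRAY, /* assumed new map type */
	.key_size    = sizeof(u32),
	.value_size  = sizeof(u32),               /* ifindex per slot */
	.max_entries = 64,
};

SEC("xdp")
int xdp_redirect_port(struct xdp_md *ctx)
{
	u32 port_num = 0; /* egress slot in the netdev_array */

	/* bpf_tail_call()-like semantics: on success the program
	 * does not continue past this point. */
	bpf_tx_port(&tx_ports, port_num);

	/* Reached only if the tx failed, e.g. an empty slot. */
	return XDP_DROP;
}
```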
We can also extend this netdev_array into broadcast/multicast. A
  bpf_tx_allports(&netdev_array);
call from the program will xmit the packet to all netdevices
in that 'netdev_array' map.
The map-in-map support can be trivially extended to allow netdev_array,
then the program can create N multicast groups of netdevices.
Each multicast group == one netdev_array map.
The user space will populate a hashmap with these netdev_arrays and
bpf kernel side can select dynamically which multicast group to use
to send the packets to.
bpf kernel side may look like:

  struct bpf_netdev_array *netdev_array = bpf_map_lookup_elem(&hash, key);

  if (!netdev_array)
      ...
  if (my_condition)
      bpf_tx_allports(netdev_array);       /* broadcast to all netdevices */
  else
      bpf_tx_port(netdev_array, port_num); /* tx into one netdevice */
That's an artificial example; just trying to point out
that we shouldn't restrict the feature too soon.