[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c9b3ff9b-8938-3274-da29-c04f9da2a314@fb.com>
Date: Wed, 26 Apr 2017 10:58:45 -0700
From: Alexei Starovoitov <ast@...com>
To: John Fastabend <john.fastabend@...il.com>,
Jesper Dangaard Brouer <brouer@...hat.com>
CC: Alexei Starovoitov <alexei.starovoitov@...il.com>,
Daniel Borkmann <daniel@...earbox.net>,
Andy Gospodarek <andy@...yhouse.net>,
Daniel Borkmann <borkmann@...earbox.net>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"xdp-newbies@...r.kernel.org" <xdp-newbies@...r.kernel.org>
Subject: Re: xdp_redirect ifindex vs port. Was: best API for returning/setting
egress port?
On 4/26/17 9:35 AM, John Fastabend wrote:
>
>> As Alexei also mentioned before, ifindex vs port makes no real
>> difference seen from the bpf program side. It is userspace's
>> responsibility to add ifindex/port's to the bpf-maps, according to how
>> the bpf program "policy" want to "connect" these ports. The
>> port-table system add one extra step, of also adding this port to the
>> port-table (which lives inside the kernel).
>>
>
> I'm not sure I understand the "lives inside the kernel" bit. I assumed
> the 'map' should be a bpf map and behave like any other bpf map.
>
> I wanted a new map to be defined, something like this from the bpf programmer
> side.
>
> struct bpf_map_def SEC("maps") port_table =
> .type = BPF_MAP_TYPE_PORT_CONNECTION,
> .key_size = sizeof(u32),
> .value_size = BPF_PORT_CONNECTION_SIZE,
> .max_entries = 256,
> };
I like the idea.
We have prog_array, perf_event_array, cgroup_array map specializations.
This one can be new netdev_array with some new bpf_redirect-like helper
accessing it.
>> When loading the XDP program, we also need to pass along a port table
>> "id" this XDP program is associated with (and if it doesn't exists you
>> create it). And your userspace "control-plane" application also need
>> to know this port table "id", when adding a new port.
>
> So the user space application that is loading the program also needs
> to handle this map. This seems correct to me. But I don't see the
> value in making some new port table when we already have well understood
> framework for maps.
+1
>>
>> The concept of having multiple port tables is key. As this implies we
>> can have several simultaneous "data-planes" that is *isolated* from
>> each-other. Think about how network-namespaces/containers want
>> isolation. A subtle thing I'm afraid to mention, is that oppose to the
>> ifindex model, a port table with mapping to a net_device pointer, would
>> allow (faster) delivery into the container's inner net_device, which
>> sort of violates the isolation, but I would argue it is not a problem
>> as this net_device pointer could only be added from a process within the
>> namespace. I like this feature, but it could easily be disallowed via
>> port insertion-time validation.
>>
>
> I think the above optimization should be allowed. And agree multiple port
> tables (maps?) is needed. Again all this points to using standard maps
> logic in my mind. For permissions and different domains, which I think
> you were starting to touch on, it looks like we could extend the pinning API.
> At the moment it does an inode_permission(inode, MAY_WRITE) check but I
> presume this could be extended. None of this would be needed in v1 and
> could be added subsequently. read-only maps seems doable.
this is great idea. Once BPF_MAP_TYPE_NETDEV_ARRAY is populated
the user space can make it readonly to prevent further changes.
From user space it can be done similar to perf_events/cgroups as well.
bpf_map_update_elem(&netdev_array, &port_num, &ifindex)
should work.
For bpf_map_lookup_elem() from such netdev_array we can return
ifindex back.
The bpf_map_show_fdinfo() can be customized as well to pretty print
ifindexes of netdevs stored in there.
Powered by blists - more mailing lists