[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ea4391584d2321f8bf5a0e61ed1e05e5665bbf17.camel@mellanox.com>
Date: Tue, 19 Mar 2019 18:54:08 +0000
From: Saeed Mahameed <saeedm@...lanox.com>
To: "alexei.starovoitov@...il.com" <alexei.starovoitov@...il.com>
CC: "songliubraving@...com" <songliubraving@...com>,
"toke@...hat.com" <toke@...hat.com>,
"daniel@...earbox.net" <daniel@...earbox.net>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"andrew.gospodarek@...adcom.com" <andrew.gospodarek@...adcom.com>,
"jesper.brouer@...il.com" <jesper.brouer@...il.com>,
"bblanco@...il.com" <bblanco@...il.com>,
"jakub.kicinski@...ronome.com" <jakub.kicinski@...ronome.com>,
"iovisor-dev@...ts.iovisor.org" <iovisor-dev@...ts.iovisor.org>,
"bjorn.topel@...il.com" <bjorn.topel@...il.com>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"kafai@...com" <kafai@...com>
Subject: Re: [RFC][Proposal] BPF Control MAP
On Tue, 2019-03-19 at 08:36 -0700, Alexei Starovoitov wrote:
> On Fri, Mar 08, 2019 at 08:51:03PM +0000, Saeed Mahameed wrote:
> > Thoughts ?
>
> It's certainly an interesting idea. I think we need to agree on use
> cases
> and goals first before bikesheding on the solution.
Sure will discuss most of the use cases tomorrow in the iovisor
meeting.
>
> > Example use cases (XDP only for now):
> >
> > 1) Query XDP stats of all XDP netdevs:
> >
> > xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type
> > =
> > XDP_STATS);
> >
> > while (bpf_map_get_next_key(xdp_ctrl, &ifindex, &next_ifindex) ==
> > 0) {
> > bpf_map_lookup_elem(xdp_ctrl, &next_ifindex, &stats);
> > // we don't even need to know stats format in this case
> > btf_pretty_print(xdp_ctrl->btf, &stats);
> > ifindex = next_ifindex;
> > }
>
> this bit show cases advantage of BTF nicely.
>
> > 2) Setup XDP tx redirect resources on egress netdev (netdev with no
> > XDP
> > program).
> >
> > xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type
> > =
> > XDP_ATTR);
> >
> > xdp_attr->command = SETUP_REDIRECT;
> > xdp_attr->rings.num = 12;
> > xdp_attr->rings.size = 128;
> > bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);
>
> this one starting to become a bit odd, since input arguments are
> split
> between key an value. ifindex is part of the key whereas command,
> rings.num and rings.size are part of the value.
The idea is that we need to keep a map semantics (object->vlaue).
in this case ( map_attr.ctrl_type = XDP_ATTR ) object (key) is if_index
and value is xdp attributes of that if_index, which plays nicely when
you want to iterate through all the objects (XDP netdevs). simply think
of ifindex as a key and not an input value.
different ctrl map types can have different keys and values hence the
need for map_create and passing map_attr.ctrl_type attribute which will
define what the user is trying to access (which control map) and what
will be the key (object) & value (configuration) pair layout.
> > 3) Turn On/Off XDP meta data offloads and retrieve meta data BTF
> > format
> > of specific netdev/hardware:
> >
> > xdp_ctrl = bpf_create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type
> > =
> > XDP_ATTR);
> >
> > xdp_attr->command = SETUP_MD;
> > xdp_attr->enable_md = 1;
> > err = bpf_map_update_elem(xdp_ctrl, &ifindex, &xdp_attr);
> > if (err) {
> > printf("XDP meta data is not supported on this netdev");
> > return;
> > }
> > // Query Meta data BTF
> > bpf_map_lookup_elem(xdp_ctrl, &ifindex, &xdp_attr);
> > md_btf = xdp_attr.md_btf;
>
> here it gets even weirder, since lookup arguments are also
> split between key and value.
> ifindex is inside the key while addition info is passed
> inside xdp_attr which is part of value.
>
> I wish we could do maps where every key would have different layout.
> Then such api would be suitable.
> The existing maps require all keys and values to be inform.
> I guess one can argue that such 'control map' can have one element
> and one value, but then what is the purpose of 'map_create'
> and 'map_delete==close(fd)' operations?
>
This is exactly the purpose of map_{create/delete} to define what the
key,vlaue format will be ( it is not going to be ifindex for all
control maps), to make it clear, the key doesn't have to be an ifindex
at all, it depends on the map_attr.ctrl_type which the user request on
map creation, so different layouts are already supported by this
proposal
examples of different map_attr.ctrl_type:
fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type = XDP_ATTR)
// Key layout == ifindex, vlaue format if_xdp_attributes
fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
BPF_PROG_STATS)
// Key layout == prog_fd, value format struct bpf_prog_stats
fd = create_map(BPF_MAP_TYPE_CONTROL, map_attr.ctrl_type =
BPF_SOCK_ATTR)
// Key layout == socket_fd, value layout struct bpf_sock_attr
extending this can be done in any linux BPF subsystem, by introducing
new map_attr.ctrl_types and new key value layouts of that control type.
on map creation we will attach the btf format of the (key, value) pair
to the map created for that user.
> I think we need something else here. Using BTF to describe
> output stats is nice, but using BTF to describe input query is
> problematic,
> since user cannot know before hand what kernel can and cannot accept.
> imo input should be stable uapi with fixed constants whereas
> stats-like output can be BTF based and vary from driver to driver
> and from one nic version to another.
>
Well, we can decide to use static stable uapi, and still use this
special map to leverage the map API as described in this doc.
but also we can allow dynamic value layouts, but any extension should
be done to the end of any value structuer and on lookup we will only
copy the size that the user already recognizes .. ? or we can
assume/force the user to use the map_attributes to figure out format
layouts and sizes, but still we will guarantee backward compatibility
in kernel level by keeping old format the same, and extension is only
allowed to the end of the value structures.
Powered by blists - more mailing lists