Message-ID: <40cf6893-4702-4773-1aaa-7dfdc51c6212@fb.com>
Date: Thu, 27 Apr 2017 18:11:02 -0700
From: Alexei Starovoitov <ast@...com>
To: Hannes Frederic Sowa <hannes@...essinduktion.org>,
Martin KaFai Lau <kafai@...com>, <netdev@...r.kernel.org>
CC: Daniel Borkmann <daniel@...earbox.net>, <kernel-team@...com>,
"David S. Miller" <davem@...emloft.net>,
Jesper Dangaard Brouer <brouer@...hat.com>,
John Fastabend <john.fastabend@...il.com>,
Thomas Graf <tgraf@...g.ch>
Subject: prog ID and next steps. Was: [RFC net-next 0/2] Introduce bpf_prog ID
and iteration
On 4/27/17 6:36 AM, Hannes Frederic Sowa wrote:
> On 27.04.2017 08:24, Martin KaFai Lau wrote:
>> This patchset introduces the bpf_prog ID and a new bpf cmd to
>> iterate all bpf_prog in the system.
>>
>> It is still incomplete. The idea can be extended to bpf_map.
>>
>> Martin KaFai Lau (2):
>> bpf: Introduce bpf_prog ID
>> bpf: Test for bpf_prog ID and BPF_PROG_GET_NEXT_ID
>
> Thanks Martin, I like the approach.
>
> I think the progid is also much more suitable to be used in kallsyms
> because it handles collisions correctly and lets us correctly walk the
> chain (for example, imagine loading two identical programs but installing
> them at different hooks; kallsyms doesn't allow finding out which
> program is installed where).
I disagree re: kallsyms. The goal of prog_tag is to let program writers
understand which program is running in a stable way.
The ID is assigned dynamically and is not suitable for that purpose.
> It would help a lot if you could pass the prog_id back during program
> creation, otherwise it will be kind of difficult to get a hold on which
> program is where. ;)
yes, but not at creation time. The bpf_prog_load command will keep returning
an FD, and all operations on programs will be allowed with an FD only.
Think of this 'ID' as a program handle or program pointer.
In other words, it's an obfuscated kernel 'struct bpf_prog *' given to
user space, so that user space can later convert this ID into an FD.
The other patch (not shown) will take an ID from user space and will
convert it to an FD if prog->aux->user is the same or the caller is root.
We tried really hard to keep everything FD based. Unfortunately
netlink is not suitable to pass FDs, so to query TC and XDP
we either have to invent a way to install FD from netlink in recvmsg()
or pass something that can be converted to FD later.
That's what program ID is solving.
This set of patches looks trivial with its simple use of idr,
but it took us a long time to get there.
We tried to use a 64-bit ID to avoid the wrap-around issue, but the association
between ID and bpf_prog needs to be kept somewhere. The obvious
answer is rhashtable, but it cannot be iterated easily:
we'd need to dump the whole thing through the bpf syscall, which
is not practical.
Then we tried to use 32-bit idr's id + 32-bit timestamp/random.
It works better, but then we hit the issue that bpf_prog_get_next_id
cannot be iterated in a stable way when programs are being deleted
while user space iterates over the whole list.
So in the end we scrapped all the fancy things and went with
a simple 32-bit ID allocated in a _cyclic_ way via idr.
The reason for cyclic is to avoid prog delete/create races,
so an ID seen by user space stays stable for 2B IDs.
We were concerned that somebody might try to load/delete
a program 2B times to cause the counter to wrap around, but
it turned out not to be an issue. In that sense prog ID is similar
to PID.
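To make the cyclic idea concrete, here is a toy user-space model. It is only a sketch and not the kernel idr API: the names toy_ids, toy_alloc_cyclic, toy_free and toy_get_next_id are made up for illustration, and the ID space is shrunk to 8 so the wrap-around is easy to see. It assumes "next ID" means the smallest live ID greater than the one passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy user-space model of cyclic ID allocation; NOT the kernel idr API. */
#define TOY_MAX_ID 8u            /* tiny ID space so wrap-around is visible */

struct toy_ids {
    bool used[TOY_MAX_ID + 1];   /* used[0] is never set: 0 means "no ID" */
    uint32_t last;               /* most recently allocated ID */
};

/* Allocate the first free ID after 'last', wrapping back to 1 at the end. */
static uint32_t toy_alloc_cyclic(struct toy_ids *t)
{
    for (uint32_t i = 0; i < TOY_MAX_ID; i++) {
        uint32_t id = (t->last + i) % TOY_MAX_ID + 1;

        if (!t->used[id]) {
            t->used[id] = true;
            t->last = id;
            return id;
        }
    }
    return 0; /* ID space exhausted */
}

/* Release an ID; it is not handed out again until the cursor wraps. */
static void toy_free(struct toy_ids *t, uint32_t id)
{
    assert(id >= 1 && id <= TOY_MAX_ID);
    t->used[id] = false;
}

/* Return the smallest live ID greater than 'id', or 0 if none is left. */
static uint32_t toy_get_next_id(const struct toy_ids *t, uint32_t id)
{
    for (uint32_t next = id + 1; next <= TOY_MAX_ID; next++)
        if (t->used[next])
            return next;
    return 0;
}
```

In this model, allocating 1, 2, 3, freeing 2 and allocating again yields 4 rather than 2, so a reader that remembered ID 2 will not silently end up looking at a different program until the whole ID space has been cycled through.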
So a more complete picture of what we're trying to do:
- new bpf_get_fd_from_id syscall cmd will be used to convert
  prog ID into prog FD
- tc/xdp/sockets/tracing attachment points will return prog ID
- existing bpf_map_lookup() cmd from prog_array will be returning
  prog ID
- bpf_prog_next_id syscall cmd (this patch) is used to iterate
  over all prog IDs
- new bpf_prog_get_info syscall cmd (based on prog FD) will be used
  to get all or partial info about the program that kernel knows about
Example usage:
- if user space wants to see the instructions of all loaded programs,
  it can use a loop like:
    while (!bpf_prog_get_next_id(next_id, &next_id)) {
        int fd = bpf_prog_get_fd_from_id(next_id);
        struct bpf_prog_info info;

        bpf_prog_get_info(fd, &info, flags);
        // look into info.insns[]
        close(fd);
    }
- if user space wants to see the prog_tag of the xdp program attached to eth0:
    // netlink sendmsg() into ifindex of eth0 that returns prog ID
    int fd = bpf_prog_get_fd_from_id(id_from_netlink);
    struct bpf_prog_info info;

    bpf_prog_get_info(fd, &info, flags);
    // look into info.prog_tag
    close(fd);
The 'flags' argument of bpf_prog_get_info() will be used
to tell the kernel which info about the program needs to be dumped.
Otherwise, if the kernel always dumps everything about the program,
it will make the syscall too slow and too cumbersome.
Possible combinations:
- prog_type, prog_tag, license, prog ID
- array of prog instructions
- array of map IDs
Here we'll introduce similar IDs for maps and
bpf_map_get_info() syscall cmd that will return map_type, map_id, sizes.
If user wants to iterate over all elements of the map, they can
use map_fd = bpf_map_get_fd_from_id(map_id); command
and later use existing bpf_map_get_next_key+bpf_map_lookup_elem.
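As a rough illustration of the 'flags' selection described above, the following user-space sketch copies only the requested groups of fields. Everything in it is hypothetical: the TOY_INFO_* bit names, the toy_prog and toy_prog_info layouts and toy_prog_get_info() are stand-ins invented for this example, since the proposal does not define the actual flag values or struct layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits; the proposal does not name them. */
#define TOY_INFO_BASIC   (1u << 0)  /* prog_type, prog ID (tag, license omitted) */
#define TOY_INFO_INSNS   (1u << 1)  /* array of prog instructions */
#define TOY_INFO_MAP_IDS (1u << 2)  /* array of map IDs */

#define TOY_MAX_INSNS 4
#define TOY_MAX_MAPS  2

struct toy_prog {                   /* stand-in for the kernel's view */
    uint32_t id, type;
    uint64_t insns[TOY_MAX_INSNS];
    uint32_t nr_insns;
    uint32_t map_ids[TOY_MAX_MAPS];
    uint32_t nr_maps;
};

struct toy_prog_info {              /* stand-in for what user space gets */
    uint32_t id, type;
    uint64_t insns[TOY_MAX_INSNS];
    uint32_t nr_insns;
    uint32_t map_ids[TOY_MAX_MAPS];
    uint32_t nr_map_ids;
};

/* Copy only the groups selected by 'flags'; everything else stays zeroed. */
static int toy_prog_get_info(const struct toy_prog *p,
                             struct toy_prog_info *info, uint32_t flags)
{
    assert(p && info);
    assert(p->nr_insns <= TOY_MAX_INSNS && p->nr_maps <= TOY_MAX_MAPS);

    if (flags & ~(TOY_INFO_BASIC | TOY_INFO_INSNS | TOY_INFO_MAP_IDS))
        return -1;                  /* unknown flag bit */

    memset(info, 0, sizeof(*info));
    if (flags & TOY_INFO_BASIC) {
        info->id = p->id;
        info->type = p->type;
    }
    if (flags & TOY_INFO_INSNS) {
        memcpy(info->insns, p->insns, p->nr_insns * sizeof(p->insns[0]));
        info->nr_insns = p->nr_insns;
    }
    if (flags & TOY_INFO_MAP_IDS) {
        memcpy(info->map_ids, p->map_ids, p->nr_maps * sizeof(p->map_ids[0]));
        info->nr_map_ids = p->nr_maps;
    }
    return 0;
}
```

A caller that only needs the basic fields would pass TOY_INFO_BASIC and skip the cost of copying instructions, while a dump tool could OR all three bits together.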
We believe this way the user space will be able to see _everything_
about bpf programs and maps and can pick and choose whether
it wants to see only programs or only maps or partial info
about progs (without instructions) and so on.
Once we have CTF (debug info) available for maps and progs,
we will extend bpf_prog_get_info() and bpf_map_get_info()
commands to optionally return that as well.