[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMEtUuzkVANM341xPTKH1bNVNuK6TQcyqsdZdkGWausLT5Qj6A@mail.gmail.com>
Date: Wed, 2 Jul 2014 19:29:47 -0700
From: Alexei Starovoitov <ast@...mgrid.com>
To: Andy Lutomirski <luto@...capital.net>
Cc: "David S. Miller" <davem@...emloft.net>,
Ingo Molnar <mingo@...nel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Steven Rostedt <rostedt@...dmis.org>,
Daniel Borkmann <dborkman@...hat.com>,
Chema Gonzalez <chema@...gle.com>,
Eric Dumazet <edumazet@...gle.com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Arnaldo Carvalho de Melo <acme@...radead.org>,
Jiri Olsa <jolsa@...hat.com>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Kees Cook <keescook@...omium.org>,
Linux API <linux-api@...r.kernel.org>,
Network Development <netdev@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC net-next 03/14] bpf: introduce syscall(BPF, ...) and
BPF maps
On Wed, Jul 2, 2014 at 6:43 PM, Andy Lutomirski <luto@...capital.net> wrote:
> On Tue, Jul 1, 2014 at 10:33 PM, Alexei Starovoitov <ast@...mgrid.com> wrote:
>> I want to avoid string names, since they will force new 'strtab', 'symtab'
>> sections in the programs/maps and will uglify the user interface quite a bit.
>
> To be fair, you really need to imitate ELF here. A very simple
> relocation-like table should do the trick.
simple.. right :) I do see the amount of struggle you have with
binutils and vdso.
I really don't want to add relocation unless this is last resort.
Especially since it can be solved without it.
I don't think I explained it enough in my last email… trying again:
>> Back in september one loadable unit was: one eBPF program + set of maps,
>> but tracing requirements forced a change, since multiple programs need
>> to access the same map and maps may need to be pre-populated before
>> the programs start executing, so I've split maps and programs into mostly
>> independent entities, but programs still need to think of maps as local:
>> For example I want to do a skb leak check 'tracing filter':
>> - attach this program to kretprobe of __alloc_skb():
>> u64 key = (u64) skb;
>> u64 value = bpf_get_time();
>> bpf_update_map_elem(1/*const_map_id*/, &key, &value);
>> - attach this program to consume_skb and kfree_skb tracepoints:
>> u64 key = (u64) skb;
>> bpf_delete_map_elem(1/*const_map_id*/, &key);
>> - and have user space do:
>> prior to loading:
>> bpf_create_map(1/*map_id*/, 8/*key_size*/, 8/*value*/, 1M /*max_entries*/)
>> and then periodically iterate the map to see whether any skb stayed
>> in the map for too long.
>>
>> Programs need to be written with hard coded map_ids otherwise usability
>> suffers, so I did global 32-bit id in this RFC
>>, but this indeed doesn't work
>
> Really? That will mean that you have to edit the source of your
> filter program if the small integer map number you chose conflicts
> with another program. That sounds unpleasant.
unpleasant. exactly. that's why I'm proposing per-process local map-id,
so that programs don't need to be edited.
>> for unprivileged chrome browser unless programs are previously loaded
>> by root and chrome only does attach to seccomp.
>>
>> So here is the non-root bpf syscall interface I'm thinking about:
>>
>> ufd = bpf_create_map(map_id, key_size, value_size, max_entries);
>>
>> it will create a global map in the system which will be accessible
>> in this process via 'ufd'. Internally this 'ufd' will be assigned global map_id
>> and process-local map_id that was passed as a 1st argument.
>> To do update/lookup the process will use bpf_map_xxx_elem(ufd,…)
>>
>
> Erk. Unprivileged programs shouldn't be able to allocate global ids
> of their choosing, especially if privileged programs can also do it.
> Otherwise unprivileged programs can force a collision and possibly
> steal information.
of course. that's not what said.
>> Then to load eBPF program the process will do:
>> ufd = bpf_prog_load(prog_type, ebpf_insn_array, license)
>> and instructions will be referring to maps via local map_id that
>> was hard coded as part of the program.
>
> I think relocations would be must prettier than per-process map id tables.
I think per process map_id are much cleaner.
non-root API:
ufd = bpf_create_map(local_map_id,… )
bpf_map_update/delete/lookup_elem(ufd,…)
ufd = bpf_prog_load(insns)
close(ufd)
root only API:
global_id = bpf_get_id(ufd) // returns either map or prog global id
bpf_map_delete(global_map_id)
bpf_prog_unload(global_prog_id)
Details:
ufd = bpf_create_map(local_map_id, ...);
local_map_id - process local map_id
(this id is used to access maps from eBPF program loaded by this process)
ufd - process local file descriptor
(used to update/lookup maps from this process)
global_map_id = bpf_get_id(ufd)
this is root only call to get global_ids and pass them to global events
like tracing.
global ids will only be seen by root. There is no way for root or non-root
to influence id ranges.
>> Beyond the normal create_map, update/lookup/delete, load_prog
>> operations (that are accessible to both root and non-root), the root user
>> gains one more operations: bpf_get_global_id(ufd) that returns
>> global map_id or prog_id. This id can be attached to global events
>> like tracing. Non-root users lose ability to do delete_map and
>> unload_prog (they do close(ufd) instead), so this ops are for root
>> only and operate on global ids.
>> This is the cleanest way I could think of to combine non-root
>> security, per-process id and global id all in one API. Thoughts?
>
> I think I'm okay with this part, although an interface to get a map fd
> given some reference to the program (in sysfs) that uses it would also
> work and maybe be more straightforward.
If you meant debugfs, then yes. I'm planning to add a way for root
to see all loaded programs and maps (similar to /proc as lsmod does),
and then do bpf_map_delete/bpf_prog_unload (similar to rmmod)
setsockopt and seccomp will be non-root and programs will go
through additional dont_leak_pointers check in verifier.
tracing/dtrace will be for root only, since they would need to attach
to global events.
I think it will be cleaner once I finish fd conversion as a patch.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists