netdev - Re: [PATCH RFC net-next 03/14] bpf: introduce syscall(BPF, ...) and BPF maps

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMEtUuzkVANM341xPTKH1bNVNuK6TQcyqsdZdkGWausLT5Qj6A@mail.gmail.com>
Date:	Wed, 2 Jul 2014 19:29:47 -0700
From:	Alexei Starovoitov <ast@...mgrid.com>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	"David S. Miller" <davem@...emloft.net>,
	Ingo Molnar <mingo@...nel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Daniel Borkmann <dborkman@...hat.com>,
	Chema Gonzalez <chema@...gle.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Arnaldo Carvalho de Melo <acme@...radead.org>,
	Jiri Olsa <jolsa@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Kees Cook <keescook@...omium.org>,
	Linux API <linux-api@...r.kernel.org>,
	Network Development <netdev@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH RFC net-next 03/14] bpf: introduce syscall(BPF, ...) and
 BPF maps

On Wed, Jul 2, 2014 at 6:43 PM, Andy Lutomirski <luto@...capital.net> wrote:
> On Tue, Jul 1, 2014 at 10:33 PM, Alexei Starovoitov <ast@...mgrid.com> wrote:
>> I want to avoid string names, since they will force new 'strtab', 'symtab'
>> sections in the programs/maps and will uglify the user interface quite a bit.
>
> To be fair, you really need to imitate ELF here.  A very simple
> relocation-like table should do the trick.

simple.. right :) I do see the amount of struggle you have with
binutils and vdso.
I really don't want to add relocation unless this is last resort.
Especially since it can be solved without it.
I don't think I explained it enough in my last email… trying again:

>> Back in september one loadable unit was: one eBPF program + set of maps,
>> but tracing requirements forced a change, since multiple programs need
>> to access the same map and maps may need to be pre-populated before
>> the programs start executing, so I've split maps and programs into mostly
>> independent entities, but programs still need to think of maps as local:
>> For example I want to do a skb leak check 'tracing filter':
>> - attach this program to kretprobe of __alloc_skb():
>>   u64 key = (u64) skb;
>>   u64 value = bpf_get_time();
>>   bpf_update_map_elem(1/*const_map_id*/, &key, &value);
>> - attach this program to consume_skb and kfree_skb tracepoints:
>>   u64 key = (u64) skb;
>>   bpf_delete_map_elem(1/*const_map_id*/, &key);
>> - and have user space do:
>>   prior to loading:
>>   bpf_create_map(1/*map_id*/, 8/*key_size*/, 8/*value*/, 1M /*max_entries*/)
>>   and then periodically iterate the map to see whether any skb stayed
>>   in the map for too long.
>>
>> Programs need to be written with hard coded map_ids otherwise usability
>> suffers, so I did global 32-bit id in this RFC
>>, but this indeed doesn't work
>
> Really?  That will mean that you have to edit the source of your
> filter program if the small integer map number you chose conflicts
> with another program.  That sounds unpleasant.

unpleasant. exactly. that's why I'm proposing per-process local map-id,
so that programs don't need to be edited.

>> for unprivileged chrome browser unless programs are previously loaded
>> by root and chrome only does attach to seccomp.
>>
>> So here is the non-root bpf syscall interface I'm thinking about:
>>
>> ufd = bpf_create_map(map_id, key_size, value_size, max_entries);
>>
>> it will create a global map in the system which will be accessible
>> in this process via 'ufd'. Internally this 'ufd' will be assigned global map_id
>> and process-local map_id that was passed as a 1st argument.
>> To do update/lookup the process will use bpf_map_xxx_elem(ufd,…)
>>
>
> Erk.  Unprivileged programs shouldn't be able to allocate global ids
> of their choosing, especially if privileged programs can also do it.
> Otherwise unprivileged programs can force a collision and possibly
> steal information.

of course. that's not what said.

>> Then to load eBPF program the process will do:
>> ufd = bpf_prog_load(prog_type, ebpf_insn_array, license)
>> and instructions will be referring to maps via local map_id that
>> was hard coded as part of the program.
>
> I think relocations would be must prettier than per-process map id tables.

I think per process map_id are much cleaner.

non-root API:

ufd = bpf_create_map(local_map_id,… )
bpf_map_update/delete/lookup_elem(ufd,…)
ufd = bpf_prog_load(insns)
close(ufd)

root only API:

global_id = bpf_get_id(ufd) // returns either map or prog global id
bpf_map_delete(global_map_id)
bpf_prog_unload(global_prog_id)

Details:

ufd = bpf_create_map(local_map_id, ...);

local_map_id - process local map_id
(this id is used to access maps from eBPF program loaded by this process)

ufd - process local file descriptor
(used to update/lookup maps from this process)

global_map_id = bpf_get_id(ufd)
this is root only call to get global_ids and pass them to global events
like tracing.

global ids will only be seen by root. There is no way for root or non-root
to influence id ranges.

>> Beyond the normal create_map, update/lookup/delete, load_prog
>> operations (that are accessible to both root and non-root), the root user
>> gains one more operations: bpf_get_global_id(ufd) that returns
>> global map_id or prog_id. This id can be attached to global events
>> like tracing. Non-root users lose ability to do delete_map and
>> unload_prog (they do close(ufd) instead), so this ops are for root
>> only and operate on global ids.
>> This is the cleanest way I could think of to combine non-root
>> security, per-process id and global id all in one API. Thoughts?
>
> I think I'm okay with this part, although an interface to get a map fd
> given some reference to the program (in sysfs) that uses it would also
> work and maybe be more straightforward.

If you meant debugfs, then yes. I'm planning to add a way for root
to see all loaded programs and maps (similar to /proc as lsmod does),
and then do bpf_map_delete/bpf_prog_unload (similar to rmmod)

setsockopt and seccomp will be non-root and programs will go
through additional dont_leak_pointers check in verifier.
tracing/dtrace will be for root only, since they would need to attach
to global events.

I think it will be cleaner once I finish fd conversion as a patch.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html