linux-kernel - Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs

Message-ID: <562301F9.1030702@plumgrid.com>
Date:	Sat, 17 Oct 2015 19:20:41 -0700
From:	Alexei Starovoitov <ast@...mgrid.com>
To:	Daniel Borkmann <daniel@...earbox.net>,
	"Eric W. Biederman" <ebiederm@...ssion.com>
Cc:	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	davem@...emloft.net, viro@...IV.linux.org.uk, tgraf@...g.ch,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs

On 10/17/15 5:28 AM, Daniel Borkmann wrote:
>
> Anyway, another idea I've been brainstorming with Hannes today a
> bit is about the following:
>
> We register two major numbers, one for eBPF maps (X), one for eBPF
> progs (Y). A user can either via cmdline call something like ...
> mknod /dev/bpf/maps/map_pkts c X Z to create a special character
> device, or alternatively out of an application through mknod(2)
> syscall (e.g. tc when setting up maps/progs internally from the obj
> file for a classifier).
>
> Then, we still have 2 eBPF commands for bpf(2) syscall to add, say
> (for example) BPF_BIND_DEV and BPF_FETCH_DEV. The application that
> created a map (or prog) already has the map fd and after mknod(2) it
> can open(2) the special file to get the special file fd. Then it can
> call something like bpf(BPF_BIND_DEV, &attr, sizeof(attr)) where
> attr looks like:
>
>    union bpf_attr attr = {
>      .bpf_fd    = bpf_fd,
>      .dev_fd    = dev_fd,
>    };
>
> The bpf(2) syscall can check whether dev_fd belongs to an eBPF special
> file and it can then copy over file->private_data from the bpf_fd
> to the dev_fd's underlying file, where the private_data, as we know,
> from the bpf_fd already points to a proper bpf_map/bpf_prog structure.
> The map/prog would then get ref'ed and live on for the char device's
> lifetime. No special hashtable, gc, etc. needed. The char device has fops
> that we can define ourselves, and unlinking would drop the ref from
> its private_data.
>
> Now to the other part: BPF_FETCH_DEV would work similarly. The application
> opens the device, and fills bpf_attr as follows again:
>
>    union bpf_attr attr = {
>      .bpf_fd    = 0,
>      .dev_fd    = dev_fd,
>    };
>
> This would allow us to look up the map/prog from the dev_fd's file->
> private_data, and install a new fd via bpf_{map,prog}_new_fd() that
> is returned from bpf(2) for bpf-related access. The remaining fops
> from the char device could still be reserved for possibilities like
> debugging in future.
>
> In future (2nd step), we could either use Eric's idea and then do
> something like mount -t bpffs ... -o /dev/bpf/maps/map_pkts to dump
> attributes or other properties to some location for inspection from such
> a special file, or we could use kobjects for that attached to the device
> if the fops from the cdev should not be sufficient.
>
> So closing the loop to the special files where there were concerns:
>
> This would not rule out a shell-style access possibility in future, and
> it would also not end up as a nightmare on what you mentioned with the
> S_ISSOCK-like bit in the other email.
>
> The pinning mechanism would not require an extra file system to be mounted
> somewhere, and yet the user can define himself an arbitrary hierarchy
> where he puts the special files as this facility already exists. An
> approach like this looks overall cleaner to me, and is most likely
> realizable in fewer lines of code as well.
>
> Thoughts?

That indeed sounds cleaner (fewer lines of code, no fs, etc.), but
I don't see how it will work yet.
For a chardev with our own fops we only get called on open and close
of that chardev, so won't the replaced private_data be cleared when
the user process does close(dev_fd)? There is no fop for unlink either;
isn't that an fs-only property?
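
The lifetime concern above can be made concrete with a small userspace
model. This is only a sketch, not kernel code: every name in it
(model_map, model_inode, model_file, bind_to_file, bind_to_inode, ...)
is a hypothetical stand-in invented for illustration. It assumes that
private_data belongs to one open of the device and is dropped on close,
while the device node itself is shared by all opens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only; none of these names are kernel APIs. */
struct model_map   { int refcnt; };                 /* stands in for a bpf map/prog */
struct model_inode { struct model_map *pinned; };   /* one per device node */
struct model_file  {                                /* one per open() of the node */
    struct model_inode *inode;
    struct model_map *private_data;
};

static void map_get(struct model_map *m) { m->refcnt++; }
static void map_put(struct model_map *m) { m->refcnt--; }

static struct model_file *model_open(struct model_inode *ino)
{
    struct model_file *f = calloc(1, sizeof(*f));
    assert(f != NULL);
    f->inode = ino;
    return f;
}

/* Models a release hook: per-open state goes away on close. */
static void model_close(struct model_file *f)
{
    if (f->private_data)
        map_put(f->private_data);
    free(f);
}

/* Variant A: the binding lives in the per-open private_data. */
static void bind_to_file(struct model_file *f, struct model_map *m)
{
    map_get(m);
    f->private_data = m;
}

/* Variant B: the binding lives in state shared by all opens of the node. */
static void bind_to_inode(struct model_file *f, struct model_map *m)
{
    map_get(m);
    f->inode->pinned = m;
}

/* Returns 1 if a later open still finds the map after the binder closed. */
int binding_survives_close_file_variant(void)
{
    struct model_map map = { .refcnt = 1 };   /* ref held by the creator's bpf fd */
    struct model_inode node = { .pinned = NULL };

    struct model_file *binder = model_open(&node);
    bind_to_file(binder, &map);
    model_close(binder);                      /* binding is dropped here */

    struct model_file *later = model_open(&node);
    int found = (later->private_data != NULL);
    model_close(later);
    return found;
}

int binding_survives_close_inode_variant(void)
{
    struct model_map map = { .refcnt = 1 };
    struct model_inode node = { .pinned = NULL };

    struct model_file *binder = model_open(&node);
    bind_to_inode(binder, &map);
    model_close(binder);                      /* shared state is untouched */

    struct model_file *later = model_open(&node);
    int found = (later->inode->pinned != NULL);
    model_close(later);
    return found;
}
```

In this model only the second variant keeps the map reachable once the
binding process has closed dev_fd, and even then nothing in the model
drops the reference when the node is unlinked, which is the second
question raised above.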
