Message-ID: <56252A43.3000706@iogearbox.net>
Date:	Mon, 19 Oct 2015 19:37:07 +0200
From:	Daniel Borkmann <daniel@...earbox.net>
To:	Alexei Starovoitov <ast@...mgrid.com>,
	Hannes Frederic Sowa <hannes@...essinduktion.org>,
	"Eric W. Biederman" <ebiederm@...ssion.com>
CC:	davem@...emloft.net, viro@...IV.linux.org.uk, tgraf@...g.ch,
	netdev@...r.kernel.org, linux-kernel@...r.kernel.org,
	Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs

On 10/19/2015 06:22 PM, Alexei Starovoitov wrote:
> On 10/19/15 7:23 AM, Daniel Borkmann wrote:
>>>> The mknod is not the holder; rather, the kobject, which should be
>>>> represented in sysfs, will be. So you can still get the map major:minor
>>>> by looking up the /dev file in the corresponding sysfs directory, or I
>>>> think we should provide an 'unbind' file, which will drop the kobject if
>>>> the user writes a '1' to it.
>>>
>>> I agree, this could still be done.
>
> imo doing 'rm' is way cleaner than dealing with 'unbind' file.

Hmm, not sure, maybe this was misunderstood. It's not about files, but
rather devices. Devices are decoupled.

This unbind file is optional and could live under
/sys/class/bpf/bpf_{map,prog}<X>/unbind for a device release. It's not
strictly necessary for this to work, though; the management is, as
explained, done via the bpf() syscall.
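
To illustrate the decoupling, here is a minimal Python sketch (not part of
the patch). It assumes the usual sysfs convention that a device's 'dev'
attribute holds "major:minor", and that a node created via mknod merely
carries the same pair in st_rdev. The helper names are hypothetical, and the
/sys/class/bpf/ layout is the one proposed in this thread, not an existing
interface.

```python
import os
import stat


def parse_sysfs_dev(text):
    """Parse the "major:minor" string found in a sysfs 'dev' attribute."""
    major, minor = text.strip().split(":")
    return int(major), int(minor)


def node_device_numbers(path):
    """Return (major, minor) of a character special file, else None."""
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        return None
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def node_matches_sysfs(node_path, sysfs_dev_path):
    """Check whether a node refers to the device that sysfs describes."""
    with open(sysfs_dev_path) as f:
        expected = parse_sysfs_dev(f.read())
    return node_device_numbers(node_path) == expected
```

Removing or recreating the node changes nothing about the device itself;
the node only names the major:minor pair.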

>> As Hannes said, under /sys/class/bpf/ an admin can see all held nodes, so
>> visibility is there for free at all times. The device management (creation/
>> deletion) itself and the mknod's pointing to it are simply decoupled.
>>
>> This whole approach looks sound to me, also integrates nicely into the
>> existing Linux facilities, and works on top of every fs supporting special
>> files. Much cleaner than an extra file-system that would be required by a
>> syscall in order to make the syscall work.
>
> thanks for the explanations. I think I got a complete picture now on
> how such cdev will be used and I don't like it.
> There is nothing in linux or any unix that creates thousands of cdevs
> on the fly, but here user apps will create/destroy them back and forth
> and they would need to do it quickly. Whole sysfs/kobj baggage is

Well, you are talking about thousands of maps, yet even root can create
only about 5 maps before getting an -EPERM. ;) That is, until the admin
figures out, around a couple of corners, that ulimit -l needs to be
adjusted ... ;)
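
For reference, a small Python sketch of how the limit in question could be
inspected. It assumes that map memory is charged against RLIMIT_MEMLOCK,
which is the limit that 'ulimit -l' reports; the function names are made up
for illustration.

```python
import resource


def memlock_limit():
    """Return the (soft, hard) RLIMIT_MEMLOCK pair, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def describe(limit):
    """Render a limit value in a human-readable form."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes ({limit // 1024} KiB)"


if __name__ == "__main__":
    soft, hard = memlock_limit()
    print("soft:", describe(soft))
    print("hard:", describe(hard))
```

A small soft limit here would explain why map creation starts to fail after
only a handful of maps.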

But more seriously, can you elaborate on what you mean?

An eBPF program or map loading/destruction is *not* by any means to be
considered fast-path. We currently hold a global mutex during loading.
So, how can that be considered fast-path? Similarly, socket creation/
destruction is also not fast-path, etc. Do you expect that applications
would create/destroy these devices within milliseconds? I'd argue that
something would be seriously wrong with that application, then. Such
persistent maps are to be considered rather mid- to long-lived objects in
the system. The fast-path is surely their data-path.

> completely unnecessary here. The kernel will consume more memory for
> no real reason other than cdev are used to keep prog/maps around.

I don't consider this a big issue, and well worth the trade-off. You'll
have an infrastructure that integrates *nicely* into the *existing* kernel
model *and* tooling with the proposed patch. This is a HUGE plus. The
UAPI of this is simple and minimal. And to me, these are in fact special
files, not regular ones.

> imo fs is cleaner and we can tailor it to be similar to cdev style.

Really, IMHO this is over-designed and much, much more hacky. We would
design a whole new file system that works *exactly* like cdevs and likely
takes more than twice the code and complexity to realize, just to save a
few bytes ...? I don't understand that.

Cheers,
Daniel