Message-ID: <5717520A.5060800@iogearbox.net>
Date:	Wed, 20 Apr 2016 11:55:22 +0200
From:	Daniel Borkmann <daniel@...earbox.net>
To:	Quentin Monnet <quentin.monnet@...nd.com>
CC:	Alexei Starovoitov <alexei.starovoitov@...il.com>,
	netdev@...r.kernel.org
Subject: Re: [PATCH net-next 0/2] act_bpf, cls_bpf: send eBPF bytecode through

Hi Quentin,

On 04/20/2016 09:25 AM, Quentin Monnet wrote:
> 2016-04-15 (11:44 UTC-0700) ~ Alexei Starovoitov:
>> On Fri, Apr 15, 2016 at 12:41:05PM +0200, Daniel Borkmann wrote:
>>> On 04/15/2016 12:07 PM, Quentin Monnet wrote:
>>>> When a new BPF traffic control filter or action is set up with tc, the
>>>> bytecode is sent back to userspace through a netlink socket for cBPF, but
>>>> not for eBPF (the file descriptor pointing to the object file containing
>>>> the bytecode is sent instead).
>>>>
>>>> This patch makes cls_bpf and act_bpf modules send the bytecode for eBPF as
>>>> well (in addition to the file descriptor).
>>>>
> […]
>>>
>>> Thanks for working on this, but it's unfortunately not that easy. Let
>>> me ask, what would be the intended use-case to dump the insns?
>>
>> +1
>>
>>> I'm asking because if you dump them as-is, then a reinject at a later
>>> time of that bytecode back into the kernel will most likely be rejected
>>> by the verifier.
>>>
>>> This is because at load time, the verifier rewrites/expands some
>>> of the insns (f.e. map pointers, helper functions, ctx access etc., see
>>> also the appendix in [1]), so the code as seen in the kernel would need
>>> to be sanitized first.
>>
>> +1
>> we had a similar discussion about this in the seccomp context and decided
>> that the only sensible way is to keep the original instructions, but it's
>> wasteful to do that unconditionally, and snapshotting of maps is not
>> possible, so there was no use for such a dumping facility other than
>> debugging. Is that what the patch is after?
>> We need to discuss it in the proper context.
>
> I am experimenting with BPF, and so far I was just trying to dump the
> bytecode sent from tc to the kernel. I had not realized that the
> verifier would make some changes to the instructions. And I agree that
> a more comprehensive debugging solution could be obtained if I could find
> some way to get a snapshot of the maps.
>
>>> Also, how would you make sense/transform maps into a meaningful
>>> representation (probably possible to find a scheme when they are pinned)?
>>>
>>> Another possibility is that such programs need to be pinned (can be done
>>> easily by tc in the background) and we then implement a CRIU facility in
>>> the bpf(2) syscall to retrieve them. tc could make use of this w/o too
>>> much effort, and at the same time it would help CRIU folks, too. It
>>> also seems cleaner to have only one central API (bpf(2)) to dump them,
>>> but it needs a bit of thought.
>>
>> +1
>> any debugging or criu needs to be done in a centralized way via syscall
>> and/or bpffs.
>
> Maintaining a central API around bpf() makes sense to me. I have been
> looking at the BPF filesystem to see what information I can obtain from
> it, but I did not understand it well. I read the log of Daniel's commit
> b2197755b263 (“bpf: add support for persistent maps/progs”), but I am
> unsure how I could use it in order to gather data about the maps and
> programs (if this is possible at all). I tried to set up some BPF

Currently, there's not much information to extract yet. For example, in the
tc source code, bpf_map_selfcheck_pinned() reads fdinfo to check whether the
map fd we got from the pinned map matches the map definition in the object
file. But obviously more work is needed to extract the bytecode, as in your
case.
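To illustrate the kind of fdinfo check meant above, here is a minimal C sketch (not the actual iproute2 code). It assumes the map fd's /proc/self/fdinfo entry carries `map_type:`, `key_size:`, `value_size:` and `max_entries:` lines, and compares them with a map definition taken from the object file. `struct map_spec`, `parse_map_fdinfo()`, `read_fdinfo()` and `map_matches_spec()` are hypothetical names chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical map definition, as a loader might read it from the ELF object. */
struct map_spec {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/* Parse map attributes out of fdinfo text. Lines that are not map
 * attributes (pos:, flags:, mnt_id:, ...) are skipped. Returns 0 if
 * all four fields were found, -1 otherwise. */
static int parse_map_fdinfo(const char *text, struct map_spec *out)
{
	unsigned int found = 0, val;
	const char *line = text;

	while (line && *line) {
		if (sscanf(line, "map_type: %u", &val) == 1) {
			out->type = val;
			found |= 1u;
		} else if (sscanf(line, "key_size: %u", &val) == 1) {
			out->key_size = val;
			found |= 2u;
		} else if (sscanf(line, "value_size: %u", &val) == 1) {
			out->value_size = val;
			found |= 4u;
		} else if (sscanf(line, "max_entries: %u", &val) == 1) {
			out->max_entries = val;
			found |= 8u;
		}
		line = strchr(line, '\n');	/* advance to the next line */
		if (line)
			line++;
	}
	return found == 0xfu ? 0 : -1;
}

/* Read /proc/self/fdinfo/<fd> into buf (NUL-terminated). Returns 0 or -1. */
static int read_fdinfo(int fd, char *buf, size_t len)
{
	char path[64];
	size_t n;
	FILE *f;

	if (len == 0)
		return -1;
	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", fd);
	f = fopen(path, "r");
	if (!f)
		return -1;
	n = fread(buf, 1, len - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}

/* Only reuse the pinned map if it matches the object file's definition. */
static int map_matches_spec(const struct map_spec *pinned,
			    const struct map_spec *obj)
{
	return pinned->type == obj->type &&
	       pinned->key_size == obj->key_size &&
	       pinned->value_size == obj->value_size &&
	       pinned->max_entries == obj->max_entries;
}
```

A loader would call read_fdinfo() on the fd obtained for the pinned map, parse it, and refuse to reuse the map on a mismatch rather than silently attaching a program to a map with a different layout.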

I haven't thought much about it yet, but one idea could be that tc also pins
programs and sends some kind of annotation down to cls_bpf, so that on filter
dump tc can retrieve the path to the pinned program again. tc then uses bpf(2)
with BPF_OBJ_GET to get the fd, and a new command, e.g. BPF_PROG_DUMP, to
extract bytecode/map info from the running program and present it to the user
in a form that makes sense from an admin/user perspective (in other words, not
just raw opcodes).
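Of the two steps, only BPF_OBJ_GET exists in bpf(2); BPF_PROG_DUMP is just a proposal and is not sketched here. A minimal C wrapper for the BPF_OBJ_GET step might look like this, assuming kernel headers that provide `union bpf_attr` and `__NR_bpf`. The helper names `sys_bpf()` and `obj_get()` are illustrative, since glibc offers no bpf(2) wrapper.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Thin wrapper around the raw bpf(2) syscall. */
static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/* Open a pinned object (program or map) on bpffs by its path.
 * Returns a new fd on success, -1 with errno set on failure. */
static int obj_get(const char *pathname)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));	/* unused attr fields must be zero */
	attr.pathname = (uint64_t)(unsigned long)pathname;
	return (int)sys_bpf(BPF_OBJ_GET, &attr, sizeof(attr));
}
```

On filter dump, tc would call something like obj_get() with the pinned path recovered from the cls_bpf annotation (a path such as "/sys/fs/bpf/tc/my_prog" is purely hypothetical here) and then pass the resulting fd to whatever dump command ends up existing.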

BPF_PROG_DUMP could carry auxiliary information with the map specs, similar
to how the loader retrieves them as relocation entries from the object file,
plus some information on where to find the maps in case they were pinned.
This still doesn't give you an entire snapshot of the map, but for the pinned
ones it would at least allow you to iterate over them via bpf(2) with
BPF_MAP_GET_NEXT_KEY, and in general it would allow you to reload the
program.
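Iterating over a pinned map with BPF_MAP_GET_NEXT_KEY could be sketched in C as below. This assumes a kernel where passing a NULL key returns the first key, which is newer behaviour than the kernels discussed in this thread (older ones needed a key not present in the map to start the walk). `bpf_cmd()` and `count_map_keys()` are illustrative names, and counting keys stands in for whatever per-key processing a dump tool would do.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Thin wrapper around the raw bpf(2) syscall. */
static long bpf_cmd(int cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/* Walk all keys of map_fd and return how many there are, or -1 on error.
 * Assumption: a NULL key makes the kernel return the first key. */
static long count_map_keys(int map_fd, size_t key_size)
{
	union bpf_attr attr;
	void *cur = calloc(1, key_size);
	void *next = calloc(1, key_size);
	long n = 0;
	int first = 1;

	if (!cur || !next) {
		free(cur);
		free(next);
		return -1;
	}
	for (;;) {
		memset(&attr, 0, sizeof(attr));
		attr.map_fd = map_fd;
		attr.key = first ? 0 : (uint64_t)(unsigned long)cur;
		attr.next_key = (uint64_t)(unsigned long)next;
		if (bpf_cmd(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr)) < 0) {
			if (errno != ENOENT)
				n = -1;	/* real error, e.g. bad fd or no permission */
			break;		/* ENOENT: no more keys */
		}
		n++;
		memcpy(cur, next, key_size);	/* continue from the key just returned */
		first = 0;
	}
	free(cur);
	free(next);
	return n;
}
```

As noted above, this is not a snapshot: if the map is modified concurrently (for example, a hash map entry is deleted mid-walk), the iteration may restart from the first key or miss entries.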

There's still the issue of the additional memory overhead of keeping the
original insns around, as Alexei mentioned. Two things come to mind. One is
that when JITing was successful, we could try to shrink struct bpf_prog again,
since we then run from a different image; but that doesn't address the case
where the JIT is not used. The other is to perhaps keep only a 'diff' in
orig_prog from which we can patch the insns back to the original; probably
possible, but it needs a bit of work.
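One hypothetical shape for such a 'diff' is sketched below in C. It only covers instructions rewritten in place (the program length stays the same, as when a map fd in a BPF_LD_IMM64 is replaced by a map pointer), not expansions that insert extra insns, which would need offset bookkeeping on top. `struct insn_patch`, `struct insn_diff`, `record_patch()` and `restore_orig()` are illustrative names, not kernel APIs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <linux/bpf.h>

/* Hypothetical record of one instruction rewritten in place. */
struct insn_patch {
	unsigned int idx;	/* index of the rewritten instruction */
	struct bpf_insn orig;	/* the instruction as originally loaded */
};

/* Growable list of patches, i.e. the 'diff' kept instead of a full copy. */
struct insn_diff {
	struct insn_patch *patches;
	size_t cnt, cap;
};

/* Remember insns[idx] right before it is rewritten. Returns 0 or -1. */
static int record_patch(struct insn_diff *d, const struct bpf_insn *insns,
			unsigned int idx)
{
	if (d->cnt == d->cap) {
		size_t cap = d->cap ? d->cap * 2 : 8;
		struct insn_patch *p = realloc(d->patches, cap * sizeof(*p));

		if (!p)
			return -1;
		d->patches = p;
		d->cap = cap;
	}
	d->patches[d->cnt].idx = idx;
	d->patches[d->cnt].orig = insns[idx];
	d->cnt++;
	return 0;
}

/* Copy the rewritten program to out[] and undo all recorded patches.
 * Patches are applied newest first, so if an insn was patched twice
 * the oldest (original) version wins. */
static void restore_orig(const struct insn_diff *d,
			 const struct bpf_insn *rewritten,
			 struct bpf_insn *out, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		out[i] = rewritten[i];
	for (i = d->cnt; i-- > 0;)
		out[d->patches[i].idx] = d->patches[i].orig;
}
```

The memory cost is then proportional to the number of rewritten insns rather than to the whole program, which is the point of keeping a diff instead of a full copy.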

> filters working with maps, but I could not find any file under
> /sys/fs/bpf/tc.

There are some getting-started examples under examples/bpf/ in the iproute2
repo; bpf_shared.c is one of them.

> Would you have a pointer to some documentation about this filesystem? Or
> is there only the kernel code?

Yeah, there's b2197755b263 and 42984d7c1e56, and I tried to put more extensive
information into my netdev1.1 paper, but it seems the proceedings haven't been
published yet. I guess I can send you a private copy until they are officially
released.

Thanks,
Daniel
