linux-kernel - Re: eBPF / seccomp globals?

Message-ID: <CAGXu5jJ4nY36M1xLXMe99YOpE1cABWBk7UchPpzz9EyW4YUAxw@mail.gmail.com>
Date:	Fri, 4 Sep 2015 13:37:59 -0700
From:	Kees Cook <keescook@...omium.org>
To:	Michael Tirado <mtirado418@...il.com>
Cc:	Network Development <netdev@...r.kernel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@...il.com>
Subject: Re: eBPF / seccomp globals?

On Fri, Sep 4, 2015 at 1:29 PM, Michael Tirado <mtirado418@...il.com> wrote:
>> What we did in Chrome OS was to use the "minijail" tool[2] to
>> LD_PRELOAD a .so that sets up the seccomp filter after the exec. It's
>> a bit of a hack, but works in well-defined environments. You are
>> talking about namespaces, though, so maybe minijail is worth a look?
>> It does that too and a whole lot more.
>
> Minijail is pretty similar to what I have been working on the past few
> months,  unfortunately I have already written it, doh!  Those slides
> are a good resource,  definitely helpful as introduction to seccomp.
>
> So it seems there are no easy solutions to this problem. Using
> LD_PRELOAD to defer seccomp filter application scares me a little bit,
> and won't work with file capabilities IIRC, though it is a damn clever

Do you still need file capabilities with the availability of the new
ambient capabilities?

https://s3hh.wordpress.com/2015/07/25/ambient-capabilities/
http://thread.gmane.org/gmane.linux.kernel.lsm/24034
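
For reference, ambient capabilities are driven through prctl(2) with
PR_CAP_AMBIENT (merged for Linux 4.3). A capability can only be raised
into the ambient set if it is already in both the permitted and
inheritable sets; once there, it is kept across an execve of a binary
that is not setuid/setgid and has no file capabilities. A minimal
sketch follows; the helper names ambient_cap_is_set and
ambient_cap_raise are made up for illustration and are not part of any
library.

```c
#include <sys/prctl.h>
#include <linux/capability.h>

/* Hypothetical helper: returns 1 if cap is in the ambient set,
 * 0 if it is not, and -1 on error (errno is set by prctl). */
int ambient_cap_is_set(unsigned long cap)
{
    return prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_IS_SET, cap, 0, 0);
}

/* Hypothetical helper: tries to add cap to the ambient set.
 * The kernel refuses with EPERM unless cap is already present in
 * both the permitted and the inheritable sets of the caller.
 * Returns 0 on success, -1 on error. */
int ambient_cap_raise(unsigned long cap)
{
    return prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0);
}
```

A launcher would call something like ambient_cap_raise(CAP_NET_BIND_SERVICE)
just before execve, after adding the capability to its inheritable set.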

> solution.  I think for now I will explore the possibility of
> validating argument 1 of exec to allow only the program I am launching
> to be exec'd, so if somehow by Thor's hammer that program escapes its
> sandbox, it will only be able to exec itself.  I suppose it will have
> to now be restricted to absolute paths only.

Well, you can only examine the memory address and not what's pointed
to, so you may be out of luck there too. Sorry! On the TODO list is
doing deep argument inspection, but it is not an easy thing to get
right. :)
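
To make the limitation concrete: a seccomp filter is handed a struct
seccomp_data whose args[] are raw 64-bit register values, so the most
a filter can do with the first argument of execve is compare the
pointer value itself. The sketch below builds such a classic BPF
program. It is an illustration under stated assumptions, not a
recommended policy: the names build_execve_pointer_filter and
EXECVE_FILTER_LEN are made up, a little-endian target is assumed, and
the architecture check that any real filter needs is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

#define EXECVE_FILTER_LEN 8

/* Offsets into struct seccomp_data. args[0] is a raw 64-bit register
 * value; classic BPF loads 32 bits at a time, so it is read in halves.
 * Assumption: little-endian, so the low word is stored first. */
#define OFF_NR      ((uint32_t)offsetof(struct seccomp_data, nr))
#define OFF_ARG0_LO ((uint32_t)offsetof(struct seccomp_data, args))
#define OFF_ARG0_HI (OFF_ARG0_LO + 4)

/* Hypothetical helper: fill out[] with a filter that allows every
 * syscall except execve, and allows execve only when its first
 * argument equals allowed_ptr. Only the address is compared; the
 * string stored at that address is never read by the filter. */
void build_execve_pointer_filter(struct sock_filter out[EXECVE_FILTER_LEN],
                                 uint64_t allowed_ptr)
{
    const uint32_t lo = (uint32_t)(allowed_ptr & 0xffffffffu);
    const uint32_t hi = (uint32_t)(allowed_ptr >> 32);
    const struct sock_filter prog[EXECVE_FILTER_LEN] = {
        /* 0: load the syscall number */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF_NR),
        /* 1: not execve -> jump to ALLOW (index 7) */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execve, 0, 5),
        /* 2-3: low half of args[0] must match, else KILL (index 6) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF_ARG0_LO),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, lo, 0, 2),
        /* 4-5: high half must match too: ALLOW on match, else KILL */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, OFF_ARG0_HI),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, hi, 1, 0),
        /* 6: KILL, 7: ALLOW */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    memcpy(out, prog, sizeof(prog));
}
```

Even with such a filter installed, the sandboxed program can overwrite
the bytes at the permitted address with a different path before
calling execve, which is why a pointer comparison does not restrict
what gets executed.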

-Kees

>
> Thanks everyone for the clarification!
>
> On Fri, Sep 4, 2015 at 4:01 AM, Kees Cook <keescook@...omium.org> wrote:
>> On Thu, Sep 3, 2015 at 6:01 PM, Michael Tirado <mtirado418@...il.com> wrote:
>>> Hiyall,
>>>
>>> I have created a seccomp white list filter for a program that launches
>>> other less trustworthy programs.  It's working great so far, but I
>>> have run into a little roadblock.  The launcher program needs to call
>>> execve as its final step, but that may not be present in the white
>>> list.  I am wondering if there is any way to use some sort of global
>>> variable that will be preserved between syscall filter calls so that I
>>> can allow only one execve, if not present in white list by
>>> incrementing a counter variable.
>>>
>>> I see that in Documentation/networking/filter.txt one of the registers
>>> is documented as being a pointer to struct sk_buff, in the seccomp
>>> context this is a pointer to struct seccomp_data  instead, right?  and
>>> the line about callee saved registers R6-R9  probably refers to them
>>> being saved across calls within that filter, and not calls between
>>> filters?
>>>
>>> My apologies if this is not the appropriate place to ask for help, but
>>> it is difficult to find useful information on how eBPF works, and is a
>>> bit confusing trying to figure out the differences between seccomp and
>>> net filters, and the old bpf code kicking around short of spending
>>> countless hours reading through all of it.  If anybody has some
>>> links to share I would be very grateful.  The only way I can think to
>>> make this work otherwise is to mount everything as MS_NOEXEC in the
>>> new namespace, but that just feels wrong.
>>
>> For documentation, there's some great slides on seccomp from Plumber's
>> this year[1].
>>
>> At present, there is no variable state beyond the syscall context (PC,
>> args) available to seccomp filters. The no_new_privs prctl was added
>> to reduce the risk of including execve in a filter's whitelist, but
>> that isn't as strong as the "exec once" feature you want.
>>
>> What we did in Chrome OS was to use the "minijail" tool[2] to
>> LD_PRELOAD a .so that sets up the seccomp filter after the exec. It's
>> a bit of a hack, but works in well-defined environments. You are
>> talking about namespaces, though, so maybe minijail is worth a look?
>> It does that too and a whole lot more.
>>
>> As for using maps via eBPF in seccomp, it's on the horizon, but it
>> comes with a lot of exposure that I haven't finished pondering, so I
>> don't think those features will be added soon.
>>
>> -Kees
>>
>> [1] http://man7.org/conf/lpc2015/limiting_kernel_attack_surface_with_seccomp-LPC_2015-Kerrisk.pdf
>> [2] see subdirectory "minijail" after "git clone
>> https://chromium.googlesource.com/chromiumos/platform2/"
>>
>>
>> --
>> Kees Cook
>> Chrome OS Security



-- 
Kees Cook
Chrome OS Security