netdev - Re: [net-next v2 1/2] bpf, seccomp: Add eBPF filter capabilities

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 20 Feb 2018 01:00:42 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Sargun Dhillon <sargun@...gun.me>, netdev@...r.kernel.org,
        containers@...ts.linux-foundation.org
Cc:     ast@...nel.org, keescook@...omium.org, luto@...capital.net,
        wad@...omium.org, me@...sfraz.com, cpuguy83@...il.com,
        tom.hromatka@...cle.com
Subject: Re: [net-next v2 1/2] bpf, seccomp: Add eBPF filter capabilities

On 02/19/2018 05:22 PM, Sargun Dhillon wrote:
> This introduces the BPF_PROG_TYPE_SECCOMP bpf program type. It is meant
> to be used for seccomp filters as an alternative to cBPF filters. The
> program type has relatively limited capabilities in terms of helpers,
> but that can be extended later on.
> 
> The eBPF code loading is separated from attachment of the filter, so
> a privileged user can load the filter, and pass it back to an
> unprivileged user who can attach it and use it at a later time.
> 
> In order to attach the filter itself, you need to supply a flag to the
> seccomp syscall indicating that a eBPF filter is being attached, as
> opposed to a cBPF one. Verification occurs at program load time,
> so the user should only receive errors related to attachment.
> 
> Signed-off-by: Sargun Dhillon <sargun@...gun.me>
[...]
> @@ -867,7 +924,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
>  
>  	spin_lock_irq(&current->sighand->siglock);
>  
> -	if (!seccomp_may_assign_mode(seccomp_mode))
> +	if (!seccomp_may_assign_mode(filter_mode))
>  		goto out;
>  
>  	ret = seccomp_attach_filter(flags, prepared);
> @@ -876,7 +933,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
>  	/* Do not free the successfully attached filter. */
>  	prepared = NULL;
>  
> -	seccomp_assign_mode(current, seccomp_mode);
> +	seccomp_assign_mode(current, filter_mode);
>  out:
>  	spin_unlock_irq(&current->sighand->siglock);
>  	if (flags & SECCOMP_FILTER_FLAG_TSYNC)
> @@ -1040,8 +1097,7 @@ long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
>  	if (IS_ERR(filter))
>  		return PTR_ERR(filter);
>  
> -	fprog = filter->prog->orig_prog;
> -	if (!fprog) {
> +	if (!bpf_prog_was_classic(filter->prog)) {

This is actually a bug, see f8e529ed941b ("seccomp, ptrace: add support for
dumping seccomp filters") and would cause a NULL ptr deref in case the filter
was created with bpf_prog_create_from_user() with save_orig as false, so the
if (!fprog) test for cBPF cannot be removed from here.

>  		/* This must be a new non-cBPF filter, since we save
>  		 * every cBPF filter's orig_prog above when
>  		 * CONFIG_CHECKPOINT_RESTORE is enabled.
> @@ -1050,6 +1106,7 @@ long seccomp_get_filter(struct task_struct *task, unsigned long filter_off,
>  		goto out;
>  	}
>  
> +	fprog = filter->prog->orig_prog;
>  	ret = fprog->len;

(See above.)

>  	if (!data)
>  		goto out;
> @@ -1239,6 +1296,58 @@ static int seccomp_actions_logged_handler(struct ctl_table *ro_table, int write,
>  	return 0;
>  }
>  
> +#ifdef CONFIG_SECCOMP_FILTER_EXTENDED
> +static bool seccomp_is_valid_access(int off, int size,
> +				    enum bpf_access_type type,
> +				    struct bpf_insn_access_aux *info)
> +{
> +	if (type != BPF_READ)
> +		return false;
> +
> +	if (off < 0 || off + size > sizeof(struct seccomp_data))
> +		return false;

        if (off % size != 0)
                return false;

> +	switch (off) {
> +	case bpf_ctx_range_till(struct seccomp_data, args[0], args[5]):
> +		return (size == sizeof(__u64));
> +	case bpf_ctx_range(struct seccomp_data, nr):
> +		return (size == FIELD_SIZEOF(struct seccomp_data, nr));
> +	case bpf_ctx_range(struct seccomp_data, arch):
> +		return (size == FIELD_SIZEOF(struct seccomp_data, arch));
> +	case bpf_ctx_range(struct seccomp_data, instruction_pointer):
> +		return (size == FIELD_SIZEOF(struct seccomp_data,
> +					     instruction_pointer));

        default:
                return false;

[...]
> +static const struct bpf_func_proto *
> +seccomp_func_proto(enum bpf_func_id func_id)
> +{
> +	switch (func_id) {
> +	case BPF_FUNC_get_current_uid_gid:
> +		return &bpf_get_current_uid_gid_proto;
> +	case BPF_FUNC_ktime_get_ns:
> +		return &bpf_ktime_get_ns_proto;
> +	case BPF_FUNC_get_prandom_u32:
> +		return &bpf_get_prandom_u32_proto;
> +	case BPF_FUNC_get_current_pid_tgid:
> +		return &bpf_get_current_pid_tgid_proto;

Do you have a use-case description for the above helpers? Is the prandom/ktime
one for simulating errors coming from the syscall? And the other two for
orchestration purposes?

One use case this work could enable would be to implement state machines in BPF
for BPF-seccomp and enabling a more fine-grained / tiny subset of syscalls based
on the state the prog is in while the rest is all blocked out - as opposed to a
global white/black-list of syscalls the app can do in general. Getting to such
an app model would probably be rather challenging at least for complex apps. We'd
need some sort of scratch buffer for keeping the state for this though, e.g. either
map with single slot or per thread scratch space. Anyway, just a thought.

> +	default:
> +		return NULL;
> +	}
> +}
> +
> +const struct bpf_prog_ops seccomp_prog_ops = {
> +};
> +
> +const struct bpf_verifier_ops seccomp_verifier_ops = {
> +	.get_func_proto		= seccomp_func_proto,
> +	.is_valid_access	= seccomp_is_valid_access,
> +};
> +#endif /* CONFIG_SECCOMP_FILTER_EXTENDED */
> +
>  static struct ctl_path seccomp_sysctl_path[] = {
>  	{ .procname = "kernel", },
>  	{ .procname = "seccomp", },
>