netdev - Re: [RFC PATCH net] bpf: introduce BPF_F_ALLOW

Date:   Fri, 10 Feb 2017 13:38:45 -0800
From:   Andy Lutomirski <luto@...capital.net>
To:     Alexei Starovoitov <ast@...com>
Cc:     "David S . Miller" <davem@...emloft.net>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Ahern <dsa@...ulusnetworks.com>,
        Daniel Mack <daniel@...que.org>, Tejun Heo <tj@...nel.org>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: [RFC PATCH net] bpf: introduce BPF_F_ALLOW_OVERRIDE flag

On Thu, Feb 9, 2017 at 10:59 AM, Alexei Starovoitov <ast@...com> wrote:
> If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
> to the given cgroup the descendent cgroup will be able to override
> effective bpf program that was inherited from this cgroup.
> By default it's not passed, therefore override is disallowed.
>
> Examples:
> 1.
> prog X attached to /A with default
> prog Y fails to attach to /A/B and /A/B/C
> Everything under /A runs prog X
>
> 2.
> prog X attached to /A with ALLOW_OVERRIDE
> prog Y attached to /A/B with default. Everything under /A/B runs prog Y

I think that, for ease of future extension, Y should also need
ALLOW_OVERRIDE.  Otherwise, when non-overridable hooks can stack,
there could be confusion as to whether Y should override something or
should stack.

> prog M attached to /A/C with default. Everything under /A/C runs prog M
> prog N fails to attach to /A/C/foo.
> prog L attached to /A/D with ALLOW_OVERRIDE.
>   Events under /A/D run prog L and can be overridden in /A/D/foo
>
> /A still runs prog X
> prog K attached to /A with ALLOW_OVERRIDE.
>   /A now runs prog K while /A/B runs prog Y and /A/C runs prog M
> prog J attached to /A with default.
>   /A now runs prog J while /A/B runs prog Y.
>   /A/B cannot be changed anymore (since parent disallows override),
>   but can be cleared. After detach /A/B will run prog J.
>
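
The attach rule in the examples above can be replayed with a small userspace model. This is only a sketch of my reading of the patch, not kernel code: `struct toy_cgroup`, `toy_owner`, `toy_effective` and `toy_attach` are hypothetical names, a program is reduced to a single character, and only one attach type is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a cgroup (assumption: one attach type only). */
struct toy_cgroup {
	struct toy_cgroup *parent;
	char prog;            /* program attached here, 0 if none */
	bool allow_override;  /* flag given when prog was attached */
};

/* Nearest cgroup, starting at cg and walking up, that has a program. */
static const struct toy_cgroup *toy_owner(const struct toy_cgroup *cg)
{
	while (cg && !cg->prog)
		cg = cg->parent;
	return cg;
}

/* Program that events in cg would run, 0 if none is inherited. */
static char toy_effective(const struct toy_cgroup *cg)
{
	const struct toy_cgroup *owner = toy_owner(cg);

	return owner ? owner->prog : 0;
}

/*
 * Attach (prog != 0) or detach (prog == 0).  Attaching fails with
 * -EPERM when the inherited program was attached without
 * allow_override; detaching is always permitted.
 */
static int toy_attach(struct toy_cgroup *cg, char prog, bool allow_override)
{
	const struct toy_cgroup *owner;

	assert(cg != NULL);
	owner = toy_owner(cg->parent);
	if (prog && owner && !owner->allow_override)
		return -EPERM;

	cg->prog = prog;
	cg->allow_override = prog ? allow_override : false;
	return 0;
}
```

With /A, /A/B and /A/B/C chained through `parent`, attaching X to /A with default makes every later attach below /A return -EPERM, while re-attaching X with `allow_override = true` lets Y attach to /A/B. The stricter variant suggested earlier (Y also needing ALLOW_OVERRIDE) would, as I understand it, add a second check on the `allow_override` argument of the child's own attach.
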
> Signed-off-by: Alexei Starovoitov <ast@...nel.org>
> ---
>
> Below are a few proposals for future extensions; none of them is definitive:
> 1.
> we can extend the behavior with a chain of non-overridable like:
> prog X attached to /A with default
> prog Y attached to /A/B with default
> The events scoped by /A/B will run program Y first and if it returns 1
> the prog X will be run. For control app there will be an illusion
> that it owns cgroup /A/B with single prog and detach from /A/B will delete
> prog Y unambiguously.
> While another control app that attached to /A also sees its prog X running,
> unless prog Y filtered it out, which means (from X's point of view)
> that the event didn't happen.
> Attaching two programs to /A is not allowed.
> We would need to combine prog X and Y into an array to avoid linked-list
> traversal for performance reasons, but that's an implementation detail.
>
> 2.
> we can add another flag to reverse this call order too.
> Instead of calling the progs from child to parent, do parent to child.

I think the order should depend on the hook.  Hooks for
process-initiated actions (egress, socket creation) should run
innermost first and hooks for outside actions (ingress) should be
outermost first.
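
A hook-dependent order could look roughly like the sketch below. Everything in it is illustrative: `enum toy_hook` and `toy_run_order` are hypothetical names, programs are single characters, and the chain is assumed to be passed innermost cgroup first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook kinds, named after the cases discussed above. */
enum toy_hook { TOY_INGRESS, TOY_EGRESS, TOY_SOCK_CREATE };

/*
 * progs[0] belongs to the innermost cgroup and progs[n - 1] to the
 * outermost one.  Fills out[] with the order in which the programs
 * would run and returns the number of entries written.
 */
static size_t toy_run_order(enum toy_hook hook, const char *progs,
			    size_t n, char *out)
{
	/* Outside-initiated events (ingress) hit the outer hooks first. */
	bool outermost_first = (hook == TOY_INGRESS);
	size_t i;

	assert(progs != NULL && out != NULL);
	for (i = 0; i < n; i++)
		out[i] = outermost_first ? progs[n - 1 - i] : progs[i];
	return n;
}
```

For Y on /A/B and X on /A, egress and socket creation would run Y then X, while ingress would run X then Y.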

>
> 3.
> we can extend the api further by adding 'attach_priority' flag as:
> prog X attach /A prio=20
> prog Y attach /A prio=10
> prog N attach /A/B prio=20
> prog M attach /A/B prio=10
> in /A/B the sequence of progs will be M -> N -> Y -> X

I haven't thought of a use for this.  Maybe there is one.
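
For reference, the M -> N -> Y -> X sequence in the quoted example is what falls out of sorting by cgroup depth (deepest first) and then by ascending prio. That rule is inferred from the single example and may not be what is intended; `struct toy_link`, `toy_link_cmp` and `toy_sort_links` below are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* One attachment in the toy model. */
struct toy_link {
	char prog;
	int depth;  /* cgroup depth: /A is 1, /A/B is 2 */
	int prio;
};

/* Deeper cgroup first; within one cgroup, lower prio first. */
static int toy_link_cmp(const void *pa, const void *pb)
{
	const struct toy_link *a = pa;
	const struct toy_link *b = pb;

	if (a->depth != b->depth)
		return (a->depth < b->depth) - (a->depth > b->depth);
	return (a->prio > b->prio) - (a->prio < b->prio);
}

/*
 * Sort attachments into run order.  Equal (depth, prio) pairs are
 * assumed not to occur, since the proposal disallows the same prio
 * twice on one cgroup, so qsort() not being stable does not matter.
 */
static void toy_sort_links(struct toy_link *links, size_t n)
{
	assert(links != NULL || n == 0);
	qsort(links, n, sizeof(*links), toy_link_cmp);
}
```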

>
> prog X attach /A prio=10 and prog Y attach /A prio=10 will be disallowed,
> but attach with the same prio to different cgroups is ok.
> If attached with prio, detach must specify prio as well.
> Attach transitions:
> allow_override -> disable_override/single_prog = ok
> allow_override -> prio (multi prog at the same cgroup) = ok
> disable_override/single_prog -> prio = ok (with respect to child/parent order)
> prio -> allow_override = fail
> prio -> disable_override/single_prog = fail
>
> ***
> To summarize, the key to not breaking abi is to preserve user space
> expectations. Right now (without this patch) we have progs
> overridable by any descendent. Which means that control plane
> application has to expect that something may overwrite the program.
> Hence any new flag will not break this expectation
> (overridable == control plane cannot assume that its attached
> programs will run in the hostile environment)
> and that's the main reason why I don't think we need to change anything now
> and hence this patch is an RFC.
>
> Adding 'allow_override' flag and changing the default to
> override disallowed is also fine from api extensibility point of view.
> Since for 'override disallowed' case the control plane app will
> be expecting that any processes will not override its program
> in the descendent cgroups and it will run. This would have to be preserved.
> That's why the future api extensions (like #1 above) would have to do
> the program chaining to preserve 'disallow override' flag expectations.
> So imo it's safer to keep overridable as it is today, since this flag
> adds a few more restrictions to the future extensions
> compared to everything overridable.
>
> Andy,
> does it all make sense?

Yes, with the caveat above.

> Do you still insist on submitting this patch officially?

I'm not sure what you mean.

> or you're ok keeping it overridable for now.

I really think the default should change for 4.10.  People are going
to use this feature for sandboxing or in systemd or whatever, and that
code should keep working in newer kernels even when run in a container
that has a bpf hook set up outside it.
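
To make the compatibility concern concrete, a sandboxing tool would issue roughly the call below. This is a minimal sketch that assumes the `attach_flags` field and `BPF_F_ALLOW_OVERRIDE` from the quoted patch are present in <linux/bpf.h>; `cgroup_attach` is a hypothetical helper name, and the caller is assumed to already hold a cgroup fd and a loaded program fd.

```c
#include <errno.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/bpf.h>

/*
 * Attach prog_fd to the cgroup open at cgroup_fd.
 * Returns 0 on success or a negative errno value.
 */
static int cgroup_attach(int cgroup_fd, int prog_fd,
			 enum bpf_attach_type type, unsigned int flags)
{
	union bpf_attr attr;

	/* Unused fields must be zero for the kernel's attr check. */
	memset(&attr, 0, sizeof(attr));
	attr.target_fd = cgroup_fd;
	attr.attach_bpf_fd = prog_fd;
	attr.attach_type = type;
	attr.attach_flags = flags;  /* 0 or BPF_F_ALLOW_OVERRIDE */

	if (syscall(__NR_bpf, BPF_PROG_ATTACH, &attr, sizeof(attr)) < 0)
		return -errno;
	return 0;
}
```

Under the quoted patch, a caller inside a container whose outer manager attached without ALLOW_OVERRIDE would see -EPERM here. The same error is returned when CAP_NET_ADMIN is missing, so the caller cannot tell the two cases apart from the return value alone.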

> Note that in the future it will not be possible to change the default,
> but 'disallow_override' flag can be added at any time:
> Change the default in this patch and it can be applied for 4.12 or later.
> ---
>  include/linux/bpf-cgroup.h | 13 ++++++-------
>  include/uapi/linux/bpf.h   |  7 +++++++
>  kernel/bpf/cgroup.c        | 25 +++++++++++++++++++------
>  kernel/bpf/syscall.c       | 20 ++++++++++++++------
>  kernel/cgroup.c            |  9 +++++----
>  5 files changed, 51 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 92bc89ae7e20..c970a25d2a49 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -21,20 +21,19 @@ struct cgroup_bpf {
>          */
>         struct bpf_prog *prog[MAX_BPF_ATTACH_TYPE];
>         struct bpf_prog __rcu *effective[MAX_BPF_ATTACH_TYPE];
> +       bool disallow_override[MAX_BPF_ATTACH_TYPE];
>  };
>
>  void cgroup_bpf_put(struct cgroup *cgrp);
>  void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent);
>
> -void __cgroup_bpf_update(struct cgroup *cgrp,
> -                        struct cgroup *parent,
> -                        struct bpf_prog *prog,
> -                        enum bpf_attach_type type);
> +int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
> +                       struct bpf_prog *prog, enum bpf_attach_type type,
> +                       bool overridable);
>
>  /* Wrapper for __cgroup_bpf_update() protected by cgroup_mutex */
> -void cgroup_bpf_update(struct cgroup *cgrp,
> -                      struct bpf_prog *prog,
> -                      enum bpf_attach_type type);
> +int cgroup_bpf_update(struct cgroup *cgrp, struct bpf_prog *prog,
> +                     enum bpf_attach_type type, bool overridable);
>
>  int __cgroup_bpf_run_filter_skb(struct sock *sk,
>                                 struct sk_buff *skb,
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index e5b8cf16cbaf..69f65b710b10 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -116,6 +116,12 @@ enum bpf_attach_type {
>
>  #define MAX_BPF_ATTACH_TYPE __MAX_BPF_ATTACH_TYPE
>
> +/* If BPF_F_ALLOW_OVERRIDE flag is used in BPF_PROG_ATTACH command
> + * to the given target_fd cgroup the descendent cgroup will be able to
> + * override effective bpf program that was inherited from this cgroup
> + */
> +#define BPF_F_ALLOW_OVERRIDE   (1U << 0)
> +
>  #define BPF_PSEUDO_MAP_FD      1
>
>  /* flags for BPF_MAP_UPDATE_ELEM command */
> @@ -171,6 +177,7 @@ union bpf_attr {
>                 __u32           target_fd;      /* container object to attach to */
>                 __u32           attach_bpf_fd;  /* eBPF program to attach */
>                 __u32           attach_type;
> +               __u32           attach_flags;
>         };
>  } __attribute__((aligned(8)));
>
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index a515f7b007c6..27cf8a3bc191 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -52,6 +52,7 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
>                 e = rcu_dereference_protected(parent->bpf.effective[type],
>                                               lockdep_is_held(&cgroup_mutex));
>                 rcu_assign_pointer(cgrp->bpf.effective[type], e);
> +               cgrp->bpf.disallow_override[type] = parent->bpf.disallow_override[type];
>         }
>  }
>
> @@ -82,13 +83,22 @@ void cgroup_bpf_inherit(struct cgroup *cgrp, struct cgroup *parent)
>   *
>   * Must be called with cgroup_mutex held.
>   */
> -void __cgroup_bpf_update(struct cgroup *cgrp,
> -                        struct cgroup *parent,
> -                        struct bpf_prog *prog,
> -                        enum bpf_attach_type type)
> +int __cgroup_bpf_update(struct cgroup *cgrp, struct cgroup *parent,
> +                       struct bpf_prog *prog, enum bpf_attach_type type,
> +                       bool new_overridable)
>  {
>         struct bpf_prog *old_prog, *effective;
>         struct cgroup_subsys_state *pos;
> +       bool overridable = true;
> +
> +       if (parent)
> +               overridable = !parent->bpf.disallow_override[type];
> +
> +       if (!overridable && prog)
> +               return -EPERM;
> +
> +       if (prog)
> +               overridable = new_overridable;
>
>         old_prog = xchg(cgrp->bpf.prog + type, prog);
>
> @@ -101,11 +111,13 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
>                 struct cgroup *desc = container_of(pos, struct cgroup, self);
>
>                 /* skip the subtree if the descendant has its own program */
> -               if (desc->bpf.prog[type] && desc != cgrp)
> +               if (desc->bpf.prog[type] && desc != cgrp) {
>                         pos = css_rightmost_descendant(pos);
> -               else
> +               } else {
>                         rcu_assign_pointer(desc->bpf.effective[type],
>                                            effective);
> +                       desc->bpf.disallow_override[type] = !overridable;
> +               }
>         }
>
>         if (prog)
> @@ -115,6 +127,7 @@ void __cgroup_bpf_update(struct cgroup *cgrp,
>                 bpf_prog_put(old_prog);
>                 static_branch_dec(&cgroup_bpf_enabled_key);
>         }
> +       return 0;
>  }
>
>  /**
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 19b6129eab23..bbb016adbaeb 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -920,13 +920,14 @@ static int bpf_obj_get(const union bpf_attr *attr)
>
>  #ifdef CONFIG_CGROUP_BPF
>
> -#define BPF_PROG_ATTACH_LAST_FIELD attach_type
> +#define BPF_PROG_ATTACH_LAST_FIELD attach_flags
>
>  static int bpf_prog_attach(const union bpf_attr *attr)
>  {
> +       enum bpf_prog_type ptype;
>         struct bpf_prog *prog;
>         struct cgroup *cgrp;
> -       enum bpf_prog_type ptype;
> +       int ret;
>
>         if (!capable(CAP_NET_ADMIN))
>                 return -EPERM;
> @@ -934,6 +935,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
>         if (CHECK_ATTR(BPF_PROG_ATTACH))
>                 return -EINVAL;
>
> +       if (attr->attach_flags & ~BPF_F_ALLOW_OVERRIDE)
> +               return -EINVAL;
> +
>         switch (attr->attach_type) {
>         case BPF_CGROUP_INET_INGRESS:
>         case BPF_CGROUP_INET_EGRESS:
> @@ -956,10 +960,13 @@ static int bpf_prog_attach(const union bpf_attr *attr)
>                 return PTR_ERR(cgrp);
>         }
>
> -       cgroup_bpf_update(cgrp, prog, attr->attach_type);
> +       ret = cgroup_bpf_update(cgrp, prog, attr->attach_type,
> +                               attr->attach_flags & BPF_F_ALLOW_OVERRIDE);
> +       if (ret)
> +               bpf_prog_put(prog);
>         cgroup_put(cgrp);
>
> -       return 0;
> +       return ret;
>  }
>
>  #define BPF_PROG_DETACH_LAST_FIELD attach_type
> @@ -967,6 +974,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
>  static int bpf_prog_detach(const union bpf_attr *attr)
>  {
>         struct cgroup *cgrp;
> +       int ret;
>
>         if (!capable(CAP_NET_ADMIN))
>                 return -EPERM;
> @@ -982,7 +990,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
>                 if (IS_ERR(cgrp))
>                         return PTR_ERR(cgrp);
>
> -               cgroup_bpf_update(cgrp, NULL, attr->attach_type);
> +               ret = cgroup_bpf_update(cgrp, NULL, attr->attach_type, false);
>                 cgroup_put(cgrp);
>                 break;
>
> @@ -990,7 +998,7 @@ static int bpf_prog_detach(const union bpf_attr *attr)
>                 return -EINVAL;
>         }
>
> -       return 0;
> +       return ret;
>  }
>  #endif /* CONFIG_CGROUP_BPF */
>
> diff --git a/kernel/cgroup.c b/kernel/cgroup.c
> index 688dd02af985..53bbca7c4859 100644
> --- a/kernel/cgroup.c
> +++ b/kernel/cgroup.c
> @@ -6498,15 +6498,16 @@ static __init int cgroup_namespaces_init(void)
>  subsys_initcall(cgroup_namespaces_init);
>
>  #ifdef CONFIG_CGROUP_BPF
> -void cgroup_bpf_update(struct cgroup *cgrp,
> -                      struct bpf_prog *prog,
> -                      enum bpf_attach_type type)
> +int cgroup_bpf_update(struct cgroup *cgrp, struct bpf_prog *prog,
> +                     enum bpf_attach_type type, bool overridable)
>  {
>         struct cgroup *parent = cgroup_parent(cgrp);
> +       int ret;
>
>         mutex_lock(&cgroup_mutex);
> -       __cgroup_bpf_update(cgrp, parent, prog, type);
> +       ret = __cgroup_bpf_update(cgrp, parent, prog, type, overridable);
>         mutex_unlock(&cgroup_mutex);
> +       return ret;
>  }
>  #endif /* CONFIG_CGROUP_BPF */
>
> --
> 2.8.0
>



-- 
Andy Lutomirski
AMA Capital Management, LLC