Message-ID: <20160817182027.GD98226@ast-mbp.thefacebook.com>
Date: Wed, 17 Aug 2016 11:20:29 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Daniel Mack <daniel@...que.org>
Cc: htejun@...com, daniel@...earbox.net, ast@...com,
davem@...emloft.net, kafai@...com, fw@...len.de,
pablo@...filter.org, harald@...hat.com, netdev@...r.kernel.org
Subject: Re: [RFC PATCH 4/5] net: filter: run cgroup eBPF programs
On Wed, Aug 17, 2016 at 04:00:47PM +0200, Daniel Mack wrote:
> If CONFIG_CGROUP_BPF is enabled, and the cgroup associated with the
> receiving socket has eBPF programs installed, run them from
> sk_filter_trim_cap().
>
> eBPF programs used in this context are expected to either return 1 to
> let the packet pass, or != 1 to drop them. The programs have access to
> the full skb, including the MAC headers.
>
> This patch only implements the call site for ingress packets.
>
> Signed-off-by: Daniel Mack <daniel@...que.org>
> ---
> net/core/filter.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 44 insertions(+)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c5d8332..a1dd94b 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -52,6 +52,44 @@
> #include <net/dst.h>
> #include <net/sock_reuseport.h>
>
> +#ifdef CONFIG_CGROUP_BPF
> +static int sk_filter_cgroup_bpf(struct sock *sk, struct sk_buff *skb,
> + enum bpf_attach_type type)
> +{
> + struct sock_cgroup_data *skcd = &sk->sk_cgrp_data;
> + struct cgroup *cgrp = sock_cgroup_ptr(skcd);
> + struct bpf_prog *prog;
> + int ret = 0;
> +
> + rcu_read_lock();
> +
> + switch (type) {
> + case BPF_ATTACH_TYPE_CGROUP_EGRESS:
> + prog = rcu_dereference(cgrp->bpf_egress);
> + break;
> + case BPF_ATTACH_TYPE_CGROUP_INGRESS:
> + prog = rcu_dereference(cgrp->bpf_ingress);
> + break;
> + default:
> + WARN_ON_ONCE(1);
> + ret = -EINVAL;
> + break;
> + }
> +
> + if (prog) {
I really like how, in this version of the patches, the per-packet cost
became a single load+cmp when this feature is off.
Please move
+ struct cgroup *cgrp = sock_cgroup_ptr(skcd);
into if (prog) {..}
to make sure it's actually a single load.
The compiler cannot avoid that load when it's placed at the top.
> + unsigned int offset = skb->data - skb_mac_header(skb);
> +
> + __skb_push(skb, offset);
> + ret = bpf_prog_run_clear_cb(prog, skb) > 0 ? 0 : -EPERM;
That doesn't match the commit log. The '> 0' check above makes sense to me, though.
If we want to let packets pass only on 1, we have to define it in uapi/bpf.h
as an action code, so that we can extend it to 2, 3 in the future if necessary.
It also has to be bpf_prog_run_save_cb() (as sk_filter_trim_cap() does)
instead of bpf_prog_run_clear_cb().
See commit ff936a04e5f2 ("bpf: fix cb access in socket filter programs")
> + __skb_pull(skb, offset);
> + }
> +
> + rcu_read_unlock();
> +
> + return ret;
> +}
> +#endif /* !CONFIG_CGROUP_BPF */
> +
> /**
> * sk_filter_trim_cap - run a packet through a socket filter
> * @sk: sock associated with &sk_buff
> @@ -78,6 +116,12 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
> if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
> return -ENOMEM;
>
> +#ifdef CONFIG_CGROUP_BPF
> + err = sk_filter_cgroup_bpf(sk, skb, BPF_ATTACH_TYPE_CGROUP_INGRESS);
> + if (err)
> + return err;
> +#endif
> +
> err = security_sock_rcv_skb(sk, skb);
> if (err)
> return err;
> --
> 2.5.5
>