Message-ID: <20190608070838.4vhwss4anyibju53@kafai-mbp.dhcp.thefacebook.com>
Date: Sat, 8 Jun 2019 07:08:41 +0000
From: Martin Lau <kafai@...com>
To: Stanislav Fomichev <sdf@...gle.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"ast@...nel.org" <ast@...nel.org>,
"daniel@...earbox.net" <daniel@...earbox.net>,
Andrii Nakryiko <andriin@...com>
Subject: Re: [PATCH bpf-next v3 1/8] bpf: implement getsockopt and setsockopt
hooks
On Fri, Jun 07, 2019 at 09:29:13AM -0700, Stanislav Fomichev wrote:
> Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and
> BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks.
>
> BPF_CGROUP_SETSOCKOPT gets a read-only view of the setsockopt arguments.
> BPF_CGROUP_GETSOCKOPT can modify the supplied buffer.
> Both of them reuse existing PTR_TO_PACKET{,_END} infrastructure.
>
> The buffer memory is pre-allocated (because I don't think there is
> a precedent for working with __user memory from bpf). This might be
> slow to do for each {s,g}etsockopt call, which is why I've added
> __cgroup_bpf_prog_array_is_empty that exits early if there is nothing
> attached to a cgroup. Note, however, that there is a race between
> __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup
> program layout might have changed; this should not be a problem
> because in general there is a race between multiple calls to
> {s,g}etsockopt and a user adding/removing bpf progs from a cgroup.
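
For context, a rough sketch of what such an early-exit helper could look
like (the helper and bpf_prog_array_is_empty are named in the patch; the
body below is an assumption, not copied from it):

static bool __cgroup_bpf_prog_array_is_empty(struct cgroup *cgrp,
					     enum bpf_attach_type type)
{
	struct bpf_prog_array *array;
	bool empty;

	/* the effective array is RCU-protected, so only peek at it
	 * under rcu_read_lock() */
	rcu_read_lock();
	array = rcu_dereference(cgrp->bpf.effective[type]);
	empty = bpf_prog_array_is_empty(array);
	rcu_read_unlock();

	return empty;
}
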
>
> The return code of the BPF program is handled as follows (sketch below):
> * 0: EPERM
> * 1: success, execute kernel {s,g}etsockopt path after BPF prog exits
> * 2: success, do _not_ execute kernel {s,g}etsockopt path after BPF
> prog exits
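
To make the convention concrete, a minimal sketch of a getsockopt hook
using it could look like this (the section name, includes and SOL_CUSTOM
level are assumptions for illustration, not taken from this patch):

#include <linux/bpf.h>
#include "bpf_helpers.h"

#define SOL_CUSTOM	0xdeadbeef	/* made-up level, illustration only */

SEC("cgroup/getsockopt")
int example_getsockopt(struct bpf_sockopt *ctx)
{
	__u8 *optval = ctx->optval;
	__u8 *optval_end = ctx->optval_end;

	if (ctx->level != SOL_CUSTOM)
		return 1;	/* not ours: run the kernel getsockopt path */

	if (optval + 1 > optval_end)
		return 0;	/* bounds check failed: reject with EPERM */

	optval[0] = 0x55;	/* answer from BPF ... */
	ctx->optlen = 1;
	return 2;		/* ... and skip the kernel getsockopt path */
}
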
>
> v3:
> * typos in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY comments (Andrii Nakryiko)
> * reverse christmas tree in BPF_PROG_CGROUP_SOCKOPT_RUN_ARRAY (Andrii
> Nakryiko)
> * use __bpf_md_ptr instead of __u32 for optval{,_end} (Martin Lau)
> * use BPF_FIELD_SIZEOF() for consistency (Martin Lau)
> * new CG_SOCKOPT_ACCESS macro to wrap repeated parts
>
> v2:
> * moved bpf_sockopt_kern fields around to remove a hole (Martin Lau)
> * aligned bpf_sockopt_kern->buf to 8 bytes (Martin Lau)
> * bpf_prog_array_is_empty instead of bpf_prog_array_length (Martin Lau)
> * added [0,2] return code check to verifier (Martin Lau)
> * dropped unused buf[64] from the stack (Martin Lau)
> * use PTR_TO_SOCKET for bpf_sockopt->sk (Martin Lau)
> * dropped bpf_target_off from ctx rewrites (Martin Lau)
> * use return code for kernel bypass (Martin Lau & Andrii Nakryiko)
>
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 1b65ab0df457..4fc8429af6fc 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
[ ... ]
> +static const struct bpf_func_proto *
> +cg_sockopt_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> + switch (func_id) {
> + case BPF_FUNC_sk_fullsock:
> + return &bpf_sk_fullsock_proto;
Maybe my v2 comment has been missed.
sk here (i.e. PTR_TO_SOCKET) must be a fullsock.
bpf_sk_fullsock() will be a no-op. Hence, there is
no need to expose bpf_sk_fullsock_proto.
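In other words, a program of this type should be able to use ctx->sk
directly, along these lines (a sketch; the section name is an assumption,
not from this patch):

SEC("cgroup/setsockopt")
int example_setsockopt(struct bpf_sockopt *ctx)
{
	struct bpf_sock *sk = ctx->sk;

	/* sk is already a fullsock at this attach point, so no
	 * bpf_sk_fullsock(sk) call is needed before dereferencing it */
	if (sk->family != 2 /* AF_INET */)
		return 1;	/* not IPv4: run the kernel setsockopt path */

	return 1;
}
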
> + case BPF_FUNC_sk_storage_get:
> + return &bpf_sk_storage_get_proto;
> + case BPF_FUNC_sk_storage_delete:
> + return &bpf_sk_storage_delete_proto;
> +#ifdef CONFIG_INET
> + case BPF_FUNC_tcp_sock:
> + return &bpf_tcp_sock_proto;
> +#endif
> + default:
> + return cgroup_base_func_proto(func_id, prog);
> + }
> +}
> +
> +static bool cg_sockopt_is_valid_access(int off, int size,
> + enum bpf_access_type type,
> + const struct bpf_prog *prog,
> + struct bpf_insn_access_aux *info)
> +{
> + const int size_default = sizeof(__u32);
> +
> + if (off < 0 || off >= sizeof(struct bpf_sockopt))
> + return false;
> +
> + if (off % size != 0)
> + return false;
> +
> + if (type == BPF_WRITE) {
> + switch (off) {
> + case offsetof(struct bpf_sockopt, optlen):
> + if (size != size_default)
> + return false;
> + return prog->expected_attach_type ==
> + BPF_CGROUP_GETSOCKOPT;
> + default:
> + return false;
> + }
> + }
> +
> + switch (off) {
> + case offsetof(struct bpf_sockopt, sk):
> + if (size != sizeof(struct bpf_sock *))
Based on my understanding of commit b7df9ada9a77 ("bpf: fix pointer offsets in context for 32 bit"),
I think it should be 'size != sizeof(__u64)'
Same for the optval and optval_end below.
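With __bpf_md_ptr (used since v3, per the changelog above) each of these
fields is padded to 8 bytes even on 32-bit, which is why the check should
be against sizeof(__u64). Roughly (the bpf_sockopt layout below is a
sketch and may not match the patch field-for-field):

/* include/uapi/linux/bpf.h */
#define __bpf_md_ptr(type, name)	\
union {					\
	type name;			\
	__u64 :64;			\
} __attribute__((aligned(8)))

struct bpf_sockopt {
	__bpf_md_ptr(struct bpf_sock *, sk);
	__bpf_md_ptr(void *, optval);
	__bpf_md_ptr(void *, optval_end);

	__s32	level;
	__s32	optname;
	__s32	optlen;
};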
> + return false;
> + info->reg_type = PTR_TO_SOCKET;
> + break;
> + case bpf_ctx_range(struct bpf_sockopt, optval):
offsetof(struct bpf_sockopt, optval)
> + if (size != sizeof(void *))
> + return false;
> + info->reg_type = PTR_TO_PACKET;
> + break;
> + case bpf_ctx_range(struct bpf_sockopt, optval_end):
offsetof(struct bpf_sockopt, optval_end)
> + if (size != sizeof(void *))
> + return false;
> + info->reg_type = PTR_TO_PACKET_END;
> + break;
> + default:
> + if (size != size_default)
> + return false;
> + break;
> + }
> + return true;
> +}
> +
[ ... ]
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 55bfc941d17a..4652c0a005ca 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -1835,7 +1835,7 @@ BPF_CALL_1(bpf_sk_fullsock, struct sock *, sk)
> return sk_fullsock(sk) ? (unsigned long)sk : (unsigned long)NULL;
> }
>
> -static const struct bpf_func_proto bpf_sk_fullsock_proto = {
> +const struct bpf_func_proto bpf_sk_fullsock_proto = {
As mentioned above, this change is also not needed.
Others LGTM.
> .func = bpf_sk_fullsock,
> .gpl_only = false,
> .ret_type = RET_PTR_TO_SOCKET_OR_NULL,
> @@ -5636,7 +5636,7 @@ BPF_CALL_1(bpf_tcp_sock, struct sock *, sk)
> return (unsigned long)NULL;
> }
>