Message-ID: <20210109013739.vbqm4gllpo7g5xro@kafai-mbp.dhcp.thefacebook.com>
Date: Fri, 8 Jan 2021 17:37:39 -0800
From: Martin KaFai Lau <kafai@...com>
To: Stanislav Fomichev <sdf@...gle.com>
CC: <netdev@...r.kernel.org>, <bpf@...r.kernel.org>, <ast@...nel.org>,
<daniel@...earbox.net>, Song Liu <songliubraving@...com>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH bpf-next v6 1/3] bpf: remove extra lock_sock for
TCP_ZEROCOPY_RECEIVE
On Fri, Jan 08, 2021 at 01:02:21PM -0800, Stanislav Fomichev wrote:
> Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE.
> We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom
> call in do_tcp_getsockopt using the on-stack data. This removes
> 3% overhead for locking/unlocking the socket.
>
> Without this patch:
> 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt
> |
> --3.30%--__cgroup_bpf_run_filter_getsockopt
> |
> --0.81%--__kmalloc
>
> With the patch applied:
> 0.52% 0.12% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt_kern
>
> Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
> Cc: Martin KaFai Lau <kafai@...com>
> Cc: Song Liu <songliubraving@...com>
> Cc: Eric Dumazet <edumazet@...gle.com>
> ---
> include/linux/bpf-cgroup.h | 27 +++++++++++--
> include/linux/indirect_call_wrapper.h | 6 +++
> include/net/sock.h | 2 +
> include/net/tcp.h | 1 +
> kernel/bpf/cgroup.c | 38 +++++++++++++++++++
> net/ipv4/tcp.c | 14 +++++++
> net/ipv4/tcp_ipv4.c | 1 +
> net/ipv6/tcp_ipv6.c | 1 +
> net/socket.c | 3 ++
> .../selftests/bpf/prog_tests/sockopt_sk.c | 22 +++++++++++
> .../testing/selftests/bpf/progs/sockopt_sk.c | 15 ++++++++
> 11 files changed, 126 insertions(+), 4 deletions(-)
>
[ ... ]
> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> index 6ec088a96302..c41bb2f34013 100644
> --- a/kernel/bpf/cgroup.c
> +++ b/kernel/bpf/cgroup.c
> @@ -1485,6 +1485,44 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
> sockopt_free_buf(&ctx);
> return ret;
> }
> +
> +int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
> + int optname, void *optval,
> + int *optlen, int retval)
> +{
> + struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> + struct bpf_sockopt_kern ctx = {
> + .sk = sk,
> + .level = level,
> + .optname = optname,
> + .retval = retval,
> + .optlen = *optlen,
> + .optval = optval,
> + .optval_end = optval + *optlen,
> + };
> + int ret;
> +
The current behavior only passes the kernel optval to the bpf prog
when retval == 0. Can you add a few words here explaining the
difference and why it is fine? Just in case some other options want
to reuse __cgroup_bpf_run_filter_getsockopt_kern() in the future.
> + ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
> + &ctx, BPF_PROG_RUN);
> + if (!ret)
> + return -EPERM;
> +
> + if (ctx.optlen > *optlen)
> + return -EFAULT;
> +
> + /* BPF programs only allowed to set retval to 0, not some
> + * arbitrary value.
> + */
> + if (ctx.retval != 0 && ctx.retval != retval)
> + return -EFAULT;
> +
> + /* BPF programs can shrink the buffer, export the modifications.
> + */
> + if (ctx.optlen != 0)
> + *optlen = ctx.optlen;
> +
> + return ctx.retval;
> +}
> #endif
>
> static ssize_t sysctl_cpy_dir(const struct ctl_dir *dir, char **bufp,
[ ... ]
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> index b25c9c45c148..6bb18b1d8578 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> @@ -11,6 +11,7 @@ static int getsetsockopt(void)
> char u8[4];
> __u32 u32;
> char cc[16]; /* TCP_CA_NAME_MAX */
> + struct tcp_zerocopy_receive zc;
I suspect it won't compile, at least in my setup.
However, tools/testing/selftests/net/tcp_mmap.c compiles fine.
I _guess_ that is because the net selftests include the generated
kernel headers (usr/include), while AFAIK the bpf selftests use
tools/include/uapi/.
Others LGTM.