Message-ID: <CAKH8qBscw4NkOavRZ2nDiB7Yz_BbO5nLwmczkMraMYgrDWWxGg@mail.gmail.com>
Date: Mon, 11 Jan 2021 10:50:14 -0800
From: Stanislav Fomichev <sdf@...gle.com>
To: Martin KaFai Lau <kafai@...com>
Cc: Netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Song Liu <songliubraving@...com>,
Eric Dumazet <edumazet@...gle.com>
Subject: Re: [PATCH bpf-next v6 1/3] bpf: remove extra lock_sock for TCP_ZEROCOPY_RECEIVE

On Fri, Jan 8, 2021 at 5:37 PM Martin KaFai Lau <kafai@...com> wrote:
>
> On Fri, Jan 08, 2021 at 01:02:21PM -0800, Stanislav Fomichev wrote:
> > Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE.
> > We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom
> > call in do_tcp_getsockopt using the on-stack data. This removes
> > 3% overhead for locking/unlocking the socket.
> >
> > Without this patch:
> > 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt
> > |
> > --3.30%--__cgroup_bpf_run_filter_getsockopt
> > |
> > --0.81%--__kmalloc
> >
> > With the patch applied:
> > 0.52% 0.12% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt_kern
> >
> > Signed-off-by: Stanislav Fomichev <sdf@...gle.com>
> > Cc: Martin KaFai Lau <kafai@...com>
> > Cc: Song Liu <songliubraving@...com>
> > Cc: Eric Dumazet <edumazet@...gle.com>
> > ---
> > include/linux/bpf-cgroup.h | 27 +++++++++++--
> > include/linux/indirect_call_wrapper.h | 6 +++
> > include/net/sock.h | 2 +
> > include/net/tcp.h | 1 +
> > kernel/bpf/cgroup.c | 38 +++++++++++++++++++
> > net/ipv4/tcp.c | 14 +++++++
> > net/ipv4/tcp_ipv4.c | 1 +
> > net/ipv6/tcp_ipv6.c | 1 +
> > net/socket.c | 3 ++
> > .../selftests/bpf/prog_tests/sockopt_sk.c | 22 +++++++++++
> > .../testing/selftests/bpf/progs/sockopt_sk.c | 15 ++++++++
> > 11 files changed, 126 insertions(+), 4 deletions(-)
> >
> [ ... ]
>
> > diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
> > index 6ec088a96302..c41bb2f34013 100644
> > --- a/kernel/bpf/cgroup.c
> > +++ b/kernel/bpf/cgroup.c
> > @@ -1485,6 +1485,44 @@ int __cgroup_bpf_run_filter_getsockopt(struct sock *sk, int level,
> > sockopt_free_buf(&ctx);
> > return ret;
> > }
> > +
> > +int __cgroup_bpf_run_filter_getsockopt_kern(struct sock *sk, int level,
> > + int optname, void *optval,
> > + int *optlen, int retval)
> > +{
> > + struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data);
> > + struct bpf_sockopt_kern ctx = {
> > + .sk = sk,
> > + .level = level,
> > + .optname = optname,
> > + .retval = retval,
> > + .optlen = *optlen,
> > + .optval = optval,
> > + .optval_end = optval + *optlen,
> > + };
> > + int ret;
> > +
> The current behavior only passes kernel optval to bpf prog when
> retval == 0. Can you explain a few words here about
> the difference and why it is fine?
> Just in case some other options may want to reuse the
> __cgroup_bpf_run_filter_getsockopt_kern() in the future.
IIRC, skipping the copy for retval != 0 in
__cgroup_bpf_run_filter_getsockopt is just an optimization.
I was assuming that on error the kernel wouldn't copy
anything back to the user (not sure how true that is in
practice). I'll add a comment here to point out the difference.
> > + ret = BPF_PROG_RUN_ARRAY(cgrp->bpf.effective[BPF_CGROUP_GETSOCKOPT],
> > + &ctx, BPF_PROG_RUN);
> > + if (!ret)
> > + return -EPERM;
> > +
> > + if (ctx.optlen > *optlen)
> > + return -EFAULT;
> > +
> > + /* BPF programs only allowed to set retval to 0, not some
> > + * arbitrary value.
> > + */
> > + if (ctx.retval != 0 && ctx.retval != retval)
> > + return -EFAULT;
> > +
> > + /* BPF programs can shrink the buffer, export the modifications.
> > + */
> > + if (ctx.optlen != 0)
> > + *optlen = ctx.optlen;
> > +
> > + return ctx.retval;
> > +}
> > #endif
> >
> > static ssize_t sysctl_cpy_dir(const struct ctl_dir *dir, char **bufp,
>
> [ ... ]
>
> > diff --git a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > index b25c9c45c148..6bb18b1d8578 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/sockopt_sk.c
> > @@ -11,6 +11,7 @@ static int getsetsockopt(void)
> > char u8[4];
> > __u32 u32;
> > char cc[16]; /* TCP_CA_NAME_MAX */
> > + struct tcp_zerocopy_receive zc;
> I suspect it won't compile at least in my setup.
>
> However, I compile tools/testing/selftests/net/tcp_mmap.c fine though.
> I _guess_ it is because the net's test has included kernel/usr/include.
>
> AFAIK, bpf's tests use tools/include/uapi/.
>
> Others LGTM.
Sure, let me export it to tools/include/uapi. I didn't do it
because it also compiled for me and I assumed that
tcp_zerocopy_receive was exported long enough ago not to matter (we are
using only the first field anyway, so we don't really need the latest layout).
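
For anyone who wants to reuse the _kern variant later, here is a rough
userspace sketch of the checks it applies after the BPF programs have run,
in the same order as the hunk quoted above. It is only a model: struct
sockopt_result and finish_getsockopt_kern() are made-up names for
illustration, and prog_allowed stands in for the BPF_PROG_RUN_ARRAY verdict.
Note that, unlike the generic hook, nothing here depends on retval being 0
before the programs run.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct bpf_sockopt_kern that the
 * post-run checks look at; this is not the kernel definition.
 */
struct sockopt_result {
	int optlen;	/* length the BPF program left in the context */
	int retval;	/* return value the BPF program left in the context */
};

/* Mirrors the checks from the quoted hunk, in the same order.
 * prog_allowed models the BPF_PROG_RUN_ARRAY verdict (0 means reject).
 */
static int finish_getsockopt_kern(int prog_allowed,
				  const struct sockopt_result *res,
				  int *optlen, int retval)
{
	assert(res != NULL && optlen != NULL);

	if (!prog_allowed)
		return -EPERM;

	/* The program may not grow the buffer. */
	if (res->optlen > *optlen)
		return -EFAULT;

	/* The program may reset retval to 0 or leave it untouched. */
	if (res->retval != 0 && res->retval != retval)
		return -EFAULT;

	/* A non-zero optlen is exported; 0 keeps the original length. */
	if (res->optlen != 0)
		*optlen = res->optlen;

	return res->retval;
}
```

With *optlen == 40 and a program that leaves optlen = 16 and retval = 0, the
sketch returns 0 and shrinks *optlen to 16; a program that sets optlen = 64
gets -EFAULT and *optlen stays at 40.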