[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANn89iJvGjtVD02MZPKwCkZ5raBDG4_SpejBGtdvSJ63XrqPww@mail.gmail.com>
Date: Fri, 8 Jan 2021 21:08:03 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Stanislav Fomichev <sdf@...gle.com>
Cc: netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
Martin KaFai Lau <kafai@...com>,
Song Liu <songliubraving@...com>,
Brian Vazquez <brianvv@...gle.com>
Subject: Re: [PATCH bpf-next v5 1/3] bpf: remove extra lock_sock for TCP_ZEROCOPY_RECEIVE
On Fri, Jan 8, 2021 at 8:27 PM Stanislav Fomichev <sdf@...gle.com> wrote:
>
> On Fri, Jan 8, 2021 at 11:23 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Fri, Jan 8, 2021 at 8:08 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > >
> > > On Fri, Jan 8, 2021 at 10:41 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > > >
> > > > On Fri, Jan 8, 2021 at 7:26 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > > > >
> > > > > On Fri, Jan 8, 2021 at 10:10 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > > > > >
> > > > > > On Fri, Jan 8, 2021 at 7:03 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > > > > > >
> > > > > > > Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE.
> > > > > > > We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom
> > > > > > > call in do_tcp_getsockopt using the on-stack data. This removes
> > > > > > > 3% overhead for locking/unlocking the socket.
> > > > > > >
> > > > > > > Without this patch:
> > > > > > > 3.38% 0.07% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt
> > > > > > > |
> > > > > > > --3.30%--__cgroup_bpf_run_filter_getsockopt
> > > > > > > |
> > > > > > > --0.81%--__kmalloc
> > > > > > >
> > > > > > > With the patch applied:
> > > > > > > 0.52% 0.12% tcp_mmap [kernel.kallsyms] [k] __cgroup_bpf_run_filter_getsockopt_kern
> > > > > > >
> > > > > >
> > > > > >
> > > > > > OK but we are adding yet another indirect call.
> > > > > >
> > > > > > Can you add a patch on top of it adding INDIRECT_CALL_INET() avoidance ?
> > > > > Sure, but do you think it will bring any benefit?
> > > >
> > > > Sure, avoiding an indirect call might be the same gain than the
> > > > lock_sock() avoidance :)
> > > >
> > > > > We don't have any indirect avoidance in __sys_getsockopt for the
> > > > > sock->ops->getsockopt() call.
> > > > > If we add it for this new bpf_bypass_getsockopt, we might as well add
> > > > > it for sock->ops->getsockopt?
> > > >
> > > > Well, that is orthogonal to this patch.
> > > > As you may know, Google kernels do have a mitigation there already and
> > > > Brian may upstream it.
> > > I guess my point here was that if I send it out only for bpf_bypass_getsockopt
> > > it might look a bit strange because the rest of the getsockopt still
> > > suffers the indirect costs.
> >
> >
> > Each new indirect call adds a cost. If you focus on optimizing
> > TCP_ZEROCOPY_RECEIVE,
> > it is counter intuitive adding an expensive indirect call.
> Ok, then let me resend with a mitigation in place and a note
> that the rest will be added later.
>
> > If Brian has plans to upstream the rest, maybe
> > > it's better to upstream everything together with some numbers?
> > > CC'ing him for his opinion.
> >
> > I am just saying your point about the other indirect call is already taken care.
> >
> > >
> > > I'm happy to follow up in whatever form is best. I can also resend
> > > with INDIRECT_CALL_INET2 if there are no objections in including
> > > this version from the start.
> > >
> >
> > INDIRECT_CALL_INET2 seems a strange name to me.
> Any suggestion for a better name? I did play with the following:
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index cbba9c9ab073..f7342a30284c 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -371,7 +371,9 @@ int bpf_percpu_cgroup_storage_update(struct
> bpf_map *map, void *key,
> int __ret = retval; \
> if (cgroup_bpf_enabled(BPF_CGROUP_GETSOCKOPT)) \
> if (!(sock)->sk_prot->bpf_bypass_getsockopt || \
> - !(sock)->sk_prot->bpf_bypass_getsockopt(level, optname)) \
> +
> !INDIRECT_CALL_INET1((sock)->sk_prot->bpf_bypass_getsockopt, \
> + tcp_bpf_bypass_getsockopt, \
> + level, optname)) \
> __ret = __cgroup_bpf_run_filter_getsockopt( \
> sock, level, optname, optval, optlen, \
> max_optlen, retval); \
> diff --git a/include/linux/indirect_call_wrapper.h
> b/include/linux/indirect_call_wrapper.h
> index 54c02c84906a..9c3252f7e9bb 100644
> --- a/include/linux/indirect_call_wrapper.h
> +++ b/include/linux/indirect_call_wrapper.h
> @@ -54,10 +54,13 @@
> #if IS_BUILTIN(CONFIG_IPV6)
> #define INDIRECT_CALL_INET(f, f2, f1, ...) \
> INDIRECT_CALL_2(f, f2, f1, __VA_ARGS__)
> +#define INDIRECT_CALL_INET1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
> #elif IS_ENABLED(CONFIG_INET)
> #define INDIRECT_CALL_INET(f, f2, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
> +#define INDIRECT_CALL_INET1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
> #else
> #define INDIRECT_CALL_INET(f, f2, f1, ...) f(__VA_ARGS__)
> +#define INDIRECT_CALL_INET1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
> #endif
>
> #endif
Yes, or maybe something only focusing on CONFIG_INET to make it clear.
diff --git a/include/linux/indirect_call_wrapper.h
b/include/linux/indirect_call_wrapper.h
index 54c02c84906ab2548a93bacb46f7795a8e136d83..d082aa4bd3ecae52e5998b3ac05deffafcb45de0
100644
--- a/include/linux/indirect_call_wrapper.h
+++ b/include/linux/indirect_call_wrapper.h
@@ -46,6 +46,12 @@
#define INDIRECT_CALLABLE_SCOPE static
#endif
+#elif IS_ENABLED(CONFIG_INET)
+#define INDIRECT_CALL_INET1(f, f2, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
+#else
+#define INDIRECT_CALL_INET1(f, f2, f1, ...) f(__VA_ARGS__)
+#endif
+
/*
* We can use INDIRECT_CALL_$NR for ipv6 related functions only if ipv6 is
* builtin, this macro simplify dealing with indirect calls with only ipv4/ipv6
Powered by blists - more mailing lists