netdev - Re: [PATCH bpf-next v5 1/3] bpf: remove extra lock_sock for TCP_ZEROCOPY

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANn89iJvGjtVD02MZPKwCkZ5raBDG4_SpejBGtdvSJ63XrqPww@mail.gmail.com>
Date:   Fri, 8 Jan 2021 21:08:03 +0100
From:   Eric Dumazet <edumazet@...gle.com>
To:     Stanislav Fomichev <sdf@...gle.com>
Cc:     netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>,
        Brian Vazquez <brianvv@...gle.com>
Subject: Re: [PATCH bpf-next v5 1/3] bpf: remove extra lock_sock for TCP_ZEROCOPY_RECEIVE

On Fri, Jan 8, 2021 at 8:27 PM Stanislav Fomichev <sdf@...gle.com> wrote:
>
> On Fri, Jan 8, 2021 at 11:23 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Fri, Jan 8, 2021 at 8:08 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > >
> > > On Fri, Jan 8, 2021 at 10:41 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > > >
> > > > On Fri, Jan 8, 2021 at 7:26 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > > > >
> > > > > On Fri, Jan 8, 2021 at 10:10 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > > > > >
> > > > > > On Fri, Jan 8, 2021 at 7:03 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > > > > > >
> > > > > > > Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE.
> > > > > > > We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom
> > > > > > > call in do_tcp_getsockopt using the on-stack data. This removes
> > > > > > > 3% overhead for locking/unlocking the socket.
> > > > > > >
> > > > > > > Without this patch:
> > > > > > >      3.38%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
> > > > > > >             |
> > > > > > >              --3.30%--__cgroup_bpf_run_filter_getsockopt
> > > > > > >                        |
> > > > > > >                         --0.81%--__kmalloc
> > > > > > >
> > > > > > > With the patch applied:
> > > > > > >      0.52%     0.12%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt_kern
> > > > > > >
> > > > > >
> > > > > >
> > > > > > OK but we are adding yet another indirect call.
> > > > > >
> > > > > > Can you add a patch on top of it adding INDIRECT_CALL_INET() avoidance ?
> > > > > Sure, but do you think it will bring any benefit?
> > > >
> > > > Sure, avoiding an indirect call might be the same gain than the
> > > > lock_sock() avoidance :)
> > > >
> > > > > We don't have any indirect avoidance in __sys_getsockopt for the
> > > > > sock->ops->getsockopt() call.
> > > > > If we add it for this new bpf_bypass_getsockopt, we might as well add
> > > > > it for sock->ops->getsockopt?
> > > >
> > > > Well, that is orthogonal to this patch.
> > > > As you may know, Google kernels do have a mitigation there already and
> > > > Brian may upstream it.
> > > I guess my point here was that if I send it out only for bpf_bypass_getsockopt
> > > it might look a bit strange because the rest of the getsockopt still
> > > suffers the indirect costs.
> >
> >
> > Each new indirect call adds a cost. If you focus on optimizing
> > TCP_ZEROCOPY_RECEIVE,
> > it is counter intuitive adding an expensive indirect call.
> Ok, then let me resend with a mitigation in place and a note
> that the rest will be added later.
>
> >  If Brian has plans to upstream the rest, maybe
> > > it's better to upstream everything together with some numbers?
> > > CC'ing him for his opinion.
> >
> > I am just saying your point about the other indirect call is already taken care.
> >
> > >
> > > I'm happy to follow up in whatever form is best. I can also resend
> > > with INDIRECT_CALL_INET2 if there are no objections in including
> > > this version from the start.
> > >
> >
> > INDIRECT_CALL_INET2 seems a strange name to me.
> Any suggestion for a better name? I did play with the following:
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index cbba9c9ab073..f7342a30284c 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -371,7 +371,9 @@ int bpf_percpu_cgroup_storage_update(struct
> bpf_map *map, void *key,
>         int __ret = retval;                                                    \
>         if (cgroup_bpf_enabled(BPF_CGROUP_GETSOCKOPT))                         \
>                 if (!(sock)->sk_prot->bpf_bypass_getsockopt ||                 \
> -                   !(sock)->sk_prot->bpf_bypass_getsockopt(level, optname))   \
> +
> !INDIRECT_CALL_INET1((sock)->sk_prot->bpf_bypass_getsockopt, \
> +                                       tcp_bpf_bypass_getsockopt,             \
> +                                       level, optname))                       \
>                         __ret = __cgroup_bpf_run_filter_getsockopt(            \
>                                 sock, level, optname, optval, optlen,          \
>                                 max_optlen, retval);                           \
> diff --git a/include/linux/indirect_call_wrapper.h
> b/include/linux/indirect_call_wrapper.h
> index 54c02c84906a..9c3252f7e9bb 100644
> --- a/include/linux/indirect_call_wrapper.h
> +++ b/include/linux/indirect_call_wrapper.h
> @@ -54,10 +54,13 @@
>  #if IS_BUILTIN(CONFIG_IPV6)
>  #define INDIRECT_CALL_INET(f, f2, f1, ...) \
>         INDIRECT_CALL_2(f, f2, f1, __VA_ARGS__)
> +#define INDIRECT_CALL_INET1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
>  #elif IS_ENABLED(CONFIG_INET)
>  #define INDIRECT_CALL_INET(f, f2, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
> +#define INDIRECT_CALL_INET1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
>  #else
>  #define INDIRECT_CALL_INET(f, f2, f1, ...) f(__VA_ARGS__)
> +#define INDIRECT_CALL_INET1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
>  #endif
>
>  #endif

Yes, or maybe something only focusing on CONFIG_INET to make it clear.

diff --git a/include/linux/indirect_call_wrapper.h
b/include/linux/indirect_call_wrapper.h
index 54c02c84906ab2548a93bacb46f7795a8e136d83..d082aa4bd3ecae52e5998b3ac05deffafcb45de0
100644
--- a/include/linux/indirect_call_wrapper.h
+++ b/include/linux/indirect_call_wrapper.h
@@ -46,6 +46,12 @@
 #define INDIRECT_CALLABLE_SCOPE                static
 #endif

+#elif IS_ENABLED(CONFIG_INET)
+#define INDIRECT_CALL_INET1(f, f2, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
+#else
+#define INDIRECT_CALL_INET1(f, f2, f1, ...) f(__VA_ARGS__)
+#endif
+
 /*
  * We can use INDIRECT_CALL_$NR for ipv6 related functions only if ipv6 is
  * builtin, this macro simplify dealing with indirect calls with only ipv4/ipv6