netdev - Re: [PATCH bpf-next v5 1/3] bpf: remove extra lock_sock for TCP_ZEROCOPY

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKH8qBsuea-HbfDD4AB0sMgT2Gn-0P3xw0CdaTcoytRAGLa4zg@mail.gmail.com>
Date:   Fri, 8 Jan 2021 11:26:47 -0800
From:   Stanislav Fomichev <sdf@...gle.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     netdev <netdev@...r.kernel.org>, bpf <bpf@...r.kernel.org>,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>,
        Brian Vazquez <brianvv@...gle.com>
Subject: Re: [PATCH bpf-next v5 1/3] bpf: remove extra lock_sock for TCP_ZEROCOPY_RECEIVE

On Fri, Jan 8, 2021 at 11:23 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Fri, Jan 8, 2021 at 8:08 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> >
> > On Fri, Jan 8, 2021 at 10:41 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Fri, Jan 8, 2021 at 7:26 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > > >
> > > > On Fri, Jan 8, 2021 at 10:10 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > > > >
> > > > > On Fri, Jan 8, 2021 at 7:03 PM Stanislav Fomichev <sdf@...gle.com> wrote:
> > > > > >
> > > > > > Add custom implementation of getsockopt hook for TCP_ZEROCOPY_RECEIVE.
> > > > > > We skip generic hooks for TCP_ZEROCOPY_RECEIVE and have a custom
> > > > > > call in do_tcp_getsockopt using the on-stack data. This removes
> > > > > > 3% overhead for locking/unlocking the socket.
> > > > > >
> > > > > > Without this patch:
> > > > > >      3.38%     0.07%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt
> > > > > >             |
> > > > > >              --3.30%--__cgroup_bpf_run_filter_getsockopt
> > > > > >                        |
> > > > > >                         --0.81%--__kmalloc
> > > > > >
> > > > > > With the patch applied:
> > > > > >      0.52%     0.12%  tcp_mmap  [kernel.kallsyms]  [k] __cgroup_bpf_run_filter_getsockopt_kern
> > > > > >
> > > > >
> > > > >
> > > > > OK but we are adding yet another indirect call.
> > > > >
> > > > > Can you add a patch on top of it adding INDIRECT_CALL_INET() avoidance ?
> > > > Sure, but do you think it will bring any benefit?
> > >
> > > Sure, avoiding an indirect call might be the same gain than the
> > > lock_sock() avoidance :)
> > >
> > > > We don't have any indirect avoidance in __sys_getsockopt for the
> > > > sock->ops->getsockopt() call.
> > > > If we add it for this new bpf_bypass_getsockopt, we might as well add
> > > > it for sock->ops->getsockopt?
> > >
> > > Well, that is orthogonal to this patch.
> > > As you may know, Google kernels do have a mitigation there already and
> > > Brian may upstream it.
> > I guess my point here was that if I send it out only for bpf_bypass_getsockopt
> > it might look a bit strange because the rest of the getsockopt still
> > suffers the indirect costs.
>
>
> Each new indirect call adds a cost. If you focus on optimizing
> TCP_ZEROCOPY_RECEIVE,
> it is counter intuitive adding an expensive indirect call.
Ok, then let me resend with a mitigation in place and a note
that the rest will be added later.

>  If Brian has plans to upstream the rest, maybe
> > it's better to upstream everything together with some numbers?
> > CC'ing him for his opinion.
>
> I am just saying your point about the other indirect call is already taken care.
>
> >
> > I'm happy to follow up in whatever form is best. I can also resend
> > with INDIRECT_CALL_INET2 if there are no objections in including
> > this version from the start.
> >
>
> INDIRECT_CALL_INET2 seems a strange name to me.
Any suggestion for a better name? I did play with the following:
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index cbba9c9ab073..f7342a30284c 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -371,7 +371,9 @@ int bpf_percpu_cgroup_storage_update(struct
bpf_map *map, void *key,
        int __ret = retval;                                                    \
        if (cgroup_bpf_enabled(BPF_CGROUP_GETSOCKOPT))                         \
                if (!(sock)->sk_prot->bpf_bypass_getsockopt ||                 \
-                   !(sock)->sk_prot->bpf_bypass_getsockopt(level, optname))   \
+
!INDIRECT_CALL_INET1((sock)->sk_prot->bpf_bypass_getsockopt, \
+                                       tcp_bpf_bypass_getsockopt,             \
+                                       level, optname))                       \
                        __ret = __cgroup_bpf_run_filter_getsockopt(            \
                                sock, level, optname, optval, optlen,          \
                                max_optlen, retval);                           \
diff --git a/include/linux/indirect_call_wrapper.h
b/include/linux/indirect_call_wrapper.h
index 54c02c84906a..9c3252f7e9bb 100644
--- a/include/linux/indirect_call_wrapper.h
+++ b/include/linux/indirect_call_wrapper.h
@@ -54,10 +54,13 @@
 #if IS_BUILTIN(CONFIG_IPV6)
 #define INDIRECT_CALL_INET(f, f2, f1, ...) \
        INDIRECT_CALL_2(f, f2, f1, __VA_ARGS__)
+#define INDIRECT_CALL_INET1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
 #elif IS_ENABLED(CONFIG_INET)
 #define INDIRECT_CALL_INET(f, f2, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
+#define INDIRECT_CALL_INET1(f, f1, ...) INDIRECT_CALL_1(f, f1, __VA_ARGS__)
 #else
 #define INDIRECT_CALL_INET(f, f2, f1, ...) f(__VA_ARGS__)
+#define INDIRECT_CALL_INET1(f, f1, ...) f(__VA_ARGS__)
 #endif

 #endif