Message-Id: <Ybom69OyOjsR7kmZ@google.com>
Date:   Wed, 15 Dec 2021 09:33:31 -0800
From:   sdf@...gle.com
To:     Pavel Begunkov <asml.silence@...il.com>
Cc:     netdev@...r.kernel.org, bpf@...r.kernel.org,
        Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Andrii Nakryiko <andrii@...nel.org>,
        Martin KaFai Lau <kafai@...com>,
        Song Liu <songliubraving@...com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] cgroup/bpf: fast path skb BPF filtering

On 12/15, Pavel Begunkov wrote:
> On 12/15/21 16:51, sdf@...gle.com wrote:
> > On 12/15, Pavel Begunkov wrote:
> > > Add a per-socket fast path for when BPF skb filtering is not enabled,
> > > which sheds a nice chunk of send/recv overhead where it applies.
> > > Testing UDP with a 128-byte payload and/or zerocopy with any payload
> > > size showed a 2-3% improvement in requests/s on the tx side using fast
> > > NICs across the network, and around 4% for a dummy device. The same
> > > goes for rx; it was not measured, but the numbers should be comparable.
> > > In my understanding, this affects a good share of machines; at least
> > > it includes my laptops and some servers I checked.
> >
> > > The core of the problem is that even though cgroup_bpf_enabled_key
> > > guards against __cgroup_bpf_run_filter_skb() overhead, there are cases
> > > where we have several cgroups, and attaching a BPF program to one also
> > > makes all the others go through the slow path even when they don't
> > > have any BPF attached. It's even worse because apparently systemd, or
> > > some other early-init program, loads some BPF and so triggers exactly
> > > this situation for normal networking.
> >
> > > Signed-off-by: Pavel Begunkov <asml.silence@...il.com>
> > > ---
> >
> > > v2: replace bitmask approach with empty_prog_array (suggested by Martin)
> > > v3: add "bpf_" prefix to empty_prog_array (Martin)
> >
> > >  include/linux/bpf-cgroup.h | 24 +++++++++++++++++++++---
> > >  include/linux/bpf.h        | 13 +++++++++++++
> > >  kernel/bpf/cgroup.c        | 18 ++----------------
> > >  kernel/bpf/core.c          | 16 ++++------------
> > >  4 files changed, 40 insertions(+), 31 deletions(-)
> >
> > > diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> > > index 11820a430d6c..c6dacdbdf565 100644
> > > --- a/include/linux/bpf-cgroup.h
> > > +++ b/include/linux/bpf-cgroup.h
> > > @@ -219,11 +219,28 @@ int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value);
> > >  int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
> > >  				     void *value, u64 flags);
> >
> > > +static inline bool
> > > +__cgroup_bpf_prog_array_is_empty(struct cgroup_bpf *cgrp_bpf,
> > > +				 enum cgroup_bpf_attach_type type)
> > > +{
> > > +	struct bpf_prog_array *array = rcu_access_pointer(cgrp_bpf->effective[type]);
> > > +
> > > +	return array == &bpf_empty_prog_array.hdr;
> > > +}
> > > +
> > > +#define CGROUP_BPF_TYPE_ENABLED(sk, atype)			       \
> > > +({								       \
> > > +	struct cgroup *__cgrp = sock_cgroup_ptr(&(sk)->sk_cgrp_data);     \
> > > +								       \
> > > +	!__cgroup_bpf_prog_array_is_empty(&__cgrp->bpf, (atype));	       \
> > > +})
> > > +
> > >  /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */
> > >  #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb)		      \
> > >  ({								      \
> > >  	int __ret = 0;						      \
> > > -	if (cgroup_bpf_enabled(CGROUP_INET_INGRESS))		      \
> > > +	if (cgroup_bpf_enabled(CGROUP_INET_INGRESS) && sk &&	      \
> > > +	    CGROUP_BPF_TYPE_ENABLED((sk), CGROUP_INET_INGRESS))	      \
> >
> > Why not add this __cgroup_bpf_prog_array_is_empty check to
> > __cgroup_bpf_run_filter_skb? The result of sock_cgroup_ptr() is already
> > there and you can use it. Maybe move things around if you want the
> > check to happen earlier.

> For inlining. I just wanted to get it done right, otherwise I'll likely
> be coming back to it in a few months complaining that I see measurable
> overhead from the function call :)

Do you expect the direct call to bring any visible overhead?
It would be nice to compare the inlined case vs.
__cgroup_bpf_prog_array_is_empty inside of __cgroup_bpf_run_filter_skb
while you're at it (plus move the offset initialization down?).
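
The fast path above hinges on every cgroup with no attached programs
pointing at one shared, static empty array, so the emptiness test is a
single pointer comparison rather than a walk over the array. Below is a
minimal userspace mock-up of that pattern; the names (prog_array,
cgroup_mock, attach_prog) are illustrative stand-ins, not the kernel's
actual definitions.

/*
 * Mock-up: all "empty" program arrays point at one shared singleton,
 * so "is this array empty?" is a single load plus pointer compare.
 */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct prog_array {
	size_t cnt;
	int progs[];		/* stand-in for struct bpf_prog pointers */
};

/* Single shared instance representing "no programs attached". */
static struct prog_array empty_prog_array = { .cnt = 0 };

struct cgroup_mock {
	struct prog_array *effective;	/* mirrors cgrp->bpf.effective[type] */
};

static void cgroup_init(struct cgroup_mock *cg)
{
	cg->effective = &empty_prog_array;	/* empty by construction */
}

static bool prog_array_is_empty(const struct cgroup_mock *cg)
{
	/* The whole fast path: one pointer comparison. */
	return cg->effective == &empty_prog_array;
}

static void attach_prog(struct cgroup_mock *cg, int prog)
{
	/* The kernel does an RCU replace; a plain swap is enough here. */
	struct prog_array *arr = malloc(sizeof(*arr) + sizeof(int));

	arr->cnt = 1;
	arr->progs[0] = prog;
	cg->effective = arr;
}

int main(void)
{
	struct cgroup_mock a, b;

	cgroup_init(&a);
	cgroup_init(&b);
	attach_prog(&a, 42);

	/* b stays on the fast path even though a has a program attached. */
	printf("a empty: %d, b empty: %d\n",
	       prog_array_is_empty(&a), prog_array_is_empty(&b));
	return 0;
}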
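
The placement trade-off being debated can be sketched with the same mock
types; run_filter() below is a hypothetical stand-in for
__cgroup_bpf_run_filter_skb(). Option A mirrors what the patch does: the
test is expanded at each call site, so the no-BPF case never crosses a
function-call boundary. Option B mirrors the suggestion above: call sites
stay simple, but the empty case still pays for the call itself.

/* Reuses struct cgroup_mock and prog_array_is_empty() from the sketch
 * above; run_filter() pretends to run the attached programs. */
static int run_filter(struct cgroup_mock *cg, void *skb)
{
	return 0;
}

/* Option A: test inlined at the call site (GNU statement expression,
 * as in the patch), so the no-BPF case skips the call entirely. */
#define RUN_PROG_INLINE(cg, skb)				\
({								\
	int __ret = 0;						\
	if (!prog_array_is_empty(cg))				\
		__ret = run_filter((cg), (skb));		\
	__ret;							\
})

/* Option B: unconditional call into a helper that bails out early;
 * the empty case still pays for the function call itself. */
static int run_filter_checked(struct cgroup_mock *cg, void *skb)
{
	if (prog_array_is_empty(cg))
		return 0;
	return run_filter(cg, skb);
}

Whether option B's extra call shows up in practice is exactly the
comparison being asked for here.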
