[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220805002918.l23wcmezgtmpcjxf@kafai-mbp.dhcp.thefacebook.com>
Date: Thu, 4 Aug 2022 17:29:18 -0700
From: Martin KaFai Lau <kafai@...com>
To: Andrii Nakryiko <andrii.nakryiko@...il.com>
Cc: bpf <bpf@...r.kernel.org>, Networking <netdev@...r.kernel.org>,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>,
Kernel Team <kernel-team@...com>,
Paolo Abeni <pabeni@...hat.com>,
Stanislav Fomichev <sdf@...gle.com>
Subject: Re: Universally available bpf_ctx WAS: Re: [PATCH v2 bpf-next 02/15]
bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf
On Thu, Aug 04, 2022 at 01:51:12PM -0700, Andrii Nakryiko wrote:
> On Thu, Aug 4, 2022 at 12:29 PM Martin KaFai Lau <kafai@...com> wrote:
> >
> > On Thu, Aug 04, 2022 at 12:03:04PM -0700, Andrii Nakryiko wrote:
> > > On Wed, Aug 3, 2022 at 1:49 PM Martin KaFai Lau <kafai@...com> wrote:
> > > >
> > > > Most of the code in bpf_setsockopt(SOL_SOCKET) are duplicated from
> > > > the sk_setsockopt(). The number of supported optnames are
> > > > increasing ever and so as the duplicated code.
> > > >
> > > > One issue in reusing sk_setsockopt() is that the bpf prog
> > > > has already acquired the sk lock. This patch adds a in_bpf()
> > > > to tell if the sk_setsockopt() is called from a bpf prog.
> > > > The bpf prog calling bpf_setsockopt() is either running in_task()
> > > > or in_serving_softirq(). Both cases have the current->bpf_ctx
> > > > initialized. Thus, the in_bpf() only needs to test !!current->bpf_ctx.
> > > >
> > > > This patch also adds sockopt_{lock,release}_sock() helpers
> > > > for sk_setsockopt() to use. These helpers will test in_bpf()
> > > > before acquiring/releasing the lock. They are in EXPORT_SYMBOL
> > > > for the ipv6 module to use in a latter patch.
> > > >
> > > > Note on the change in sock_setbindtodevice(). sockopt_lock_sock()
> > > > is done in sock_setbindtodevice() instead of doing the lock_sock
> > > > in sock_bindtoindex(..., lock_sk = true).
> > > >
> > > > Signed-off-by: Martin KaFai Lau <kafai@...com>
> > > > ---
> > > > include/linux/bpf.h | 8 ++++++++
> > > > include/net/sock.h | 3 +++
> > > > net/core/sock.c | 26 +++++++++++++++++++++++---
> > > > 3 files changed, 34 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 20c26aed7896..b905b1b34fe4 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -1966,6 +1966,10 @@ static inline bool unprivileged_ebpf_enabled(void)
> > > > return !sysctl_unprivileged_bpf_disabled;
> > > > }
> > > >
> > > > +static inline bool in_bpf(void)
> > >
> > > I think this function deserves a big comment explaining that it's not
> > > 100% accurate, as not every BPF program type sets bpf_ctx. As it is
> > > named in_bpf() promises a lot more generality than it actually
> > > provides.
> > >
> > > Should this be named either more specific has_current_bpf_ctx() maybe?
> > Stans also made a similar point on this to add comment.
> > Rename makes sense until all bpf prog has bpf_ctx. in_bpf() was
> > just the name it was used in the v1 discussion for the setsockopt
> > context.
> >
> > > Also, separately, should be make an effort to set bpf_ctx for all
> > > program types (instead or in addition to the above)?
> > I would prefer to separate this as a separate effort. This set is
> > getting pretty long and the bpf_getsockopt() is still not posted.
>
> Yeah, sure, I don't think you should be blocked on that.
>
> >
> > If you prefer this must be done first, I can do that also.
>
> I wanted to bring this up for discussion. I find bpf_ctx a very useful
> construct, if we had it available universally we could use it
> (reliably) for this in_bpf() check, we could also have a sleepable vs
> non-sleepable flag stored in such context and thus avoid all the
> special handling we have for providing different gfp flags, etc.
>
> But it's not just up for me to decide if we want to add it for all
> program types (e.g., I wouldn't be surprised if I got push back adding
> this to XDP). Most program types I normally use already have bpf_ctx
> (and bpf_cookie built on top), but I was wondering what others feel
> regarding making this (bpf_ctx in general, bpf_cookie in particular)
> universally available.
It may be easier to reason to add bpf_ctx with a use case.
Like networking prog, the cgroup-bpf (for storage and retval) and the
struct_ops (came automatically from the trampoline but finally become
useful in setsockopt here).
I don't think other network prog types have it now. For sleepable or not,
I am not sure if those other network programs will ever be sleepable.
tc-bpf cannot call setsockopt also.
>
> So please proceed with your changes, I just used your patch as an
> anchor for this discussion :)
+1
Powered by blists - more mailing lists