Date:   Thu, 4 Aug 2022 12:03:04 -0700
From:   Andrii Nakryiko <andrii.nakryiko@...il.com>
To:     Martin KaFai Lau <kafai@...com>
Cc:     bpf@...r.kernel.org, netdev@...r.kernel.org,
        Alexei Starovoitov <ast@...nel.org>,
        Andrii Nakryiko <andrii@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        David Miller <davem@...emloft.net>,
        Eric Dumazet <edumazet@...gle.com>,
        Jakub Kicinski <kuba@...nel.org>, kernel-team@...com,
        Paolo Abeni <pabeni@...hat.com>,
        Stanislav Fomichev <sdf@...gle.com>
Subject: Re: [PATCH v2 bpf-next 02/15] bpf: net: Avoid sk_setsockopt() taking
 sk lock when called from bpf

On Wed, Aug 3, 2022 at 1:49 PM Martin KaFai Lau <kafai@...com> wrote:
>
> Most of the code in bpf_setsockopt(SOL_SOCKET) is duplicated from
> sk_setsockopt().  The number of supported optnames keeps growing,
> and so does the duplicated code.
>
> One issue in reusing sk_setsockopt() is that the bpf prog
> has already acquired the sk lock.  This patch adds an in_bpf()
> helper to tell if sk_setsockopt() is being called from a bpf prog.
> A bpf prog calling bpf_setsockopt() is running either in_task()
> or in_serving_softirq().  In both cases current->bpf_ctx is
> initialized, so in_bpf() only needs to test !!current->bpf_ctx.
>
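
(For anyone skimming the series: here is a minimal BPF-side sketch, not
from this patchset, of a sockops prog that reaches sk_setsockopt() via
the bpf_setsockopt() helper; the section and option choice are mine.
The callback runs with the socket already locked, which is exactly why
the lock has to be skipped on this path.)

#include <linux/bpf.h>
#include <sys/socket.h>
#include <bpf/bpf_helpers.h>

SEC("sockops")
int set_keepalive(struct bpf_sock_ops *skops)
{
	int one = 1;

	/* sk is already locked by the sockops infrastructure here */
	if (skops->op == BPF_SOCK_OPS_TCP_CONNECT_CB)
		bpf_setsockopt(skops, SOL_SOCKET, SO_KEEPALIVE,
			       &one, sizeof(one));
	return 1;
}

char _license[] SEC("license") = "GPL";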
> This patch also adds sockopt_{lock,release}_sock() helpers
> for sk_setsockopt() to use.  These helpers test in_bpf()
> before acquiring/releasing the lock.  They are EXPORT_SYMBOL'ed
> so that the ipv6 module can use them in a later patch.
>
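
(Concretely, the intended call-site pattern looks like this; a hedged
sketch only, the function and the state it touches are made up and not
the actual ipv6 patch:

static int sol_example_setsockopt(struct sock *sk, int val)
{
	int ret = 0;

	sockopt_lock_sock(sk);		/* no-op when in_bpf() */
	if (val < 0)
		ret = -EINVAL;
	else
		sk->sk_priority = val;	/* state normally guarded by lock_sock() */
	sockopt_release_sock(sk);
	return ret;
}

The bpf path skips the lock because the prog already holds it; the
setsockopt() syscall path locks as before.)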
> Note the change in sock_setbindtodevice():  the lock is now taken
> with sockopt_lock_sock() in sock_setbindtodevice() itself instead of
> by calling sock_bindtoindex(..., lock_sk = true).
>
> Signed-off-by: Martin KaFai Lau <kafai@...com>
> ---
>  include/linux/bpf.h |  8 ++++++++
>  include/net/sock.h  |  3 +++
>  net/core/sock.c     | 26 +++++++++++++++++++++++---
>  3 files changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 20c26aed7896..b905b1b34fe4 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1966,6 +1966,10 @@ static inline bool unprivileged_ebpf_enabled(void)
>         return !sysctl_unprivileged_bpf_disabled;
>  }
>
> +static inline bool in_bpf(void)

I think this function deserves a big comment explaining that it's not
100% accurate, as not every BPF program type sets bpf_ctx. As named,
in_bpf() promises a lot more generality than it actually provides.

Should this be named something more specific, like has_current_bpf_ctx(), maybe?
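
Something like this, say (a sketch; the name and the comment wording
are just my suggestion):

static inline bool has_current_bpf_ctx(void)
{
	/* Not 100% accurate: only prog types whose runners set up
	 * current->bpf_ctx are detected; for any other prog type this
	 * returns false even while a prog is running.
	 */
	return !!current->bpf_ctx;
}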

Also, separately, should we make an effort to set bpf_ctx for all
program types (instead of, or in addition to, the above)?
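
For reference, the cgroup and tracing runners already establish it via
bpf_set_run_ctx()/bpf_reset_run_ctx(); roughly like this (simplified
sketch, the wrapper name is mine):

static u32 run_prog_with_bpf_ctx(const struct bpf_prog *prog, const void *ctx)
{
	struct bpf_trace_run_ctx run_ctx = {};
	struct bpf_run_ctx *old_run_ctx;
	u32 ret;

	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
	ret = bpf_prog_run(prog, ctx);	/* current->bpf_ctx != NULL here */
	bpf_reset_run_ctx(old_run_ctx);
	return ret;
}

Any prog type whose runner doesn't do an equivalent set/reset pair
would stay invisible to in_bpf().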

> +{
> +       return !!current->bpf_ctx;
> +}
>  #else /* !CONFIG_BPF_SYSCALL */
>  static inline struct bpf_prog *bpf_prog_get(u32 ufd)
>  {
> @@ -2175,6 +2179,10 @@ static inline bool unprivileged_ebpf_enabled(void)
>         return false;
>  }
>
> +static inline bool in_bpf(void)
> +{
> +       return false;
> +}
>  #endif /* CONFIG_BPF_SYSCALL */
>
>  void __bpf_free_used_btfs(struct bpf_prog_aux *aux,
> diff --git a/include/net/sock.h b/include/net/sock.h
> index a7273b289188..b2ff230860c6 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1721,6 +1721,9 @@ static inline void unlock_sock_fast(struct sock *sk, bool slow)
>         }
>  }
>
> +void sockopt_lock_sock(struct sock *sk);
> +void sockopt_release_sock(struct sock *sk);
> +
>  /* Used by processes to "lock" a socket state, so that
>   * interrupts and bottom half handlers won't change it
>   * from under us. It essentially blocks any incoming
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 20269c37ab3b..82759540ae2c 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -703,7 +703,9 @@ static int sock_setbindtodevice(struct sock *sk, sockptr_t optval, int optlen)
>                         goto out;
>         }
>
> -       return sock_bindtoindex(sk, index, true);
> +       sockopt_lock_sock(sk);
> +       ret = sock_bindtoindex_locked(sk, index);
> +       sockopt_release_sock(sk);
>  out:
>  #endif
>
> @@ -1036,6 +1038,24 @@ static int sock_reserve_memory(struct sock *sk, int bytes)
>         return 0;
>  }
>
> +void sockopt_lock_sock(struct sock *sk)
> +{
> +       if (in_bpf())
> +               return;
> +
> +       lock_sock(sk);
> +}
> +EXPORT_SYMBOL(sockopt_lock_sock);
> +
> +void sockopt_release_sock(struct sock *sk)
> +{
> +       if (in_bpf())
> +               return;
> +
> +       release_sock(sk);
> +}
> +EXPORT_SYMBOL(sockopt_release_sock);
> +
>  /*
>   *     This is meant for all protocols to use and covers goings on
>   *     at the socket level. Everything here is generic.
> @@ -1067,7 +1087,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname,
>
>         valbool = val ? 1 : 0;
>
> -       lock_sock(sk);
> +       sockopt_lock_sock(sk);
>
>         switch (optname) {
>         case SO_DEBUG:
> @@ -1496,7 +1516,7 @@ static int sk_setsockopt(struct sock *sk, int level, int optname,
>                 ret = -ENOPROTOOPT;
>                 break;
>         }
> -       release_sock(sk);
> +       sockopt_release_sock(sk);
>         return ret;
>  }
>
> --
> 2.30.2
>
