lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Thu, 25 May 2023 15:24:20 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Lorenz Bauer <lmb@...valent.com>
Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, 
	Paolo Abeni <pabeni@...hat.com>, Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>, 
	John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, 
	Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	David Ahern <dsahern@...nel.org>, Willem de Bruijn <willemdebruijn.kernel@...il.com>, 
	Joe Stringer <joe@...d.net.nz>, Joe Stringer <joe@...ium.io>, Martin KaFai Lau <kafai@...com>, netdev@...r.kernel.org, 
	linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next 1/2] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign

On Thu, May 25, 2023 at 10:19 AM Lorenz Bauer <lmb@...valent.com> wrote:
>
> Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
> sockets. This means we can't use the helper to steer traffic to Envoy, which
> configures SO_REUSEPORT on its sockets. In turn, we're blocked from removing
> TPROXY from our setup.
>
> The reason that bpf_sk_assign refuses such sockets is that the bpf_sk_lookup
> helpers don't execute SK_REUSEPORT programs. Instead, one of the
> reuseport sockets is selected by hash. This could cause dispatch to the
> "wrong" socket:
>
>     sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
>     bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed
>
> Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
> helpers unfortunately. In the tc context, L2 headers are at the start
> of the skb, while SK_REUSEPORT expects L3 headers instead.
>
> Instead, we execute the SK_REUSEPORT program when the assigned socket
> is pulled out of the skb, further up the stack. This creates some
> trickiness with regards to refcounting as bpf_sk_assign will put both
> refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
> freed. We can infer that the sk_assigned socket is RCU freed if the
> reuseport lookup succeeds, but convincing yourself of this fact isn't
> straight forward. Therefore we defensively check refcounting on the
> sk_assign sock even though it's probably not required in practice.
>
> Fixes: 8e368dc ("bpf: Fix use of sk->sk_reuseport from sk_assign")
> Fixes: cf7fbe6 ("bpf: Add socket assign support")
> Co-developed-by: Daniel Borkmann <daniel@...earbox.net>
> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> Signed-off-by: Lorenz Bauer <lmb@...valent.com>
> Cc: Joe Stringer <joe@...ium.io>
> Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
> ---
>  include/net/inet6_hashtables.h | 36 +++++++++++++++++++++++++++++-----
>  include/net/inet_hashtables.h  | 27 +++++++++++++++++++++++--
>  include/net/sock.h             |  7 +++++--
>  include/uapi/linux/bpf.h       |  3 ---
>  net/core/filter.c              |  2 --
>  net/ipv4/inet_hashtables.c     | 15 +++++++-------
>  net/ipv4/udp.c                 | 23 +++++++++++++++++++---
>  net/ipv6/inet6_hashtables.c    | 19 +++++++++---------
>  net/ipv6/udp.c                 | 23 +++++++++++++++++++---
>  tools/include/uapi/linux/bpf.h |  3 ---
>  10 files changed, 119 insertions(+), 39 deletions(-)


> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index e7391bf310a7..920131e4a65d 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -332,10 +332,10 @@ static inline int compute_score(struct sock *sk, struct net *net,
>         return score;
>  }
>
> -static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
> -                                           struct sk_buff *skb, int doff,
> -                                           __be32 saddr, __be16 sport,
> -                                           __be32 daddr, unsigned short hnum)
> +struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk,
> +                                  struct sk_buff *skb, int doff,
> +                                  __be32 saddr, __be16 sport,
> +                                  __be32 daddr, unsigned short hnum)
>  {
>         struct sock *reuse_sk = NULL;
>         u32 phash;
> @@ -346,6 +346,7 @@ static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
>         }
>         return reuse_sk;
>  }
> +EXPORT_SYMBOL_GPL(inet_lookup_reuseport);
>
>  /*
>   * Here are some nice properties to exploit here. The BSD API
> @@ -369,8 +370,8 @@ static struct sock *inet_lhash2_lookup(struct net *net,
>         sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) {
>                 score = compute_score(sk, net, hnum, daddr, dif, sdif);
>                 if (score > hiscore) {
> -                       result = lookup_reuseport(net, sk, skb, doff,
> -                                                 saddr, sport, daddr, hnum);
> +                       result = inet_lookup_reuseport(net, sk, skb, doff,
> +                                                      saddr, sport, daddr, hnum);
>                         if (result)
>                                 return result;
>

Please split in a series.

First a patch renaming lookup_reuseport() to inet_lookup_reuseport()
and inet6_lookup_reuseport()
(cleanup, no change in behavior)

This would ease review and future bug hunting quite a bit.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ