lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <a71b8941-1ffc-37fc-6676-d3b4cf44f149@iogearbox.net> Date: Thu, 25 May 2023 21:51:14 +0200 From: Daniel Borkmann <daniel@...earbox.net> To: Eric Dumazet <edumazet@...gle.com>, Lorenz Bauer <lmb@...valent.com> Cc: "David S. Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>, John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, David Ahern <dsahern@...nel.org>, Willem de Bruijn <willemdebruijn.kernel@...il.com>, Joe Stringer <joe@...d.net.nz>, Joe Stringer <joe@...ium.io>, Martin KaFai Lau <kafai@...com>, netdev@...r.kernel.org, linux-kernel@...r.kernel.org, bpf@...r.kernel.org Subject: Re: [PATCH bpf-next 1/2] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign On 5/25/23 3:24 PM, Eric Dumazet wrote: > On Thu, May 25, 2023 at 10:19 AM Lorenz Bauer <lmb@...valent.com> wrote: >> >> Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT >> sockets. This means we can't use the helper to steer traffic to Envoy, which >> configures SO_REUSEPORT on its sockets. In turn, we're blocked from removing >> TPROXY from our setup. >> >> The reason that bpf_sk_assign refuses such sockets is that the bpf_sk_lookup >> helpers don't execute SK_REUSEPORT programs. Instead, one of the >> reuseport sockets is selected by hash. This could cause dispatch to the >> "wrong" socket: >> >> sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash >> bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed >> >> Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup >> helpers unfortunately. In the tc context, L2 headers are at the start >> of the skb, while SK_REUSEPORT expects L3 headers instead. >> >> Instead, we execute the SK_REUSEPORT program when the assigned socket >> is pulled out of the skb, further up the stack. This creates some >> trickiness with regards to refcounting as bpf_sk_assign will put both >> refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU >> freed. We can infer that the sk_assigned socket is RCU freed if the >> reuseport lookup succeeds, but convincing yourself of this fact isn't >> straight forward. Therefore we defensively check refcounting on the >> sk_assign sock even though it's probably not required in practice. >> >> Fixes: 8e368dc ("bpf: Fix use of sk->sk_reuseport from sk_assign") >> Fixes: cf7fbe6 ("bpf: Add socket assign support") >> Co-developed-by: Daniel Borkmann <daniel@...earbox.net> >> Signed-off-by: Daniel Borkmann <daniel@...earbox.net> >> Signed-off-by: Lorenz Bauer <lmb@...valent.com> >> Cc: Joe Stringer <joe@...ium.io> >> Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/ >> --- >> include/net/inet6_hashtables.h | 36 +++++++++++++++++++++++++++++----- >> include/net/inet_hashtables.h | 27 +++++++++++++++++++++++-- >> include/net/sock.h | 7 +++++-- >> include/uapi/linux/bpf.h | 3 --- >> net/core/filter.c | 2 -- >> net/ipv4/inet_hashtables.c | 15 +++++++------- >> net/ipv4/udp.c | 23 +++++++++++++++++++--- >> net/ipv6/inet6_hashtables.c | 19 +++++++++--------- >> net/ipv6/udp.c | 23 +++++++++++++++++++--- >> tools/include/uapi/linux/bpf.h | 3 --- >> 10 files changed, 119 insertions(+), 39 deletions(-) > > >> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c >> index e7391bf310a7..920131e4a65d 100644 >> --- a/net/ipv4/inet_hashtables.c >> +++ b/net/ipv4/inet_hashtables.c >> @@ -332,10 +332,10 @@ static inline int compute_score(struct sock *sk, struct net *net, >> return score; >> } >> >> -static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, >> - struct sk_buff *skb, int doff, >> - __be32 saddr, __be16 sport, >> - __be32 daddr, unsigned short hnum) >> +struct sock *inet_lookup_reuseport(struct net *net, struct sock *sk, >> + struct sk_buff *skb, int doff, >> + __be32 saddr, __be16 sport, >> + __be32 daddr, unsigned short hnum) >> { >> struct sock *reuse_sk = NULL; >> u32 phash; >> @@ -346,6 +346,7 @@ static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk, >> } >> return reuse_sk; >> } >> +EXPORT_SYMBOL_GPL(inet_lookup_reuseport); >> >> /* >> * Here are some nice properties to exploit here. The BSD API >> @@ -369,8 +370,8 @@ static struct sock *inet_lhash2_lookup(struct net *net, >> sk_nulls_for_each_rcu(sk, node, &ilb2->nulls_head) { >> score = compute_score(sk, net, hnum, daddr, dif, sdif); >> if (score > hiscore) { >> - result = lookup_reuseport(net, sk, skb, doff, >> - saddr, sport, daddr, hnum); >> + result = inet_lookup_reuseport(net, sk, skb, doff, >> + saddr, sport, daddr, hnum); >> if (result) >> return result; >> > > Please split in a series. > > First a patch renaming lookup_reuseport() to inet_lookup_reuseport() > and inet6_lookup_reuseport() > (cleanup, no change in behavior) > > This would ease review and future bug hunting quite a bit. Makes sense and should reduce the churn on the actual change. I think Lorenz is planning to flush out a v2 next week with this split. Thanks, Daniel
Powered by blists - more mailing lists