lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <99681548-fa79-0607-d574-db61818cab78@iogearbox.net> Date: Thu, 25 May 2023 22:01:08 +0200 From: Daniel Borkmann <daniel@...earbox.net> To: Kuniyuki Iwashima <kuniyu@...zon.com>, lmb@...valent.com Cc: andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org, davem@...emloft.net, dsahern@...nel.org, edumazet@...gle.com, haoluo@...gle.com, joe@...ium.io, joe@...d.net.nz, john.fastabend@...il.com, jolsa@...nel.org, kafai@...com, kpsingh@...nel.org, kuba@...nel.org, linux-kernel@...r.kernel.org, martin.lau@...ux.dev, netdev@...r.kernel.org, pabeni@...hat.com, sdf@...gle.com, song@...nel.org, willemdebruijn.kernel@...il.com, yhs@...com Subject: Re: [PATCH bpf-next 1/2] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign On 5/25/23 7:41 PM, Kuniyuki Iwashima wrote: > From: Lorenz Bauer <lmb@...valent.com> > Date: Thu, 25 May 2023 09:19:22 +0100 >> Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT >> sockets. This means we can't use the helper to steer traffic to Envoy, which >> configures SO_REUSEPORT on its sockets. In turn, we're blocked from removing >> TPROXY from our setup. >> >> The reason that bpf_sk_assign refuses such sockets is that the bpf_sk_lookup >> helpers don't execute SK_REUSEPORT programs. Instead, one of the >> reuseport sockets is selected by hash. This could cause dispatch to the >> "wrong" socket: >> >> sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash >> bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed >> >> Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup >> helpers unfortunately. In the tc context, L2 headers are at the start >> of the skb, while SK_REUSEPORT expects L3 headers instead. >> >> Instead, we execute the SK_REUSEPORT program when the assigned socket >> is pulled out of the skb, further up the stack. This creates some >> trickiness with regards to refcounting as bpf_sk_assign will put both >> refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU >> freed. We can infer that the sk_assigned socket is RCU freed if the >> reuseport lookup succeeds, but convincing yourself of this fact isn't >> straight forward. Therefore we defensively check refcounting on the >> sk_assign sock even though it's probably not required in practice. >> >> Fixes: 8e368dc ("bpf: Fix use of sk->sk_reuseport from sk_assign") >> Fixes: cf7fbe6 ("bpf: Add socket assign support") > > Please use 12 chars of hash. > > $ cat ~/.gitconfig > [core] > abbrev = 12 > [pretty] > fixes = Fixes: %h (\"%s\") > > $ git show 8e368dc --pretty=fixes | head -n 1 > Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign") Yeap, not quite sure what happened here but the 12 chars is clear. Will be fixed up in v2, too, ofc. >> Co-developed-by: Daniel Borkmann <daniel@...earbox.net> >> Signed-off-by: Daniel Borkmann <daniel@...earbox.net> >> Signed-off-by: Lorenz Bauer <lmb@...valent.com> >> Cc: Joe Stringer <joe@...ium.io> >> Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/ >> --- >> include/net/inet6_hashtables.h | 36 +++++++++++++++++++++++++++++----- >> include/net/inet_hashtables.h | 27 +++++++++++++++++++++++-- >> include/net/sock.h | 7 +++++-- >> include/uapi/linux/bpf.h | 3 --- >> net/core/filter.c | 2 -- >> net/ipv4/inet_hashtables.c | 15 +++++++------- >> net/ipv4/udp.c | 23 +++++++++++++++++++--- >> net/ipv6/inet6_hashtables.c | 19 +++++++++--------- >> net/ipv6/udp.c | 23 +++++++++++++++++++--- >> tools/include/uapi/linux/bpf.h | 3 --- >> 10 files changed, 119 insertions(+), 39 deletions(-) >> [...] >> @@ -85,14 +92,33 @@ static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo, >> int iif, int sdif, >> bool *refcounted) >> { >> - struct sock *sk = skb_steal_sock(skb, refcounted); >> - >> + bool prefetched; >> + struct sock *sk = skb_steal_sock(skb, refcounted, &prefetched); >> + struct net *net = dev_net(skb_dst(skb)->dev); >> + const struct ipv6hdr *ip6h = ipv6_hdr(skb); > > nit: Reverse Xmas Tree order. Same for other chunks. It is, the prefetched bool is simply used one line below. I don't think this is much different than most other code from style pov.. >> + >> + if (prefetched) { >> + struct sock *reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff, >> + &ip6h->saddr, sport, >> + &ip6h->daddr, ntohs(dport)); >> + if (reuse_sk) { >> + if (reuse_sk != sk) { >> + if (*refcounted) { >> + sock_put(sk); >> + *refcounted = false; >> + } >> + if (IS_ERR(reuse_sk)) >> + return NULL; >> + } >> + return reuse_sk; >> + } > > Maybe we can add a hepler to avoid this duplication ? We'll check if it can be made a bit nicer and integrate this into the v2. Thanks, Daniel
Powered by blists - more mailing lists