Message-ID: <CADa=RywoZZ9cAVPqa88mRNc2g1gQF743oEiSw2vnVHEFrN956g@mail.gmail.com>
Date: Thu, 25 May 2023 22:56:50 -0700
From: Joe Stringer <joe@...ium.io>
To: Lorenz Bauer <lmb@...valent.com>
Cc: "David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Alexei Starovoitov <ast@...nel.org>, Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, Song Liu <song@...nel.org>, Yonghong Song <yhs@...com>, John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...gle.com>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, David Ahern <dsahern@...nel.org>, Willem de Bruijn <willemdebruijn.kernel@...il.com>, Joe Stringer <joe@...d.net.nz>, Joe Stringer <joe@...ium.io>, Martin KaFai Lau <kafai@...com>, netdev@...r.kernel.org, linux-kernel@...r.kernel.org, bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next 1/2] bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign

On Thu, May 25, 2023 at 1:19 AM Lorenz Bauer <lmb@...valent.com> wrote:
>
> Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
> sockets. This means we can't use the helper to steer traffic to Envoy, which
> configures SO_REUSEPORT on its sockets. In turn, we're blocked from removing
> TPROXY from our setup.
>
> The reason that bpf_sk_assign refuses such sockets is that the bpf_sk_lookup
> helpers don't execute SK_REUSEPORT programs. Instead, one of the
> reuseport sockets is selected by hash. This could cause dispatch to the
> "wrong" socket:
>
>     sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
>     bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed
>
> Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
> helpers unfortunately. In the tc context, L2 headers are at the start
> of the skb, while SK_REUSEPORT expects L3 headers instead.
>
> Instead, we execute the SK_REUSEPORT program when the assigned socket
> is pulled out of the skb, further up the stack. This creates some
> trickiness with regard to refcounting, as bpf_sk_assign will put both
> refcounted and RCU-freed sockets in skb->sk. reuseport sockets are RCU
> freed. We can infer that the sk_assigned socket is RCU freed if the
> reuseport lookup succeeds, but convincing yourself of this fact isn't
> straightforward. Therefore we defensively check refcounting on the
> sk_assign sock even though it's probably not required in practice.
>
> Fixes: 8e368dc ("bpf: Fix use of sk->sk_reuseport from sk_assign")
> Fixes: cf7fbe6 ("bpf: Add socket assign support")
> Co-developed-by: Daniel Borkmann <daniel@...earbox.net>
> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
> Signed-off-by: Lorenz Bauer <lmb@...valent.com>
> Cc: Joe Stringer <joe@...ium.io>
> Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/

Nice approach to fix this issue, wish I'd thought of it :)

I pulled this and tested it out in a little-vm-helper environment with
kind, using Cilium's examples/kubernetes/connectivity-check proxy suite
as well as cilium-cli's connectivity tests, and the L7 features seem to
be working as expected with SO_REUSEPORT.

Tested-by: Joe Stringer <joe@...ium.io>

I also glanced through the commit, and the various protocols seem to be
handled consistently at the very least, though I agree it'd be simpler
to review and bisect if it were broken down into more incremental changes.
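To make the "wrong socket" problem from the quoted commit message concrete, here is a minimal userspace toy model in C. It is a sketch under stated assumptions, not kernel code: `struct toy_group`, `toy_reuseport_prog`, `pick_by_hash`, `pick_on_steal` and `steer_to_first` are hypothetical names, and the convention that a program returns -1 for "no decision, fall back to the hash" is an assumption made for illustration. The hash step assumes the flow hash is mapped onto the group size with the same multiply-and-shift arithmetic as the kernel's `reciprocal_scale()` helper. The model covers only which socket gets chosen; it does not model refcounting, RCU, or the L2/L3 header offset issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: hypothetical types and functions, not the kernel's
 * struct sock or net/core/sock_reuseport.c. */

/* Multiply-and-shift mapping of a 32-bit hash onto [0, n), the same
 * arithmetic as the kernel's reciprocal_scale() helper. */
static uint32_t reciprocal_scale(uint32_t val, uint32_t n)
{
    return (uint32_t)(((uint64_t)val * n) >> 32);
}

/* Hypothetical SO_REUSEPORT group: only its size matters here. */
struct toy_group {
    size_t num_socks;
};

/* Hypothetical stand-in for an SK_REUSEPORT program. Returns the index of
 * the chosen socket, or -1 for "no decision" (assumed convention). */
typedef int (*toy_reuseport_prog)(uint32_t hash);

/* Lookup-time choice, as the quoted mail describes for bpf_sk_lookup_tcp():
 * the socket is picked by hash and no program runs. */
static size_t pick_by_hash(const struct toy_group *g, uint32_t hash)
{
    return reciprocal_scale(hash, (uint32_t)g->num_socks);
}

/* Deferred choice when the assigned socket is pulled out of the skb:
 * run the program if one is attached and its answer is in range,
 * otherwise keep the hash-based choice. */
static size_t pick_on_steal(const struct toy_group *g, uint32_t hash,
                            toy_reuseport_prog prog)
{
    if (prog != NULL) {
        int idx = prog(hash);
        if (idx >= 0 && (size_t)idx < g->num_socks)
            return (size_t)idx;
    }
    return pick_by_hash(g, hash);
}

/* Hypothetical policy: steer every flow to socket 0, e.g. the socket of
 * the proxy instance that should currently receive traffic. */
static int steer_to_first(uint32_t hash)
{
    (void)hash; /* policy ignores the hash */
    return 0;
}
```

With four sockets in the group and a flow hash of 0xC0000000, `pick_by_hash()` returns socket 3, whereas `pick_on_steal()` with `steer_to_first` returns socket 0: the hash-only lookup and the attached program disagree, which is the mismatch the patch avoids by running SK_REUSEPORT at steal time.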