lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <99681548-fa79-0607-d574-db61818cab78@iogearbox.net>
Date:   Thu, 25 May 2023 22:01:08 +0200
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Kuniyuki Iwashima <kuniyu@...zon.com>, lmb@...valent.com
Cc:     andrii@...nel.org, ast@...nel.org, bpf@...r.kernel.org,
        davem@...emloft.net, dsahern@...nel.org, edumazet@...gle.com,
        haoluo@...gle.com, joe@...ium.io, joe@...d.net.nz,
        john.fastabend@...il.com, jolsa@...nel.org, kafai@...com,
        kpsingh@...nel.org, kuba@...nel.org, linux-kernel@...r.kernel.org,
        martin.lau@...ux.dev, netdev@...r.kernel.org, pabeni@...hat.com,
        sdf@...gle.com, song@...nel.org, willemdebruijn.kernel@...il.com,
        yhs@...com
Subject: Re: [PATCH bpf-next 1/2] bpf, net: Support SO_REUSEPORT sockets with
 bpf_sk_assign

On 5/25/23 7:41 PM, Kuniyuki Iwashima wrote:
> From: Lorenz Bauer <lmb@...valent.com>
> Date: Thu, 25 May 2023 09:19:22 +0100
>> Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
>> sockets. This means we can't use the helper to steer traffic to Envoy, which
>> configures SO_REUSEPORT on its sockets. In turn, we're blocked from removing
>> TPROXY from our setup.
>>
>> The reason that bpf_sk_assign refuses such sockets is that the bpf_sk_lookup
>> helpers don't execute SK_REUSEPORT programs. Instead, one of the
>> reuseport sockets is selected by hash. This could cause dispatch to the
>> "wrong" socket:
>>
>>      sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
>>      bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed
>>
>> Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
>> helpers unfortunately. In the tc context, L2 headers are at the start
>> of the skb, while SK_REUSEPORT expects L3 headers instead.
>>
>> Instead, we execute the SK_REUSEPORT program when the assigned socket
>> is pulled out of the skb, further up the stack. This creates some
>> trickiness with regards to refcounting as bpf_sk_assign will put both
>> refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
>> freed. We can infer that the sk_assigned socket is RCU freed if the
>> reuseport lookup succeeds, but convincing yourself of this fact isn't
>> straight forward. Therefore we defensively check refcounting on the
>> sk_assign sock even though it's probably not required in practice.
>>
>> Fixes: 8e368dc ("bpf: Fix use of sk->sk_reuseport from sk_assign")
>> Fixes: cf7fbe6 ("bpf: Add socket assign support")
> 
> Please use 12 chars of hash.
> 
> $ cat ~/.gitconfig
> [core]
> 	abbrev = 12
> [pretty]
> 	fixes = Fixes: %h (\"%s\")
> 
> $ git show 8e368dc --pretty=fixes | head -n 1
> Fixes: 8e368dc72e86 ("bpf: Fix use of sk->sk_reuseport from sk_assign")

Yeap, not quite sure what happened here but the 12 chars is clear. Will
be fixed up in v2, too, ofc.

>> Co-developed-by: Daniel Borkmann <daniel@...earbox.net>
>> Signed-off-by: Daniel Borkmann <daniel@...earbox.net>
>> Signed-off-by: Lorenz Bauer <lmb@...valent.com>
>> Cc: Joe Stringer <joe@...ium.io>
>> Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
>> ---
>>   include/net/inet6_hashtables.h | 36 +++++++++++++++++++++++++++++-----
>>   include/net/inet_hashtables.h  | 27 +++++++++++++++++++++++--
>>   include/net/sock.h             |  7 +++++--
>>   include/uapi/linux/bpf.h       |  3 ---
>>   net/core/filter.c              |  2 --
>>   net/ipv4/inet_hashtables.c     | 15 +++++++-------
>>   net/ipv4/udp.c                 | 23 +++++++++++++++++++---
>>   net/ipv6/inet6_hashtables.c    | 19 +++++++++---------
>>   net/ipv6/udp.c                 | 23 +++++++++++++++++++---
>>   tools/include/uapi/linux/bpf.h |  3 ---
>>   10 files changed, 119 insertions(+), 39 deletions(-)
>>
[...]
>> @@ -85,14 +92,33 @@ static inline struct sock *__inet6_lookup_skb(struct inet_hashinfo *hashinfo,
>>   					      int iif, int sdif,
>>   					      bool *refcounted)
>>   {
>> -	struct sock *sk = skb_steal_sock(skb, refcounted);
>> -
>> +	bool prefetched;
>> +	struct sock *sk = skb_steal_sock(skb, refcounted, &prefetched);
>> +	struct net *net = dev_net(skb_dst(skb)->dev);
>> +	const struct ipv6hdr *ip6h = ipv6_hdr(skb);
> 
> nit: Reverse Xmas Tree order.  Same for other chunks.

It is, the prefetched bool is simply used one line below. I don't think
this is much different than most other code from style pov..

>> +
>> +	if (prefetched) {
>> +		struct sock *reuse_sk = inet6_lookup_reuseport(net, sk, skb, doff,
>> +							       &ip6h->saddr, sport,
>> +							       &ip6h->daddr, ntohs(dport));
>> +		if (reuse_sk) {
>> +			if (reuse_sk != sk) {
>> +				if (*refcounted) {
>> +					sock_put(sk);
>> +					*refcounted = false;
>> +				}
>> +				if (IS_ERR(reuse_sk))
>> +					return NULL;
>> +			}
>> +			return reuse_sk;
>> +		}
> 
> Maybe we can add a hepler to avoid this duplication ?

We'll check if it can be made a bit nicer and integrate this into the v2.

Thanks,
Daniel

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ