Message-ID: <51fe0012-e294-078a-4fc4-6151f8b55195@iogearbox.net>
Date: Mon, 4 Feb 2019 23:33:28 +0100
From: Daniel Borkmann <daniel@...earbox.net>
To: Martin KaFai Lau <kafai@...com>, netdev@...r.kernel.org
Cc: Alexei Starovoitov <ast@...com>, kernel-team@...com,
Lawrence Brakmo <brakmo@...com>
Subject: Re: [PATCH bpf-next 1/6] bpf: Add a bpf_sock pointer to __sk_buff and
a bpf_sk_fullsock helper

Hi Martin,

On 02/01/2019 08:03 AM, Martin KaFai Lau wrote:
> In the kernel, it is common to check "skb->sk && sk_fullsock(skb->sk)"
> before accessing the fields in sock. For example, in __netdev_pick_tx:
>
> static u16 __netdev_pick_tx(struct net_device *dev, struct sk_buff *skb,
> 			    struct net_device *sb_dev)
> {
> 	/* ... */
>
> 	struct sock *sk = skb->sk;
>
> 	if (queue_index != new_index && sk &&
> 	    sk_fullsock(sk) &&
> 	    rcu_access_pointer(sk->sk_dst_cache))
> 		sk_tx_queue_set(sk, new_index);
>
> 	/* ... */
>
> 	return queue_index;
> }
>
> This patch adds a "struct bpf_sock *sk" pointer to "struct __sk_buff".
> A few of the convert_ctx_access() functions in filter.c have already
> been accessing the skb->sk sock_common's fields,
> e.g. sock_ops_convert_ctx_access().
>
> "__sk_buff->sk" is a PTR_TO_SOCK_COMMON_OR_NULL in the verifier.
> Some of the fields in "bpf_sock" will not be directly
> accessible through the "__sk_buff->sk" pointer. Access is limited
> by the new "bpf_sock_common_is_valid_access()".
> E.g. the existing "type", "protocol", "mark" and "priority" in bpf_sock
> are not allowed.
>
> The newly added "struct bpf_sock *bpf_sk_fullsock(struct bpf_sock *sk)"
> can be used to get a sk with all accessible fields in "bpf_sock".
> This helper is added to both cg_skb and sched_(cls|act).
>
> int cg_skb_foo(struct __sk_buff *skb) {
> 	struct bpf_sock *sk;
> 	__u32 family;
>
> 	sk = skb->sk;
> 	if (!sk)
> 		return 1;
>
> 	sk = bpf_sk_fullsock(sk);
> 	if (!sk)
> 		return 1;
>
> 	if (sk->family != AF_INET6 || sk->protocol != IPPROTO_TCP)
> 		return 1;
>
> 	/* some_traffic_shaping(); */
>
> 	return 1;
> }
>
> (1) The sk is read-only.
>
> (2) There is no new "struct bpf_sock_common" introduced.
>
> (3) Future kernel sock's members could be added to bpf_sock only
> instead of repeatedly adding at multiple places like currently
> in bpf_sock_ops_md, bpf_sock_addr_md, sk_reuseport_md...etc.
>
> (4) After "sk = skb->sk", the reg holding sk is of type
> PTR_TO_SOCK_COMMON_OR_NULL.
>
> (5) After bpf_sk_fullsock(), the returned reg is of type
> PTR_TO_SOCKET_OR_NULL, which is the same as the return type of
> bpf_sk_lookup_xxx().
>
> However, bpf_sk_fullsock() does not take a refcnt. Whether
> acquire_reference_state() is called currently depends only on the
> return type. To avoid acquiring a reference here, a new
> is_acquire_function() is checked before calling
> acquire_reference_state().

It is a bit unfortunate that a helper like bpf_sk_fullsock() would be needed;
after all, this is more of an implementation detail that we would be exposing
to the developer.

Is there a specific reason why fetching skb->sk couldn't already be of type
PTR_TO_SOCKET_OR_NULL, such that the bpf_sk_fullsock() step wouldn't be
needed and most of the logic we have today could be reused (modulo refcnt
avoidance)?

In particular, do you need skb->sk without the full-sk part somewhere
(e.g. for tw socks)? Why not do something like sk_to_full_sk() inside the
helper or, even better, as a BPF ctx rewrite upon skb->sk to fetch the full sk
parent, where you could also access the remaining bpf_sock fields?

This could then also be plugged into bpf_tcp_sock(), given that it needs a
full sk anyway.
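
To make the question above concrete, here is a minimal userspace sketch of
what such a rewrite might resolve skb->sk to. It is not kernel code: struct
mock_sock, mock_ctx_load_sk() and the other mock_* names are hypothetical
stand-ins, modelled on my reading of sk_fullsock() and sk_to_full_sk(), and
the state values are assumed to mirror the kernel's TCP state enum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock socket states; values assumed to mirror the kernel's TCP_* enum. */
enum mock_state {
	MOCK_TCP_ESTABLISHED  = 1,
	MOCK_TCP_TIME_WAIT    = 6,
	MOCK_TCP_LISTEN       = 10,
	MOCK_TCP_NEW_SYN_RECV = 12,
};

/* Hypothetical stand-in for struct sock, reduced to what the sketch needs. */
struct mock_sock {
	int state;
	struct mock_sock *listener;	/* only meaningful for request socks */
	unsigned int mark;		/* stands in for a full-sock-only field */
};

/* Modelled on sk_fullsock(): anything but tw and request socks is full. */
static bool mock_sk_fullsock(const struct mock_sock *sk)
{
	unsigned int not_full = (1u << MOCK_TCP_TIME_WAIT) |
				(1u << MOCK_TCP_NEW_SYN_RECV);

	return ((1u << sk->state) & ~not_full) != 0;
}

/* Modelled on sk_to_full_sk(): a request sock resolves to its listener. */
static struct mock_sock *mock_sk_to_full_sk(struct mock_sock *sk)
{
	if (sk && sk->state == MOCK_TCP_NEW_SYN_RECV)
		sk = sk->listener;
	return sk;
}

/*
 * Hypothetical result of a ctx rewrite on skb->sk: either a full socket
 * (possibly the parent listener) or NULL, so that one NULL check in the
 * program would be enough before reading any field.
 */
static struct mock_sock *mock_ctx_load_sk(struct mock_sock *skb_sk)
{
	struct mock_sock *sk = mock_sk_to_full_sk(skb_sk);

	if (!sk || !mock_sk_fullsock(sk))
		return NULL;
	return sk;
}

/* Small self-check covering the four cases discussed in the text. */
static void mock_demo(void)
{
	struct mock_sock lsk = { MOCK_TCP_LISTEN, NULL, 7 };
	struct mock_sock req = { MOCK_TCP_NEW_SYN_RECV, &lsk, 0 };
	struct mock_sock est = { MOCK_TCP_ESTABLISHED, NULL, 3 };
	struct mock_sock tw  = { MOCK_TCP_TIME_WAIT, NULL, 0 };

	assert(mock_ctx_load_sk(&est) == &est);	/* full sock passes through */
	assert(mock_ctx_load_sk(&req) == &lsk);	/* request sock -> listener */
	assert(mock_ctx_load_sk(&tw) == NULL);	/* tw sock is hidden */
	assert(mock_ctx_load_sk(NULL) == NULL);	/* no socket attached */
}
```

In this model a tw sock resolves to NULL, so even its sock_common fields
would no longer be reachable from the program. That trade-off is what the
tw question above is about.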

Thanks,
Daniel