[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220826192456.4z6khqm6g5sdg7vl@kafai-mbp.dhcp.thefacebook.com>
Date: Fri, 26 Aug 2022 12:24:56 -0700
From: Martin KaFai Lau <kafai@...com>
To: sdf@...gle.com
Cc: bpf@...r.kernel.org, netdev@...r.kernel.org,
Alexei Starovoitov <ast@...nel.org>,
Andrii Nakryiko <andrii@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
David Miller <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, kernel-team@...com,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH bpf-next 14/17] bpf: Change bpf_getsockopt(SOL_TCP) to
reuse do_tcp_getsockopt()
On Thu, Aug 25, 2022 at 11:36:36AM -0700, sdf@...gle.com wrote:
> On 08/24, Martin KaFai Lau wrote:
> > This patch changes bpf_getsockopt(SOL_TCP) to reuse
> > do_tcp_getsockopt(). It removes the duplicated code from
> > bpf_getsockopt(SOL_TCP).
>
> > Before this patch, there were some optnames available to
> > bpf_setsockopt(SOL_TCP) but missing in bpf_getsockopt(SOL_TCP).
> > For example, TCP_NODELAY, TCP_MAXSEG, TCP_KEEPIDLE, TCP_KEEPINTVL,
> > and a few more. It surprises users from time to time. This patch
> > automatically closes this gap without duplicating more code.
>
> > bpf_getsockopt(TCP_SAVED_SYN) does not free the saved_syn,
> > so it stays in sol_tcp_sockopt().
>
> > For string name value like TCP_CONGESTION, bpf expects it
> > is always null terminated, so sol_tcp_sockopt() decrements
> > optlen by one before calling do_tcp_getsockopt() and
> > the 'if (optlen < saved_optlen) memset(..,0,..);'
> > in __bpf_getsockopt() will always do a null termination.
>
> > Signed-off-by: Martin KaFai Lau <kafai@...com>
> > ---
> > include/net/tcp.h | 2 ++
> > net/core/filter.c | 70 ++++++++++++++++++++++++++---------------------
> > net/ipv4/tcp.c | 4 +--
> > 3 files changed, 43 insertions(+), 33 deletions(-)
>
> > diff --git a/include/net/tcp.h b/include/net/tcp.h
> > index c03a50c72f40..735e957f7f4b 100644
> > --- a/include/net/tcp.h
> > +++ b/include/net/tcp.h
> > @@ -402,6 +402,8 @@ void tcp_init_sock(struct sock *sk);
> > void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb);
> > __poll_t tcp_poll(struct file *file, struct socket *sock,
> > struct poll_table_struct *wait);
> > +int do_tcp_getsockopt(struct sock *sk, int level,
> > + int optname, sockptr_t optval, sockptr_t optlen);
> > int tcp_getsockopt(struct sock *sk, int level, int optname,
> > char __user *optval, int __user *optlen);
> > bool tcp_bpf_bypass_getsockopt(int level, int optname);
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 68b52243b306..cdbbcec46e8b 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5096,8 +5096,9 @@ static int bpf_sol_tcp_setsockopt(struct sock *sk,
> > int optname,
> > return 0;
> > }
>
> > -static int sol_tcp_setsockopt(struct sock *sk, int optname,
> > - char *optval, int optlen)
> > +static int sol_tcp_sockopt(struct sock *sk, int optname,
> > + char *optval, int *optlen,
> > + bool getopt)
> > {
> > if (sk->sk_prot->setsockopt != tcp_setsockopt)
> > return -EINVAL;
> > @@ -5114,17 +5115,47 @@ static int sol_tcp_setsockopt(struct sock *sk,
> > int optname,
> > case TCP_USER_TIMEOUT:
> > case TCP_NOTSENT_LOWAT:
> > case TCP_SAVE_SYN:
> > - if (optlen != sizeof(int))
> > + if (*optlen != sizeof(int))
> > return -EINVAL;
> > break;
>
> [..]
>
> > case TCP_CONGESTION:
> > + if (*optlen < 2)
> > + return -EINVAL;
> > + break;
> > + case TCP_SAVED_SYN:
> > + if (*optlen < 1)
> > + return -EINVAL;
> > break;
>
> This looks a bit inconsistent vs '*optlen != sizeof(int)' above. Maybe
>
> if (*optlen < sizeof(u16))
> if (*optlen < sizeof(u8))
TCP_CONGESTION (name string) and TCP_SAVED_SYN (raw binary)
are not expecting integer optval, so I think it is
better to stay away from using integer u16 or u8.
>
> ?
>
> > default:
> > - return bpf_sol_tcp_setsockopt(sk, optname, optval, optlen);
> > + if (getopt)
> > + return -EINVAL;
> > + return bpf_sol_tcp_setsockopt(sk, optname, optval, *optlen);
> > + }
> > +
> > + if (getopt) {
> > + if (optname == TCP_SAVED_SYN) {
> > + struct tcp_sock *tp = tcp_sk(sk);
> > +
> > + if (!tp->saved_syn ||
> > + *optlen > tcp_saved_syn_len(tp->saved_syn))
> > + return -EINVAL;
>
> You mention in the description that bpf doesn't doesn't free saved_syn,
> maybe worth putting a comment with the rationale here as well?
> I'm assuming we don't free from bpf because we want userspace to
> have an opportunity to read it as well?
Yes, it is the reason. I will add a comment.
>
> > + memcpy(optval, tp->saved_syn->data, *optlen);
> > + return 0;
> > + }
> > +
> > + if (optname == TCP_CONGESTION) {
> > + if (!inet_csk(sk)->icsk_ca_ops)
> > + return -EINVAL;
>
> Is it worth it doing null termination more explicitly here?
> For readability sake:
> /* BPF always expects NULL-terminated strings. */
> optval[*optlen-1] = '\0';
Yep. will do in v2.
> > + (*optlen)--;
> > + }
> > +
> > + return do_tcp_getsockopt(sk, SOL_TCP, optname,
> > + KERNEL_SOCKPTR(optval),
> > + KERNEL_SOCKPTR(optlen));
> > }
>
> > return do_tcp_setsockopt(sk, SOL_TCP, optname,
> > - KERNEL_SOCKPTR(optval), optlen);
> > + KERNEL_SOCKPTR(optval), *optlen);
> > }
Powered by blists - more mailing lists