[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67a3d732aeff0_170d392942b@willemb.c.googlers.com.notmuch>
Date: Wed, 05 Feb 2025 16:25:06 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jason Xing <kerneljasonxing@...il.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
dsahern@...nel.org,
willemb@...gle.com,
ast@...nel.org,
daniel@...earbox.net,
andrii@...nel.org,
martin.lau@...ux.dev,
eddyz87@...il.com,
song@...nel.org,
yonghong.song@...ux.dev,
john.fastabend@...il.com,
kpsingh@...nel.org,
sdf@...ichev.me,
haoluo@...gle.com,
jolsa@...nel.org,
horms@...nel.org,
bpf@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v8 01/12] bpf: add support for bpf_setsockopt()
Jason Xing wrote:
> On Wed, Feb 5, 2025 at 11:22 PM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
> >
> > Jason Xing wrote:
> > > Users can write the following code to enable the bpf extension:
> > > int flags = SK_BPF_CB_TX_TIMESTAMPING;
> > > int opts = SK_BPF_CB_FLAGS;
> > > bpf_setsockopt(skops, SOL_SOCKET, opts, &flags, sizeof(flags));
> > >
> > > Signed-off-by: Jason Xing <kerneljasonxing@...il.com>
> > > ---
> > > include/net/sock.h | 3 +++
> > > include/uapi/linux/bpf.h | 8 ++++++++
> > > net/core/filter.c | 23 +++++++++++++++++++++++
> > > tools/include/uapi/linux/bpf.h | 1 +
> > > 4 files changed, 35 insertions(+)
> > >
> > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > index 8036b3b79cd8..7916982343c6 100644
> > > --- a/include/net/sock.h
> > > +++ b/include/net/sock.h
> > > @@ -303,6 +303,7 @@ struct sk_filter;
> > > * @sk_stamp: time stamp of last packet received
> > > * @sk_stamp_seq: lock for accessing sk_stamp on 32 bit architectures only
> > > * @sk_tsflags: SO_TIMESTAMPING flags
> > > + * @sk_bpf_cb_flags: used in bpf_setsockopt()
> > > * @sk_use_task_frag: allow sk_page_frag() to use current->task_frag.
> > > * Sockets that can be used under memory reclaim should
> > > * set this to false.
> > > @@ -445,6 +446,8 @@ struct sock {
> > > u32 sk_reserved_mem;
> > > int sk_forward_alloc;
> > > u32 sk_tsflags;
> > > +#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
> > > + u32 sk_bpf_cb_flags;
> > > __cacheline_group_end(sock_write_rxtx);
> > >
> > > __cacheline_group_begin(sock_write_tx);
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index 2acf9b336371..6116eb3d1515 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -6913,6 +6913,13 @@ enum {
> > > BPF_SOCK_OPS_ALL_CB_FLAGS = 0x7F,
> > > };
> > >
> > > +/* Definitions for bpf_sk_cb_flags */
> > > +enum {
> > > + SK_BPF_CB_TX_TIMESTAMPING = 1<<0,
> > > + SK_BPF_CB_MASK = (SK_BPF_CB_TX_TIMESTAMPING - 1) |
> > > + SK_BPF_CB_TX_TIMESTAMPING
> > > +};
> > > +
> > > /* List of known BPF sock_ops operators.
> > > * New entries can only be added at the end
> > > */
> > > @@ -7091,6 +7098,7 @@ enum {
> > > TCP_BPF_SYN_IP = 1006, /* Copy the IP[46] and TCP header */
> > > TCP_BPF_SYN_MAC = 1007, /* Copy the MAC, IP[46], and TCP header */
> > > TCP_BPF_SOCK_OPS_CB_FLAGS = 1008, /* Get or Set TCP sock ops flags */
> > > + SK_BPF_CB_FLAGS = 1009, /* Used to set socket bpf flags */
> > > };
> > >
> > > enum {
> > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > index 2ec162dd83c4..1c6c07507a78 100644
> > > --- a/net/core/filter.c
> > > +++ b/net/core/filter.c
> > > @@ -5222,6 +5222,25 @@ static const struct bpf_func_proto bpf_get_socket_uid_proto = {
> > > .arg1_type = ARG_PTR_TO_CTX,
> > > };
> > >
> > > +static int sk_bpf_set_get_cb_flags(struct sock *sk, char *optval, bool getopt)
> > > +{
> > > + u32 sk_bpf_cb_flags;
> > > +
> > > + if (getopt) {
> > > + *(u32 *)optval = sk->sk_bpf_cb_flags;
> > > + return 0;
> > > + }
> > > +
> > > + sk_bpf_cb_flags = *(u32 *)optval;
> > > +
> > > + if (sk_bpf_cb_flags & ~SK_BPF_CB_MASK)
> > > + return -EINVAL;
> > > +
> > > + sk->sk_bpf_cb_flags = sk_bpf_cb_flags;
> >
> > I don't know BPF internals that well:
> >
> > Is there mutual exclusion between these sol_socket_sockopt calls?
> > Or do these sk field accesses need WRITE_ONCE/READ_ONCE.
>
> According to the existing callbacks (like
> BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB which I used in the selftests) in
> include/uapi/linux/bpf.h, they are under the socket lock protection.
> And the correct use of this feature is to set during the 3-way
> handshake that also is protected by lock. But after you remind me of
> this potential data race issue, just in case bpf program doesn't use
> it as we expect, I think I will add the this annotation in v9.
Let's not add instrumentation defensively where not needed. Doing so
confuses future readers who will assume that it was needed and cannot
see why.
Either leave it for now. Or if it is needed for lockless UDP, add it
now, but then add an explicit comment that it is for that use case.
Powered by blists - more mailing lists