Message-ID: <CAL+tcoCJNM3YyLQpFCCUtHPN7dU+o721yBYE71+hs9-1r937Xg@mail.gmail.com>
Date: Thu, 20 Feb 2025 08:04:11 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Martin KaFai Lau <martin.lau@...ux.dev>, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, dsahern@...nel.org, willemb@...gle.com,
ast@...nel.org, daniel@...earbox.net, andrii@...nel.org, eddyz87@...il.com,
song@...nel.org, yonghong.song@...ux.dev, john.fastabend@...il.com,
kpsingh@...nel.org, sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org,
shuah@...nel.org, ykolal@...com, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v12 01/12] bpf: add networking timestamping
support to bpf_get/setsockopt()
On Wed, Feb 19, 2025 at 11:12 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> > > > Now I wonder if I should use the u8 sk_bpf_cb_flags in V13 or just
> > > > keep it as-is? Either way is fine with me :) bpf_sock_ops_cb_flags
> > > > uses u8 as an example, thus I think we prefer the former?
> > >
> > > If it fits in a u8 and that in practice also results in less memory
> > > and cache pressure (i.e., does not just add a 24b hole), then it is a
> > > net improvement.
> >
> > Probably I didn't state it clearly. I agree with you on saving memory:)
> >
> > In the previous response, I was trying to keep the sk_bpf_cb_flags
> > flag and use a u8 instead. I admit u32 is too large after you noticed
> > this.
> >
> > Would the following diff on top of this series be acceptable for you?
> > And would it be a proper place to put the u8 sk_bpf_cb_flags in struct
> > sock?
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 6f4d54faba92..e85d6fb3a2ba 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -447,7 +447,7 @@ struct sock {
> > int sk_forward_alloc;
> > u32 sk_tsflags;
> > #define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
> > - u32 sk_bpf_cb_flags;
> > + u8 sk_bpf_cb_flags;
> > __cacheline_group_end(sock_write_rxtx);
> >
> > __cacheline_group_begin(sock_write_tx);
> >
> > The following output is the result of running 'pahole --hex -C sock vmlinux'.
> > Before this series:
> > u32 sk_tsflags; /* 0x168 0x4 */
> > __u8 __cacheline_group_end__sock_write_rxtx[0]; /* 0x16c 0 */
> > __u8 __cacheline_group_begin__sock_write_tx[0]; /* 0x16c 0 */
> > int sk_write_pending; /* 0x16c 0x4 */
> > atomic_t sk_omem_alloc; /* 0x170 0x4 */
> > int sk_sndbuf; /* 0x174 0x4 */
> > int sk_wmem_queued; /* 0x178 0x4 */
> > refcount_t sk_wmem_alloc; /* 0x17c 0x4 */
> > /* --- cacheline 6 boundary (384 bytes) --- */
> > long unsigned int sk_tsq_flags; /* 0x180 0x8 */
> > ...
> > /* sum members: 773, holes: 1, sum holes: 1 */
> >
> > After this diff patch:
> > u32 sk_tsflags; /* 0x168 0x4 */
> > u8 sk_bpf_cb_flags; /* 0x16c 0x1 */
> > __u8 __cacheline_group_end__sock_write_rxtx[0]; /* 0x16d 0 */
> > __u8 __cacheline_group_begin__sock_write_tx[0]; /* 0x16d 0 */
> >
> > /* XXX 3 bytes hole, try to pack */
> >
> > int sk_write_pending; /* 0x170 0x4 */
> > atomic_t sk_omem_alloc; /* 0x174 0x4 */
> > int sk_sndbuf; /* 0x178 0x4 */
> > int sk_wmem_queued; /* 0x17c 0x4 */
> > /* --- cacheline 6 boundary (384 bytes) --- */
> > refcount_t sk_wmem_alloc; /* 0x180 0x4 */
> >
> > /* XXX 4 bytes hole, try to pack */
> >
> > long unsigned int sk_tsq_flags; /* 0x188 0x8 */
> > ...
> > /* sum members: 774, holes: 3, sum holes: 8 */
> >
> > Applying this series with this u8 change adds 7 extra bytes of holes
> > (sum holes goes from 1 to 8). I think it's a proper position because
> > the new sk_bpf_cb_flags will be used in the tx and rx paths just like
> > sk_tsflags, in line with the rules introduced by the commit[1].
>
> Reducing a u64 to u8 can leave 7b of holes, but that is not great,
> of course.
>
> Since this bitmap is only touched if a BPF program is loaded, arguably
> it need not be in the hot path cacheline groups.
Point taken.
>
> Can you find a hole further down to place this in, or at least a spot
> that does not result in 7b of wasted space (in the hotpath cacheline
> groups of all places).
There is one place where I can simply insert the flag.
The diff patch on top of this series is:
diff --git a/include/net/sock.h b/include/net/sock.h
index e85d6fb3a2ba..9fa27693fb02 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -446,8 +446,6 @@ struct sock {
u32 sk_reserved_mem;
int sk_forward_alloc;
u32 sk_tsflags;
-#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
- u8 sk_bpf_cb_flags;
__cacheline_group_end(sock_write_rxtx);
__cacheline_group_begin(sock_write_tx);
@@ -528,6 +526,8 @@ struct sock {
u8 sk_txtime_deadline_mode : 1,
sk_txtime_report_errors : 1,
sk_txtime_unused : 6;
+#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
+ u8 sk_bpf_cb_flags;
void *sk_user_data;
#ifdef CONFIG_SECURITY
1) before applying the whole series:
...
/* --- cacheline 10 boundary (640 bytes) --- */
ktime_t sk_stamp; /* 0x280 0x8 */
int sk_disconnects; /* 0x288 0x4 */
u8 sk_txrehash; /* 0x28c 0x1 */
u8 sk_clockid; /* 0x28d 0x1 */
u8 sk_txtime_deadline_mode:1; /* 0x28e: 0 0x1 */
u8 sk_txtime_report_errors:1; /* 0x28e:0x1 0x1 */
u8 sk_txtime_unused:6; /* 0x28e:0x2 0x1 */
/* XXX 1 byte hole, try to pack */
void * sk_user_data; /* 0x290 0x8 */
void * sk_security; /* 0x298 0x8 */
struct sock_cgroup_data sk_cgrp_data; /* 0x2a0 0x10 */
...
/* sum members: 773, holes: 1, sum holes: 1 */
2) after applying the series with the above diff patch:
...
/* --- cacheline 10 boundary (640 bytes) --- */
ktime_t sk_stamp; /* 0x280 0x8 */
int sk_disconnects; /* 0x288 0x4 */
u8 sk_txrehash; /* 0x28c 0x1 */
u8 sk_clockid; /* 0x28d 0x1 */
u8 sk_txtime_deadline_mode:1; /* 0x28e: 0 0x1 */
u8 sk_txtime_report_errors:1; /* 0x28e:0x1 0x1 */
u8 sk_txtime_unused:6; /* 0x28e:0x2 0x1 */
u8 sk_bpf_cb_flags; /* 0x28f 0x1 */
void * sk_user_data; /* 0x290 0x8 */
void * sk_security; /* 0x298 0x8 */
struct sock_cgroup_data sk_cgrp_data; /* 0x2a0 0x10 */
...
/* sum members: 774 */
It turns out that the new sk_bpf_cb_flags exactly fills the existing
1-byte hole at 0x28f, so no new hole is added. The new field is also
similar to its neighbors: they are written only, or almost only, during
creation or in setsockopt.
Does this look like a good place for the new flag?
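For anyone who wants to reproduce the effect outside the kernel, here is a
minimal userspace sketch. The struct names (toy_hot_group, toy_tail_before,
toy_tail_after) and the HOLE_AFTER() macro are made up for illustration and
are not kernel code. The field names are borrowed from struct sock only for
readability, the three txtime bit-fields are collapsed into one byte, and
the layout assumes a mainstream ABI where int32_t is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes of padding between MEMBER and the member NEXT that follows it. */
#define HOLE_AFTER(TYPE, MEMBER, NEXT) \
	(offsetof(TYPE, NEXT) - \
	 (offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER)))

/* A u8 squeezed between two 4-byte members: the next member must be
 * 4-byte aligned, so 3 bytes of padding follow the u8.
 */
struct toy_hot_group {
	uint32_t tsflags;
	uint8_t  bpf_cb_flags;
	int32_t  write_pending;
};

/* Simplified tail of the struct: three single bytes after an int leave
 * one byte unused before the pointer.
 */
struct toy_tail_before {
	int32_t  disconnects;
	uint8_t  txrehash;
	uint8_t  clockid;
	uint8_t  txtime_bits;	/* stands in for the three bit-fields */
	void    *user_data;
};

/* Same tail with the new u8 placed in that unused byte. */
struct toy_tail_after {
	int32_t  disconnects;
	uint8_t  txrehash;
	uint8_t  clockid;
	uint8_t  txtime_bits;
	uint8_t  bpf_cb_flags;
	void    *user_data;
};

/* Compile-time checks: the hot-group placement costs 3 bytes, while the
 * tail placement is free because the struct does not grow.
 */
_Static_assert(HOLE_AFTER(struct toy_hot_group, bpf_cb_flags,
			  write_pending) == 3, "u8 leaves a 3-byte hole");
_Static_assert(sizeof(struct toy_tail_before) ==
	       sizeof(struct toy_tail_after), "hole absorbed the new u8");
```

The same arithmetic is what pahole reports as "XXX N bytes hole": the offset
of the next member minus the end of the previous one.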
Thanks,
Jason