Message-ID: <CAL+tcoCJNM3YyLQpFCCUtHPN7dU+o721yBYE71+hs9-1r937Xg@mail.gmail.com>
Date: Thu, 20 Feb 2025 08:04:11 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Martin KaFai Lau <martin.lau@...ux.dev>, davem@...emloft.net, edumazet@...gle.com, 
	kuba@...nel.org, pabeni@...hat.com, dsahern@...nel.org, willemb@...gle.com, 
	ast@...nel.org, daniel@...earbox.net, andrii@...nel.org, eddyz87@...il.com, 
	song@...nel.org, yonghong.song@...ux.dev, john.fastabend@...il.com, 
	kpsingh@...nel.org, sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org, 
	shuah@...nel.org, ykolal@...com, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v12 01/12] bpf: add networking timestamping
 support to bpf_get/setsockopt()

On Wed, Feb 19, 2025 at 11:12 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> > > > Now I wonder if I should use the u8 sk_bpf_cb_flags in V13 or just
> > > > keep it as-is? Either way is fine with me :) bpf_sock_ops_cb_flags
> > > > uses u8 as an example, thus I think we prefer the former?
> > >
> > > If it fits in a u8 and that in practice also results in less memory
> > > and cache pressure (i.e., does not just add a 24b hole), then it is a
> > > net improvement.
> >
> > Probably I didn't state it clearly. I agree with you on saving memory:)
> >
> > In the previous response, I was trying to keep the sk_bpf_cb_flags
> > flag and use a u8 instead. I admit u32 is too large after you noticed
> > this.
> >
> > Would the following diff on top of this series be acceptable for you?
> > And would it be a proper place to put the u8 sk_bpf_cb_flags in struct
> > sock?
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 6f4d54faba92..e85d6fb3a2ba 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -447,7 +447,7 @@ struct sock {
> >         int                     sk_forward_alloc;
> >         u32                     sk_tsflags;
> >  #define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
> > -       u32                     sk_bpf_cb_flags;
> > +       u8                      sk_bpf_cb_flags;
> >         __cacheline_group_end(sock_write_rxtx);
> >
> >         __cacheline_group_begin(sock_write_tx);
> >
> > The following output is the result of running 'pahole --hex -C sock vmlinux'.
> > Before this series:
> >         u32                        sk_tsflags;           /* 0x168   0x4 */
> >         __u8                       __cacheline_group_end__sock_write_rxtx[0]; /* 0x16c     0 */
> >         __u8                       __cacheline_group_begin__sock_write_tx[0]; /* 0x16c     0 */
> >         int                        sk_write_pending;     /* 0x16c   0x4 */
> >         atomic_t                   sk_omem_alloc;        /* 0x170   0x4 */
> >         int                        sk_sndbuf;            /* 0x174   0x4 */
> >         int                        sk_wmem_queued;       /* 0x178   0x4 */
> >         refcount_t                 sk_wmem_alloc;        /* 0x17c   0x4 */
> >         /* --- cacheline 6 boundary (384 bytes) --- */
> >         long unsigned int          sk_tsq_flags;         /* 0x180   0x8 */
> > ...
> > /* sum members: 773, holes: 1, sum holes: 1 */
> >
> > After this diff patch:
> >         u32                        sk_tsflags;           /* 0x168   0x4 */
> >         u8                         sk_bpf_cb_flags;      /* 0x16c   0x1 */
> >         __u8                       __cacheline_group_end__sock_write_rxtx[0]; /* 0x16d     0 */
> >         __u8                       __cacheline_group_begin__sock_write_tx[0]; /* 0x16d     0 */
> >
> >         /* XXX 3 bytes hole, try to pack */
> >
> >         int                        sk_write_pending;     /* 0x170   0x4 */
> >         atomic_t                   sk_omem_alloc;        /* 0x174   0x4 */
> >         int                        sk_sndbuf;            /* 0x178   0x4 */
> >         int                        sk_wmem_queued;       /* 0x17c   0x4 */
> >         /* --- cacheline 6 boundary (384 bytes) --- */
> >         refcount_t                 sk_wmem_alloc;        /* 0x180   0x4 */
> >
> >         /* XXX 4 bytes hole, try to pack */
> >
> >         long unsigned int          sk_tsq_flags;         /* 0x188   0x8 */
> > ...
> > /* sum members: 774, holes: 3, sum holes: 8 */
> >
> > It will introduce 7 extra sum holes if this series with this u8 change
> > gets applied. I think it's a proper position because this new
> > sk_bpf_cb_flags will be used in the tx and rx path just like
> > sk_tsflags, aligned with rules introduced by the commit[1].
>
> Reducing a u64 to u8 can leave 7b of holes, but that is not great,
> of course.
>
> Since this bitmap is only touched if a BPF program is loaded, arguably
> it need not be in the hot path cacheline groups.

Point taken.

>
> Can you find a hole further down to place this in, or at least a spot
> that does not result in 7b of wasted space (in the hotpath cacheline
> groups of all places).

There is one place where I can simply insert the flag.

The diff patch on top of this series is:
diff --git a/include/net/sock.h b/include/net/sock.h
index e85d6fb3a2ba..9fa27693fb02 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -446,8 +446,6 @@ struct sock {
        u32                     sk_reserved_mem;
        int                     sk_forward_alloc;
        u32                     sk_tsflags;
-#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
-       u8                      sk_bpf_cb_flags;
        __cacheline_group_end(sock_write_rxtx);

        __cacheline_group_begin(sock_write_tx);
@@ -528,6 +526,8 @@ struct sock {
        u8                      sk_txtime_deadline_mode : 1,
                                sk_txtime_report_errors : 1,
                                sk_txtime_unused : 6;
+#define SK_BPF_CB_FLAG_TEST(SK, FLAG) ((SK)->sk_bpf_cb_flags & (FLAG))
+       u8                      sk_bpf_cb_flags;

        void                    *sk_user_data;
 #ifdef CONFIG_SECURITY


1) before applying the whole series:
...
        /* --- cacheline 10 boundary (640 bytes) --- */
        ktime_t                    sk_stamp;             /* 0x280   0x8 */
        int                        sk_disconnects;       /* 0x288   0x4 */
        u8                         sk_txrehash;          /* 0x28c   0x1 */
        u8                         sk_clockid;           /* 0x28d   0x1 */
        u8                         sk_txtime_deadline_mode:1; /* 0x28e: 0 0x1 */
        u8                         sk_txtime_report_errors:1; /* 0x28e:0x1 0x1 */
        u8                         sk_txtime_unused:6;   /* 0x28e:0x2 0x1 */

        /* XXX 1 byte hole, try to pack */

        void *                     sk_user_data;         /* 0x290   0x8 */
        void *                     sk_security;          /* 0x298   0x8 */
        struct sock_cgroup_data    sk_cgrp_data;         /* 0x2a0  0x10 */
...
/* sum members: 773, holes: 1, sum holes: 1 */


2) after applying the series with the above diff patch:
...
        /* --- cacheline 10 boundary (640 bytes) --- */
        ktime_t                    sk_stamp;             /* 0x280   0x8 */
        int                        sk_disconnects;       /* 0x288   0x4 */
        u8                         sk_txrehash;          /* 0x28c   0x1 */
        u8                         sk_clockid;           /* 0x28d   0x1 */
        u8                         sk_txtime_deadline_mode:1; /* 0x28e: 0 0x1 */
        u8                         sk_txtime_report_errors:1; /* 0x28e:0x1 0x1 */
        u8                         sk_txtime_unused:6;   /* 0x28e:0x2 0x1 */
        u8                         sk_bpf_cb_flags;      /* 0x28f   0x1 */
        void *                     sk_user_data;         /* 0x290   0x8 */
        void *                     sk_security;          /* 0x298   0x8 */
        struct sock_cgroup_data    sk_cgrp_data;         /* 0x2a0  0x10 */
...
/* sum members: 774 */

It turns out that the new sk_bpf_cb_flags fills the existing 1-byte hole
exactly. The new field is also similar to its neighbors: they are
written only, or almost only, during socket creation or in the
setsockopt phase.
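
To illustrate the hole-filling effect outside the kernel, here is a
minimal userspace sketch. The struct and field names (demo_before,
demo_after and their members) are hypothetical stand-ins for the few
struct sock fields shown in the pahole output above, not the real
definition, and the offsets assume a common ABI (GCC or Clang on
x86-64 or similar) where the three bit-fields share a single byte:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields around the hole, before the change. */
struct demo_before {
	int64_t  stamp;                    /* like ktime_t sk_stamp */
	int      disconnects;
	uint8_t  txrehash;
	uint8_t  clockid;
	uint8_t  txtime_deadline_mode : 1,
		 txtime_report_errors : 1,
		 txtime_unused : 6;        /* one byte right after clockid */
	/* padding byte here: the pointer below must be aligned */
	void    *user_data;
};

/* Same layout with the new u8 placed where the padding byte was. */
struct demo_after {
	int64_t  stamp;
	int      disconnects;
	uint8_t  txrehash;
	uint8_t  clockid;
	uint8_t  txtime_deadline_mode : 1,
		 txtime_report_errors : 1,
		 txtime_unused : 6;
	uint8_t  bpf_cb_flags;             /* like u8 sk_bpf_cb_flags */
	void    *user_data;
};

/* Adding the u8 must not grow the structure. */
_Static_assert(sizeof(struct demo_before) == sizeof(struct demo_after),
	       "new u8 field should reuse existing padding");

/* Bytes of padding between the bit-field byte and user_data (assumed ABI). */
static inline size_t demo_hole_before(void)
{
	size_t bitfield_end = offsetof(struct demo_before, clockid) + 2;

	return offsetof(struct demo_before, user_data) - bitfield_end;
}

/* Bytes of padding between the new u8 and user_data. */
static inline size_t demo_hole_after(void)
{
	size_t flags_end = offsetof(struct demo_after, bpf_cb_flags) +
			   sizeof(uint8_t);

	return offsetof(struct demo_after, user_data) - flags_end;
}

/* Runtime check of the layout claims, valid under the assumed ABI. */
static inline void demo_check_layout(void)
{
	assert(demo_hole_before() == 1);
	assert(demo_hole_after() == 0);
}
```

Calling demo_check_layout() from a small test program should pass on
such an ABI. The pahole output above remains the authoritative check
for the real struct sock.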

Does this look like a good place to insert the new flag?

Thanks,
Jason
