lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAL+tcoDZyB8FX1=zMZq5vgB_DWc=qpmJ0xQP7N2T90a4wamFjQ@mail.gmail.com>
Date: Wed, 5 Feb 2025 13:13:06 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Martin KaFai Lau <martin.lau@...ux.dev>, davem@...emloft.net, edumazet@...gle.com, 
	kuba@...nel.org, pabeni@...hat.com, dsahern@...nel.org, willemb@...gle.com, 
	ast@...nel.org, daniel@...earbox.net, andrii@...nel.org, eddyz87@...il.com, 
	song@...nel.org, yonghong.song@...ux.dev, john.fastabend@...il.com, 
	kpsingh@...nel.org, sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org, 
	horms@...nel.org, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()

On Wed, Feb 5, 2025 at 11:05 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> >
> > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> > <willemdebruijn.kernel@...il.com> wrote:
> > >
> > > Jason Xing wrote:
> > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> > > > >
> > > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > > >
> > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > > the bpf_sk_storage ...".
> > > >
> > > > I will revise it. Thanks.
> > > >
> > > > >
> > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > > >
> > > > >
> > > > > > It is accurate since they are under the protect of socket lock.
> > > > > > More details can be found in the selftest.
> > > > >
> > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > > >
> > > > > >
> > > > > > Signed-off-by: Jason Xing <kerneljasonxing@...il.com>
> > > > > > ---
> > > > > >   include/uapi/linux/bpf.h       | 7 +++++++
> > > > > >   net/ipv4/tcp.c                 | 1 +
> > > > > >   tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > >   3 files changed, 15 insertions(+)
> > > > > >
> > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > >                                        * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > >                                        * feature is on.
> > > > > >                                        */
> > > > > > +     BPF_SOCK_OPS_TS_SND_CB,         /* Called when every sendmsg syscall
> > > > > > +                                      * is triggered. For TCP, it stays
> > > > > > +                                      * in the last send process to
> > > > > > +                                      * correlate with tcp_sendmsg timestamp
> > > > > > +                                      * with other timestamping callbacks,
> > > > > > +                                      * like SND/SW/ACK.
> > > > >
> > > > > Do you have a chance to look at how this will work at UDP?
> > > >
> > > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > > strange because I did write a long paragraph about this thing which
> > > > apparently disappeared...
> > > >
> > > > I manage to find what I wrote:
> > > >     For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > > >     there are two sending process, 1) lockless path, 2) lock path, which
> > > >     should be handled carefully later. For the former, even though it's
> > > >     unlikely multiple threads access the socket to call sendmsg at the
> > > >     same time, I think we'd better not correlate it like what we do to the
> > > >     TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > >     uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > > >     case we need to use it someday.
> > > >
> > > >     And one more thing is I'd like to use the v5[1] method in the next round
> > > >     to introduce a new tskey_bpf which is good for UDP type. The new field
> > > >     will not conflict with the tskey in shared info which is generated
> > > >     by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > >     (so_timestamping and its bpf extension) exists at the same time. Users
> > > >     could get confused because sometimes they fetch the tskey from skb,
> > > >     sometimes they don't, especially when we have cmsg feature to turn it on/
> > > >     off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > >     With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > >     syscall with other tx points(SND/SW/ACK...).
> > > >
> > > >     [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > > >
> > > >     If possible, we can leave this question until the UDP support series
> > > >     shows up. I will figure out a better solution :)
> > > >
> > > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > > flag so I consider the compatibility reason.
> > >
> > > I don't think this is acceptable. We should aim for an API that can
> > > easily be used across protocols, like SO_TIMESTAMPING.
> >
> > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> > adjusted like below:
> >
> > It's hard to provide an absolutely uniform interface or usage to users
> > for TCP and UDP and even more protocols. Cases can be handled one by
> > one. The main obstacle is how we can correlate the timestamp in
> > sendmsg syscall with other sending timestamps. It's worth noticing
> > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> > userspace. For instance, while skb enters the qdisc, we fail to know
> > which skb belongs to which sendmsg.
> >
> > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> > protection of socket lock + syscall as the current patch does. But for
> > UDP, it can be lockless. IIUC, there is a very special case where even
> > SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> > socket send UDP packets in parallel, then users could be confused
> > which tskey matches which sendmsg. IIUC, I will not consider this
> > unlikely case, then the UDP case is quite similar to the TCP case.
> >
> > The scenario for the UDP case is:
> > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> > like TCP does in this series.
> > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> > with timestamp.
> > Note: tskey in UDP will be handled carefully in a different way
> > because we should support both modes for socket timestamping at the
> > same time.
> > It's really similar to TCP regardless of handling tskey.
> >
>
> To be more precise in case you don't have much time to read the above
> long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
> sendmsg timestamp with corresponding tskey.
>
> 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
>
> 2. For UDP, we can correlate in __ip_append_data() along with those
> tskey initialization, assuming there are no multiple threads calling
> locklessly ip_make_skb(). Locked path
> (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
> protection, so it can be easily handled. Lockless path
> (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
> the same time, which should be handled properly. I prefer to implement
> the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
> I think. This might be the only one corner case, IIUC?
>
> Overall I think BPF_SOCK_OPS_TS_SND_CB can work across protocols to do
> the correlation job.
>
> To be on the safe side, I can change the name BPF_SOCK_OPS_TS_SND_CB
> to BPF_SOCK_OPS_TS_TCP_SND_CB just in case this approach is not the
> best one. What do you think about this?
>
> [1]
> commit 4aecca4c76808f3736056d18ff510df80424bc9f
> Author: Vadim Fedorenko <vadim.fedorenko@...ux.dev>
> Date:   Tue Oct 1 05:57:14 2024 -0700
>
>     net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
>
>     SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
>     timestamps and packets sent via socket. Unfortunately, there is no way
>     to reliably predict socket timestamp ID value in case of error returned
>     by sendmsg. For UDP sockets it's impossible because of lockless
>     nature of UDP transmit, several threads may send packets in parallel. In
>     case of RAW sockets MSG_MORE option makes things complicated. More
>     details are in the conversation [1].
>     This patch adds new control message type to give user-space
>     software an opportunity to control the mapping between packets and
>     values by providing ID with each sendmsg for UDP sockets.
>     The documentation is also added in this patch.
>
>     [1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
>

Oh, I came up with a feasible approach for UDP protocol:
1. introduce a field ts_opt_id_bpf which works like ts_opt_id to allow
the bpf program to fully take control of the management of tskey.
2. use fentry hook udp_sendmsg(), and introduce a callback function
like BPF_SOCK_OPS_TIMEOUT_INIT in kernel to initialize the
ts_opt_id_bpf with tskey that bpf prog generates. We can directly use
BPF_SOCK_OPS_TS_SND_CB.
3. modify the SCM_TS_OPT_ID logic to support bpf extension so that the
newly added field ts_opt_id_bpf can be passed to the
skb_shinfo(skb)->tskey in __ip_append_data().

In this way, this approach can also be extended for other protocols.

Thanks,
Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ