[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAL+tcoAeBJ=F8cZ9qYwGF6jmc+DwA2byrrzAZjcpNYzrjT541g@mail.gmail.com>
Date: Wed, 5 Feb 2025 11:05:40 +0800
From: Jason Xing <kerneljasonxing@...il.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Martin KaFai Lau <martin.lau@...ux.dev>, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, dsahern@...nel.org, willemb@...gle.com,
ast@...nel.org, daniel@...earbox.net, andrii@...nel.org, eddyz87@...il.com,
song@...nel.org, yonghong.song@...ux.dev, john.fastabend@...il.com,
kpsingh@...nel.org, sdf@...ichev.me, haoluo@...gle.com, jolsa@...nel.org,
horms@...nel.org, bpf@...r.kernel.org, netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in tcp_tx_timestamp()
On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@...il.com> wrote:
>
> On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
> >
> > Jason Xing wrote:
> > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> > > >
> > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > >
> > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > the bpf_sk_storage ...".
> > >
> > > I will revise it. Thanks.
> > >
> > > >
> > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > >
> > > >
> > > > > It is accurate since they are under the protect of socket lock.
> > > > > More details can be found in the selftest.
> > > >
> > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > >
> > > > >
> > > > > Signed-off-by: Jason Xing <kerneljasonxing@...il.com>
> > > > > ---
> > > > > include/uapi/linux/bpf.h | 7 +++++++
> > > > > net/ipv4/tcp.c | 1 +
> > > > > tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > 3 files changed, 15 insertions(+)
> > > > >
> > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > --- a/include/uapi/linux/bpf.h
> > > > > +++ b/include/uapi/linux/bpf.h
> > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > * feature is on.
> > > > > */
> > > > > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall
> > > > > + * is triggered. For TCP, it stays
> > > > > + * in the last send process to
> > > > > + * correlate with tcp_sendmsg timestamp
> > > > > + * with other timestamping callbacks,
> > > > > + * like SND/SW/ACK.
> > > >
> > > > Do you have a chance to look at how this will work at UDP?
> > >
> > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > strange because I did write a long paragraph about this thing which
> > > apparently disappeared...
> > >
> > > I manage to find what I wrote:
> > > For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > > there are two sending process, 1) lockless path, 2) lock path, which
> > > should be handled carefully later. For the former, even though it's
> > > unlikely multiple threads access the socket to call sendmsg at the
> > > same time, I think we'd better not correlate it like what we do to the
> > > TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > > case we need to use it someday.
> > >
> > > And one more thing is I'd like to use the v5[1] method in the next round
> > > to introduce a new tskey_bpf which is good for UDP type. The new field
> > > will not conflict with the tskey in shared info which is generated
> > > by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > (so_timestamping and its bpf extension) exists at the same time. Users
> > > could get confused because sometimes they fetch the tskey from skb,
> > > sometimes they don't, especially when we have cmsg feature to turn it on/
> > > off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > syscall with other tx points(SND/SW/ACK...).
> > >
> > > [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > >
> > > If possible, we can leave this question until the UDP support series
> > > shows up. I will figure out a better solution :)
> > >
> > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > flag so I consider the compatibility reason.
> >
> > I don't think this is acceptable. We should aim for an API that can
> > easily be used across protocols, like SO_TIMESTAMPING.
>
> After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> adjusted like below:
>
> It's hard to provide an absolutely uniform interface or usage to users
> for TCP and UDP and even more protocols. Cases can be handled one by
> one. The main obstacle is how we can correlate the timestamp in
> sendmsg syscall with other sending timestamps. It's worth noticing
> that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> userspace. For instance, while skb enters the qdisc, we fail to know
> which skb belongs to which sendmsg.
>
> An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> protection of socket lock + syscall as the current patch does. But for
> UDP, it can be lockless. IIUC, there is a very special case where even
> SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> socket send UDP packets in parallel, then users could be confused
> which tskey matches which sendmsg. IIUC, I will not consider this
> unlikely case, then the UDP case is quite similar to the TCP case.
>
> The scenario for the UDP case is:
> 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> like TCP does in this series.
> 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> with timestamp.
> Note: tskey in UDP will be handled carefully in a different way
> because we should support both modes for socket timestamping at the
> same time.
> It's really similar to TCP regardless of handling tskey.
>
To be more precise in case you don't have much time to read the above
long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
sendmsg timestamp with corresponding tskey.
1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
2. For UDP, we can correlate in __ip_append_data() along with those
tskey initialization, assuming there are no multiple threads calling
locklessly ip_make_skb(). Locked path
(udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
protection, so it can be easily handled. Lockless path
(udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
the same time, which should be handled properly. I prefer to implement
the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
I think. This might be the only one corner case, IIUC?
Overall I think BPF_SOCK_OPS_TS_SND_CB can work across protocols to do
the correlation job.
To be on the safe side, I can change the name BPF_SOCK_OPS_TS_SND_CB
to BPF_SOCK_OPS_TS_TCP_SND_CB just in case this approach is not the
best one. What do you think about this?
[1]
commit 4aecca4c76808f3736056d18ff510df80424bc9f
Author: Vadim Fedorenko <vadim.fedorenko@...ux.dev>
Date: Tue Oct 1 05:57:14 2024 -0700
net_tstamp: add SCM_TS_OPT_ID to provide OPT_ID in control message
SOF_TIMESTAMPING_OPT_ID socket option flag gives a way to correlate TX
timestamps and packets sent via socket. Unfortunately, there is no way
to reliably predict socket timestamp ID value in case of error returned
by sendmsg. For UDP sockets it's impossible because of lockless
nature of UDP transmit, several threads may send packets in parallel. In
case of RAW sockets MSG_MORE option makes things complicated. More
details are in the conversation [1].
This patch adds new control message type to give user-space
software an opportunity to control the mapping between packets and
values by providing ID with each sendmsg for UDP sockets.
The documentation is also added in this patch.
[1] https://lore.kernel.org/netdev/CALCETrU0jB+kg0mhV6A8mrHfTE1D1pr1SD_B9Eaa9aDPfgHdtA@mail.gmail.com/
Thanks,
Jason
Powered by blists - more mailing lists