[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <67a3d1dba6803_170d392948c@willemb.c.googlers.com.notmuch>
Date: Wed, 05 Feb 2025 16:02:19 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jason Xing <kerneljasonxing@...il.com>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Martin KaFai Lau <martin.lau@...ux.dev>,
davem@...emloft.net,
edumazet@...gle.com,
kuba@...nel.org,
pabeni@...hat.com,
dsahern@...nel.org,
willemb@...gle.com,
ast@...nel.org,
daniel@...earbox.net,
andrii@...nel.org,
eddyz87@...il.com,
song@...nel.org,
yonghong.song@...ux.dev,
john.fastabend@...il.com,
kpsingh@...nel.org,
sdf@...ichev.me,
haoluo@...gle.com,
jolsa@...nel.org,
horms@...nel.org,
bpf@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH bpf-next v7 11/13] net-timestamp: add a new callback in
tcp_tx_timestamp()
Jason Xing wrote:
> On Wed, Feb 5, 2025 at 11:20 PM Willem de Bruijn
> <willemdebruijn.kernel@...il.com> wrote:
> >
> > Jason Xing wrote:
> > > On Wed, Feb 5, 2025 at 2:09 AM Jason Xing <kerneljasonxing@...il.com> wrote:
> > > >
> > > > On Wed, Feb 5, 2025 at 1:08 AM Willem de Bruijn
> > > > <willemdebruijn.kernel@...il.com> wrote:
> > > > >
> > > > > Jason Xing wrote:
> > > > > > On Tue, Feb 4, 2025 at 9:16 AM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
> > > > > > >
> > > > > > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > > > > > Introduce the callback to correlate tcp_sendmsg timestamp with other
> > > > > > > > points, like SND/SW/ACK. We can let bpf trace the beginning of
> > > > > > > > tcp_sendmsg_locked() and fetch the socket addr, so that in
> > > > > > >
> > > > > > > Instead of "fetch the socket addr...", should be "store the sendmsg timestamp at
> > > > > > > the bpf_sk_storage ...".
> > > > > >
> > > > > > I will revise it. Thanks.
> > > > > >
> > > > > > >
> > > > > > > > tcp_tx_timestamp() we can correlate the tskey with the socket addr.
> > > > > > >
> > > > > > >
> > > > > > > > It is accurate since they are under the protect of socket lock.
> > > > > > > > More details can be found in the selftest.
> > > > > > >
> > > > > > > The selftest uses the bpf_sk_storage to store the sendmsg timestamp at
> > > > > > > fentry/tcp_sendmsg_locked and retrieves it back at tcp_tx_timestamp (i.e.
> > > > > > > BPF_SOCK_OPS_TS_SND_CB added in this patch).
> > > > > > >
> > > > > > > >
> > > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@...il.com>
> > > > > > > > ---
> > > > > > > > include/uapi/linux/bpf.h | 7 +++++++
> > > > > > > > net/ipv4/tcp.c | 1 +
> > > > > > > > tools/include/uapi/linux/bpf.h | 7 +++++++
> > > > > > > > 3 files changed, 15 insertions(+)
> > > > > > > >
> > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > > > > > index 800122a8abe5..accb3b314fff 100644
> > > > > > > > --- a/include/uapi/linux/bpf.h
> > > > > > > > +++ b/include/uapi/linux/bpf.h
> > > > > > > > @@ -7052,6 +7052,13 @@ enum {
> > > > > > > > * when SK_BPF_CB_TX_TIMESTAMPING
> > > > > > > > * feature is on.
> > > > > > > > */
> > > > > > > > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall
> > > > > > > > + * is triggered. For TCP, it stays
> > > > > > > > + * in the last send process to
> > > > > > > > + * correlate with tcp_sendmsg timestamp
> > > > > > > > + * with other timestamping callbacks,
> > > > > > > > + * like SND/SW/ACK.
> > > > > > >
> > > > > > > Do you have a chance to look at how this will work at UDP?
> > > > > >
> > > > > > Sure, I feel like it could not be useful for UDP. Well, things get
> > > > > > strange because I did write a long paragraph about this thing which
> > > > > > apparently disappeared...
> > > > > >
> > > > > > I manage to find what I wrote:
> > > > > > For UDP type, BPF_SOCK_OPS_TS_SND_CB may be not suitable because
> > > > > > there are two sending process, 1) lockless path, 2) lock path, which
> > > > > > should be handled carefully later. For the former, even though it's
> > > > > > unlikely multiple threads access the socket to call sendmsg at the
> > > > > > same time, I think we'd better not correlate it like what we do to the
> > > > > > TCP case because of the lack of sock lock protection. Considering SND_CB is
> > > > > > uapi flag, I think we don't need to forcely add the 'TCP_' prefix in
> > > > > > case we need to use it someday.
> > > > > >
> > > > > > And one more thing is I'd like to use the v5[1] method in the next round
> > > > > > to introduce a new tskey_bpf which is good for UDP type. The new field
> > > > > > will not conflict with the tskey in shared info which is generated
> > > > > > by sk->sk_tskey in __ip_append_data(). It hardly works if both features
> > > > > > (so_timestamping and its bpf extension) exists at the same time. Users
> > > > > > could get confused because sometimes they fetch the tskey from skb,
> > > > > > sometimes they don't, especially when we have cmsg feature to turn it on/
> > > > > > off per sendmsg. A standalone tskey for bpf extension will be needed.
> > > > > > With this tskey_bpf, we can easily correlate the timestamp in sendmsg
> > > > > > syscall with other tx points(SND/SW/ACK...).
> > > > > >
> > > > > > [1]: https://lore.kernel.org/all/20250112113748.73504-14-kerneljasonxing@gmail.com/
> > > > > >
> > > > > > If possible, we can leave this question until the UDP support series
> > > > > > shows up. I will figure out a better solution :)
> > > > > >
> > > > > > In conclusion, it probably won't be used by the UDP type. It's uAPI
> > > > > > flag so I consider the compatibility reason.
> > > > >
> > > > > I don't think this is acceptable. We should aim for an API that can
> > > > > easily be used across protocols, like SO_TIMESTAMPING.
> > > >
> > > > After I revisit the UDP SO_TIMESTAMPING again, my thoughts are
> > > > adjusted like below:
> > > >
> > > > It's hard to provide an absolutely uniform interface or usage to users
> > > > for TCP and UDP and even more protocols. Cases can be handled one by
> > > > one.
> >
> > We should try hard. SO_TIMESTAMPING is uniform across protocols.
> > An interface that is not is just hard to use.
> >
> > > > The main obstacle is how we can correlate the timestamp in
> > > > sendmsg syscall with other sending timestamps. It's worth noticing
> > > > that for SO_TIMESTAMPING the sendmsg timestamp is collected in the
> > > > userspace. For instance, while skb enters the qdisc, we fail to know
> > > > which skb belongs to which sendmsg.
> > > >
> > > > An idea coming up is to introduce BPF_SOCK_OPS_TS_SND_CB to correlate
> > > > the sendmsg timestamp with tskey (in tcp_tx_timestamp()) under the
> > > > protection of socket lock + syscall as the current patch does. But for
> > > > UDP, it can be lockless. IIUC, there is a very special case where even
> > > > SO_TIMESTAMPING may get lost: if multiple threads accessing the same
> > > > socket send UDP packets in parallel, then users could be confused
> > > > which tskey matches which sendmsg.
> >
> > This is a known issue for lockless datagram sockets.
> >
> > With SO_TIMESTAMPING, but the use of timestamping and of concurrent
> > sendmsg calls is under control of the process, so it only shoots
> > itself in the foot.
> >
> > With BPF timestamping, a process may confuse a third party admin, so
> > the situation is slightly different.
>
> Agreed.
>
> >
> > > > IIUC, I will not consider this
> > > > unlikely case, then the UDP case is quite similar to the TCP case.
> > > >
> > > > The scenario for the UDP case is:
> > > > 1) using fentry bpf to hook the udp_sendmsg() to get the timestamp
> > > > like TCP does in this series.
> > > > 2) insert BPF_SOCK_OPS_TS_SND_CB into __ip_append_data() near the
> > > > SO_TIMESTAMPING code snippets to let bpf program correlate the tskey
> > > > with timestamp.
> > > > Note: tskey in UDP will be handled carefully in a different way
> > > > because we should support both modes for socket timestamping at the
> > > > same time.
> > > > It's really similar to TCP regardless of handling tskey.
> > > >
> > >
> > > To be more precise in case you don't have much time to read the above
> > > long paragraph, BPF_SOCK_OPS_TS_SND_CB is mainly used to correlate
> > > sendmsg timestamp with corresponding tskey.
> > >
> > > 1. For TCP, we can correlate it in tcp_tx_timestamp() like this patch does.
> > >
> > > 2. For UDP, we can correlate in __ip_append_data() along with those
> > > tskey initialization, assuming there are no multiple threads calling
> > > locklessly ip_make_skb(). Locked path
> > > (udp_sendmsg()->ip_append_data()) works like TCP under the socket lock
> > > protection, so it can be easily handled. Lockless path
> > > (udp_sendmsg()->ip_make_skb()) can be visited by multiple threads at
> > > the same time, which should be handled properly.
> >
> > Different hook points is fine, as UDP (and RAW) uses __ip_append_data
>
> Then this approach (introducing this new flag) is feasible. Sorry that
> last night I wrote such a long paragraph which buried something
> important. Because of that, I rephrase the whole idea about how to let
> UDP work with this kind of new flag in [patch v8 11/12]. Link is
> https://lore.kernel.org/all/CAL+tcoCmXcDot-855XYU7PKCiGvJL=O3CQBGuOTRAs2_=Ys=gg@mail.gmail.com/
>
> > or more importantly ip_send_skb, while TCP uses ip_queue_xmit.
>
> For TCP, we use tcp_tx_timestamp to finish the map between sendmsg
> timestamp and tskey.
>
> >
> > As long as the API is the same: the operation (BPF_SOCK_OPS_TS_SND_CB)
> > and the behavior of that operation. Subject to the usual distinction
> > between protocol behavior (bytestream vs datagram).
>
> I see your point.
>
> >
> > > I prefer to implement
> > > the bpf extension for IPCORK_TS_OPT_ID, which should be another topic,
> > > I think. This might be the only one corner case, IIUC?
> >
> > This sounds like an entirely different topic? Not sure what this is.
>
> Not really a different topic. I mean let bpf prog take the whole
> control of setting the tskey, then with this BPF_SOCK_OPS_TS_SND_CB
> flag we can correlate the sendmsg timestamp with tskey. So It has
> something to do with the usage of UDP. Please take a look at that link
> to patch 11/12. For TCP, we don't need to care about the value of
> tskey which has already been taken care of by SO_TIMESTAMPING. So it
> is slightly different. I'm not sure if this kind of usage is
> acceptable?
Why can TCP rely on SO_TIMESTAMPING to set tskey, but UDP cannot?
BPF will need to set the key for both protocol if SO_TIMESTAMPING is
not enabled, right?
Powered by blists - more mailing lists