[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <66db94be6e209_2a33ef294e@willemb.c.googlers.com.notmuch>
Date: Fri, 06 Sep 2024 19:48:14 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Vadim Fedorenko <vadim.fedorenko@...ux.dev>,
Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Willem de Bruijn <willemb@...gle.com>
Cc: netdev@...r.kernel.org,
David Ahern <dsahern@...nel.org>,
Jason Xing <kerneljasonxing@...il.com>,
Jakub Kicinski <kuba@...nel.org>,
Simon Horman <horms@...nel.org>,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP
sockets
Vadim Fedorenko wrote:
> On 06/09/2024 16:25, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> On 05/09/2024 17:39, Willem de Bruijn wrote:
> >>> Vadim Fedorenko wrote:
> >>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> >>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
> >>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >>>>>
> >>>>> Signed-off-by: Vadim Fedorenko <vadfed@...a.com>
> >>>>> ---
> >>>>> net/ipv4/tcp.c | 13 +++++++++----
> >>>>> 1 file changed, 9 insertions(+), 4 deletions(-)
> >>>>>
> >>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>>>> index 8a5680b4e786..5553a8aeee80 100644
> >>>>> --- a/net/ipv4/tcp.c
> >>>>> +++ b/net/ipv4/tcp.c
> >>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >>>>> }
> >>>>> EXPORT_SYMBOL(tcp_init_sock);
> >>>>>
> >>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >>>>> {
> >>>>> struct sk_buff *skb = tcp_write_queue_tail(sk);
> >>>>> + u32 tsflags = sockc->tsflags;
> >>>>>
> >>>>> if (tsflags && skb) {
> >>>>> struct skb_shared_info *shinfo = skb_shinfo(skb);
> >>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >>>>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >>>>> tcb->txstamp_ack = 1;
> >>>>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> >>>>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>>>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >>>>> + shinfo->tskey = sockc->ts_opt_id;
> >>>>> + else
> >>>>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>> + }
> >>>>> }
> >>>>> }
> >>>>>
> >>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>>>>
> >>>>> out:
> >>>>> if (copied) {
> >>>>> - tcp_tx_timestamp(sk, sockc.tsflags);
> >>>>> + tcp_tx_timestamp(sk, &sockc);
> >>>>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >>>>> }
> >>>>> out_nopush:
> >>>>
> >>>> Hi Willem,
> >>>>
> >>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
> >>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> >>>> flow:
> >>>>
> >>>> tcp_skb_collapse_tstamp()
> >>>> tcp_fragment_tstamp()
> >>>> tcp_gso_tstamp()
> >>>>
> >>>> I believe the last one breaks tests, but the problem is that there is no
> >>>> easy way to provide the flag of constant tskey to it. Only
> >>>> shinfo::tx_flags are available at the caller side and we have already
> >>>> discussed that we shouldn't use the last bit of this field.
> >>>>
> >>>> So, how should we deal with the problem? Or is it better to postpone
> >>>> support for TCP sockets in this case?
> >>>
> >>> Are you sure that this is a problem. These functions pass on the
> >>> skb_shinfo(skb)->ts_key from one skb to another.
> >>
> >> Yes, you are right, the problem is in a different place.
> >>
> >> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> >> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> >> value:
> >>
> >> if (sk_is_tcp(sk))
> >> serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >>
> >> It happens because of assumption that for TCP sockets shinfo::tskey will
> >> have sequence number and the logic has to recalculate it back to the
> >> bytes sent. The same logic exists in all TCP TX timestamping functions
> >> (mentioned in the previous mail) and may trigger some unexpected
> >> behavior. To fix the issue we have to provide some kind of signal that
> >> tskey value is provided from user-space and shouldn't be changed. And we
> >> have to have it somewhere in skb info. Again, tx_flags looks like the
> >> best candidate, but it's impossible to use. I'm thinking of using
> >> special flag in tcp_skb_cb - gonna test more, but open for other
> >> suggestions.
> >
> > Ai, that is tricky. tx_flags is full/scarce indeed.
> >
> > CB does not persist as the skb transitions between layers.
> >
> > The most obvious solution would be to set the flag in sk_tsflags
> > itself. But then the cmsg would no long work on a per request basis:
> > either the socket uses OPT_ID with counter or OPT_ID_CMSG.
> >
> > Good that we catch this now before the ABI is finalized.
> >
> > If necessary TCP semantics can diverge from datagrams. So we could
> > punt on this. But it's not ideal.
>
> I have done proof of concept code which uses hwtsamp as a storage for
> custom tskey in case of TCP:
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index a52638363ea5..40ed49e61bf7 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct
> sk_buff *skb,
> serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
> if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
> serr->ee.ee_data = skb_shinfo(skb)->tskey;
> - if (sk_is_tcp(sk))
> - serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> }
>
> err = sock_queue_err_skb(sk, skb);
> @@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> * but only if the socket refcount is not zero.
> */
> if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
> + if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) &
> SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
> + skb_shinfo(skb)->tskey =
> (u32)skb_hwtstamps(skb)->hwtstamp;
> *skb_hwtstamps(skb) = *hwtstamps;
> __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND,
> false);
> sock_put(sk);
> @@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
> }
>
> + if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> + if (skb_hwtstamps(orig_skb)->hwtstamp)
> + skb_shinfo(skb)->tskey =
> (u32)skb_hwtstamps(orig_skb)->hwtstamp;
> + else
> + skb_shinfo(skb)->tskey -=
> atomic_read(&sk->sk_tskey);
> + }
> if (hwtstamps)
> *skb_hwtstamps(skb) = *hwtstamps;
> else
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 0d3decc13a99..1a161a2155b5 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct
> sockcm_cookie *sockc)
> tcb->txstamp_ack = 1;
> if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> - shinfo->tskey = sockc->ts_opt_id;
> - else
> - shinfo->tskey = TCP_SKB_CB(skb)->seq +
> skb->len - 1;
> + skb_hwtstamps(skb)->hwtstamp =
> sockc->ts_opt_id;
> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> }
> }
> }
>
>
> Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
> it. netdev_data field is only used on RX timestamp side, so should be
> fine to reuse. WDYT?
We cannot really extend the struct. skb_shared_info is scarce.
hwtstamps is a union. But on tx the hw tstamp is queued using
skb_tstamp_tx, not through this shinfo data at all.
It just seems weird to have a shinfo->tskey, but then ignore it and
find yet another 32b field. Easier would be to find 1b to toggle
whether tskey is to be interpreted as counter based or OPT_ID_CMSG.
I don't immediate see a perfect solution either.
Powered by blists - more mailing lists