[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2ddaaf3e-a63c-4e0b-811e-568c1cedb4b7@linux.dev>
Date: Sat, 7 Sep 2024 22:55:11 +0100
From: Vadim Fedorenko <vadim.fedorenko@...ux.dev>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
Willem de Bruijn <willemb@...gle.com>
Cc: netdev@...r.kernel.org, David Ahern <dsahern@...nel.org>,
Jason Xing <kerneljasonxing@...il.com>, Jakub Kicinski <kuba@...nel.org>,
Simon Horman <horms@...nel.org>, Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP
sockets
On 07/09/2024 00:48, Willem de Bruijn wrote:
> Vadim Fedorenko wrote:
>> On 06/09/2024 16:25, Willem de Bruijn wrote:
>>> Vadim Fedorenko wrote:
>>>> On 05/09/2024 17:39, Willem de Bruijn wrote:
>>>>> Vadim Fedorenko wrote:
>>>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
>>>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
>>>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
>>>>>>>
>>>>>>> Signed-off-by: Vadim Fedorenko <vadfed@...a.com>
>>>>>>> ---
>>>>>>> net/ipv4/tcp.c | 13 +++++++++----
>>>>>>> 1 file changed, 9 insertions(+), 4 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>>>>>>> index 8a5680b4e786..5553a8aeee80 100644
>>>>>>> --- a/net/ipv4/tcp.c
>>>>>>> +++ b/net/ipv4/tcp.c
>>>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
>>>>>>> }
>>>>>>> EXPORT_SYMBOL(tcp_init_sock);
>>>>>>>
>>>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
>>>>>>> {
>>>>>>> struct sk_buff *skb = tcp_write_queue_tail(sk);
>>>>>>> + u32 tsflags = sockc->tsflags;
>>>>>>>
>>>>>>> if (tsflags && skb) {
>>>>>>> struct skb_shared_info *shinfo = skb_shinfo(skb);
>>>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
>>>>>>> sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
>>>>>>> if (tsflags & SOF_TIMESTAMPING_TX_ACK)
>>>>>>> tcb->txstamp_ack = 1;
>>>>>>> - if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
>>>>>>> - shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>>> + if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>>>>>>> + if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>>>>>>> + shinfo->tskey = sockc->ts_opt_id;
>>>>>>> + else
>>>>>>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>>>>>>> + }
>>>>>>> }
>>>>>>> }
>>>>>>>
>>>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
>>>>>>>
>>>>>>> out:
>>>>>>> if (copied) {
>>>>>>> - tcp_tx_timestamp(sk, sockc.tsflags);
>>>>>>> + tcp_tx_timestamp(sk, &sockc);
>>>>>>> tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
>>>>>>> }
>>>>>>> out_nopush:
>>>>>>
>>>>>> Hi Willem,
>>>>>>
>>>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
>>>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
>>>>>> flow:
>>>>>>
>>>>>> tcp_skb_collapse_tstamp()
>>>>>> tcp_fragment_tstamp()
>>>>>> tcp_gso_tstamp()
>>>>>>
>>>>>> I believe the last one breaks tests, but the problem is that there is no
>>>>>> easy way to provide the flag of constant tskey to it. Only
>>>>>> shinfo::tx_flags are available at the caller side and we have already
>>>>>> discussed that we shouldn't use the last bit of this field.
>>>>>>
>>>>>> So, how should we deal with the problem? Or is it better to postpone
>>>>>> support for TCP sockets in this case?
>>>>>
>>>>> Are you sure that this is a problem. These functions pass on the
>>>>> skb_shinfo(skb)->ts_key from one skb to another.
>>>>
>>>> Yes, you are right, the problem is in a different place.
>>>>
>>>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
>>>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
>>>> value:
>>>>
>>>> if (sk_is_tcp(sk))
>>>> serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>>>>
>>>> It happens because of assumption that for TCP sockets shinfo::tskey will
>>>> have sequence number and the logic has to recalculate it back to the
>>>> bytes sent. The same logic exists in all TCP TX timestamping functions
>>>> (mentioned in the previous mail) and may trigger some unexpected
>>>> behavior. To fix the issue we have to provide some kind of signal that
>>>> tskey value is provided from user-space and shouldn't be changed. And we
>>>> have to have it somewhere in skb info. Again, tx_flags looks like the
>>>> best candidate, but it's impossible to use. I'm thinking of using
>>>> special flag in tcp_skb_cb - gonna test more, but open for other
>>>> suggestions.
>>>
>>> Ai, that is tricky. tx_flags is full/scarce indeed.
>>>
>>> CB does not persist as the skb transitions between layers.
>>>
>>> The most obvious solution would be to set the flag in sk_tsflags
>>> itself. But then the cmsg would no long work on a per request basis:
>>> either the socket uses OPT_ID with counter or OPT_ID_CMSG.
>>>
>>> Good that we catch this now before the ABI is finalized.
>>>
>>> If necessary TCP semantics can diverge from datagrams. So we could
>>> punt on this. But it's not ideal.
>>
>> I have done proof of concept code which uses hwtsamp as a storage for
>> custom tskey in case of TCP:
>>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index a52638363ea5..40ed49e61bf7 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct
>> sk_buff *skb,
>> serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
>> if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
>> serr->ee.ee_data = skb_shinfo(skb)->tskey;
>> - if (sk_is_tcp(sk))
>> - serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
>> }
>>
>> err = sock_queue_err_skb(sk, skb);
>> @@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
>> * but only if the socket refcount is not zero.
>> */
>> if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
>> + if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) &
>> SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
>> + skb_shinfo(skb)->tskey =
>> (u32)skb_hwtstamps(skb)->hwtstamp;
>> *skb_hwtstamps(skb) = *hwtstamps;
>> __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND,
>> false);
>> sock_put(sk);
>> @@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
>> skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
>> }
>>
>> + if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
>> + if (skb_hwtstamps(orig_skb)->hwtstamp)
>> + skb_shinfo(skb)->tskey =
>> (u32)skb_hwtstamps(orig_skb)->hwtstamp;
>> + else
>> + skb_shinfo(skb)->tskey -=
>> atomic_read(&sk->sk_tskey);
>> + }
>> if (hwtstamps)
>> *skb_hwtstamps(skb) = *hwtstamps;
>> else
>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
>> index 0d3decc13a99..1a161a2155b5 100644
>> --- a/net/ipv4/tcp.c
>> +++ b/net/ipv4/tcp.c
>> @@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct
>> sockcm_cookie *sockc)
>> tcb->txstamp_ack = 1;
>> if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
>> if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
>> - shinfo->tskey = sockc->ts_opt_id;
>> - else
>> - shinfo->tskey = TCP_SKB_CB(skb)->seq +
>> skb->len - 1;
>> + skb_hwtstamps(skb)->hwtstamp =
>> sockc->ts_opt_id;
>> + shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
>> }
>> }
>> }
>>
>>
>> Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
>> it. netdev_data field is only used on RX timestamp side, so should be
>> fine to reuse. WDYT?
>
> We cannot really extend the struct. skb_shared_info is scarce.
> hwtstamps is a union. But on tx the hw tstamp is queued using
> skb_tstamp_tx, not through this shinfo data at all.
>
> It just seems weird to have a shinfo->tskey, but then ignore it and
> find yet another 32b field. Easier would be to find 1b to toggle
> whether tskey is to be interpreted as counter based or OPT_ID_CMSG.
>
> I don't immediate see a perfect solution either.
Well, with this said, and given that having 2 different tskey values is
weird solution, I'm thinking of skipping TCP part until we can find some
proper way. Will it be OK to have UDP and RAW sockets only opted into
the feature?
Powered by blists - more mailing lists