lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <66ddf8df5ab85_2fb987294ec@willemb.c.googlers.com.notmuch>
Date: Sun, 08 Sep 2024 15:19:59 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Vadim Fedorenko <vadim.fedorenko@...ux.dev>, 
 Willem de Bruijn <willemdebruijn.kernel@...il.com>, 
 Willem de Bruijn <willemb@...gle.com>
Cc: netdev@...r.kernel.org, 
 David Ahern <dsahern@...nel.org>, 
 Jason Xing <kerneljasonxing@...il.com>, 
 Jakub Kicinski <kuba@...nel.org>, 
 Simon Horman <horms@...nel.org>, 
 Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net-next v3 2/4] net_tstamp: add SCM_TS_OPT_ID for TCP
 sockets

Vadim Fedorenko wrote:
> On 07/09/2024 00:48, Willem de Bruijn wrote:
> > Vadim Fedorenko wrote:
> >> On 06/09/2024 16:25, Willem de Bruijn wrote:
> >>> Vadim Fedorenko wrote:
> >>>> On 05/09/2024 17:39, Willem de Bruijn wrote:
> >>>>> Vadim Fedorenko wrote:
> >>>>>> On 04/09/2024 12:31, Vadim Fedorenko wrote:
> >>>>>>> TCP sockets have different flow for providing timestamp OPT_ID value.
> >>>>>>> Adjust the code to support SCM_TS_OPT_ID option for TCP sockets.
> >>>>>>>
> >>>>>>> Signed-off-by: Vadim Fedorenko <vadfed@...a.com>
> >>>>>>> ---
> >>>>>>>      net/ipv4/tcp.c | 13 +++++++++----
> >>>>>>>      1 file changed, 9 insertions(+), 4 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >>>>>>> index 8a5680b4e786..5553a8aeee80 100644
> >>>>>>> --- a/net/ipv4/tcp.c
> >>>>>>> +++ b/net/ipv4/tcp.c
> >>>>>>> @@ -474,9 +474,10 @@ void tcp_init_sock(struct sock *sk)
> >>>>>>>      }
> >>>>>>>      EXPORT_SYMBOL(tcp_init_sock);
> >>>>>>>      
> >>>>>>> -static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>>>> +static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc)
> >>>>>>>      {
> >>>>>>>      	struct sk_buff *skb = tcp_write_queue_tail(sk);
> >>>>>>> +	u32 tsflags = sockc->tsflags;
> >>>>>>>      
> >>>>>>>      	if (tsflags && skb) {
> >>>>>>>      		struct skb_shared_info *shinfo = skb_shinfo(skb);
> >>>>>>> @@ -485,8 +486,12 @@ static void tcp_tx_timestamp(struct sock *sk, u16 tsflags)
> >>>>>>>      		sock_tx_timestamp(sk, tsflags, &shinfo->tx_flags);
> >>>>>>>      		if (tsflags & SOF_TIMESTAMPING_TX_ACK)
> >>>>>>>      			tcb->txstamp_ack = 1;
> >>>>>>> -		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK)
> >>>>>>> -			shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>>>> +		if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>>>>>> +			if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >>>>>>> +				shinfo->tskey = sockc->ts_opt_id;
> >>>>>>> +			else
> >>>>>>> +				shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>>>>>> +		}
> >>>>>>>      	}
> >>>>>>>      }
> >>>>>>>      
> >>>>>>> @@ -1318,7 +1323,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
> >>>>>>>      
> >>>>>>>      out:
> >>>>>>>      	if (copied) {
> >>>>>>> -		tcp_tx_timestamp(sk, sockc.tsflags);
> >>>>>>> +		tcp_tx_timestamp(sk, &sockc);
> >>>>>>>      		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
> >>>>>>>      	}
> >>>>>>>      out_nopush:
> >>>>>>
> >>>>>> Hi Willem,
> >>>>>>
> >>>>>> Unfortunately, these changes are not enough to enable custom OPT_ID for
> >>>>>> TCP sockets. There are some functions which rewrite shinfo->tskey in TCP
> >>>>>> flow:
> >>>>>>
> >>>>>> tcp_skb_collapse_tstamp()
> >>>>>> tcp_fragment_tstamp()
> >>>>>> tcp_gso_tstamp()
> >>>>>>
> >>>>>> I believe the last one breaks tests, but the problem is that there is no
> >>>>>> easy way to provide the flag of constant tskey to it. Only
> >>>>>> shinfo::tx_flags are available at the caller side and we have already
> >>>>>> discussed that we shouldn't use the last bit of this field.
> >>>>>>
> >>>>>> So, how should we deal with the problem? Or is it better to postpone
> >>>>>> support for TCP sockets in this case?
> >>>>>
> >>>>> Are you sure that this is a problem. These functions pass on the
> >>>>> skb_shinfo(skb)->ts_key from one skb to another.
> >>>>
> >>>> Yes, you are right, the problem is in a different place.
> >>>>
> >>>> __skb_complete_tx_timestamp receives skb with shinfo->tskey equal to
> >>>> provided by cmsg. But for TCP sockets it unconditionally adjusts ee_data
> >>>> value:
> >>>>
> >>>> 	if (sk_is_tcp(sk))
> >>>> 		serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >>>>
> >>>> It happens because of assumption that for TCP sockets shinfo::tskey will
> >>>> have sequence number and the logic has to recalculate it back to the
> >>>> bytes sent. The same logic exists in all TCP TX timestamping functions
> >>>> (mentioned in the previous mail) and may trigger some unexpected
> >>>> behavior. To fix the issue we have to provide some kind of signal that
> >>>> tskey value is provided from user-space and shouldn't be changed. And we
> >>>> have to have it somewhere in skb info. Again, tx_flags looks like the
> >>>> best candidate, but it's impossible to use. I'm thinking of using
> >>>> special flag in tcp_skb_cb - gonna test more, but open for other
> >>>> suggestions.
> >>>
> >>> Ai, that is tricky. tx_flags is full/scarce indeed.
> >>>
> >>> CB does not persist as the skb transitions between layers.
> >>>
> >>> The most obvious solution would be to set the flag in sk_tsflags
> >>> itself. But then the cmsg would no long work on a per request basis:
> >>> either the socket uses OPT_ID with counter or OPT_ID_CMSG.
> >>>
> >>> Good that we catch this now before the ABI is finalized.
> >>>
> >>> If necessary TCP semantics can diverge from datagrams. So we could
> >>> punt on this. But it's not ideal.
> >>
> >> I have done proof of concept code which uses hwtsamp as a storage for
> >> custom tskey in case of TCP:
> >>
> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> >> index a52638363ea5..40ed49e61bf7 100644
> >> --- a/net/core/skbuff.c
> >> +++ b/net/core/skbuff.c
> >> @@ -5414,8 +5414,6 @@ static void __skb_complete_tx_timestamp(struct
> >> sk_buff *skb,
> >>           serr->header.h4.iif = skb->dev ? skb->dev->ifindex : 0;
> >>           if (READ_ONCE(sk->sk_tsflags) & SOF_TIMESTAMPING_OPT_ID) {
> >>                   serr->ee.ee_data = skb_shinfo(skb)->tskey;
> >> -               if (sk_is_tcp(sk))
> >> -                       serr->ee.ee_data -= atomic_read(&sk->sk_tskey);
> >>           }
> >>
> >>           err = sock_queue_err_skb(sk, skb);
> >> @@ -5450,6 +5448,8 @@ void skb_complete_tx_timestamp(struct sk_buff *skb,
> >>            * but only if the socket refcount is not zero.
> >>            */
> >>           if (likely(refcount_inc_not_zero(&sk->sk_refcnt))) {
> >> +               if (sk_is_tcp(sk) && (READ_ONCE(sk->sk_tsflags) &
> >> SOF_TIMESTAMPING_OPT_ID) && skb_hwtstamps(skb)->hwtstamp)
> >> +                       skb_shinfo(skb)->tskey =
> >> (u32)skb_hwtstamps(skb)->hwtstamp;
> >>                   *skb_hwtstamps(skb) = *hwtstamps;
> >>                   __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND,
> >> false);
> >>                   sock_put(sk);
> >> @@ -5509,6 +5509,12 @@ void __skb_tstamp_tx(struct sk_buff *orig_skb,
> >>                   skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
> >>           }
> >>
> >> +       if (sk_is_tcp(sk) && (tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> >> +               if (skb_hwtstamps(orig_skb)->hwtstamp)
> >> +                       skb_shinfo(skb)->tskey =
> >> (u32)skb_hwtstamps(orig_skb)->hwtstamp;
> >> +               else
> >> +                       skb_shinfo(skb)->tskey -=
> >> atomic_read(&sk->sk_tskey);
> >> +       }
> >>           if (hwtstamps)
> >>                   *skb_hwtstamps(skb) = *hwtstamps;
> >>           else
> >> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> >> index 0d3decc13a99..1a161a2155b5 100644
> >> --- a/net/ipv4/tcp.c
> >> +++ b/net/ipv4/tcp.c
> >> @@ -488,9 +488,8 @@ static void tcp_tx_timestamp(struct sock *sk, struct
> >> sockcm_cookie *sockc)
> >>                           tcb->txstamp_ack = 1;
> >>                   if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) {
> >>                           if (tsflags & SOCKCM_FLAG_TS_OPT_ID)
> >> -                               shinfo->tskey = sockc->ts_opt_id;
> >> -                       else
> >> -                               shinfo->tskey = TCP_SKB_CB(skb)->seq +
> >> skb->len - 1;
> >> +                               skb_hwtstamps(skb)->hwtstamp =
> >> sockc->ts_opt_id;
> >> +                       shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;
> >>                   }
> >>           }
> >>    }
> >>
> >>
> >> Looks like we can add u32 tskey field in skb_shared_hwtstamps and reuse
> >> it. netdev_data field is only used on RX timestamp side, so should be
> >> fine to reuse. WDYT?
> > 
> > We cannot really extend the struct. skb_shared_info is scarce.
> > hwtstamps is a union. But on tx the hw tstamp is queued using
> > skb_tstamp_tx, not through this shinfo data at all.
> > 
> > It just seems weird to have a shinfo->tskey, but then ignore it and
> > find yet another 32b field. Easier would be to find 1b to toggle
> > whether tskey is to be interpreted as counter based or OPT_ID_CMSG.
> > 
> > I don't immediate see a perfect solution either.
> 
> Well, with this said, and given that having 2 different tskey values is
> weird solution, I'm thinking of skipping TCP part until we can find some
> proper way. Will it be OK to have UDP and RAW sockets only opted into
> the feature?

Yes, I think that's fine.

Let's make sure that tcp sendmsg fails if the new cmsg is passed. So
that we can add support for it later.

We also have to give some thought what it means to coalesce skbs with
non-sequential IDs, in the functions you mentioned before. Might be
fine to just say: that's the application's issue. But even then we
should document that behavior.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ