Message-ID: <65e2225cb21be_158220294b@willemb.c.googlers.com.notmuch>
Date: Fri, 01 Mar 2024 13:45:48 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Abhishek Chauhan <quic_abchauha@...cinc.com>, 
 "David S. Miller" <davem@...emloft.net>, 
 Eric Dumazet <edumazet@...gle.com>, 
 Jakub Kicinski <kuba@...nel.org>, 
 Paolo Abeni <pabeni@...hat.com>, 
 netdev@...r.kernel.org, 
 linux-kernel@...r.kernel.org, 
 Andrew Halaney <ahalaney@...hat.com>, 
 Willem de Bruijn <willemdebruijn.kernel@...il.com>, 
 Martin KaFai Lau <martin.lau@...nel.org>, 
 Martin KaFai Lau <martin.lau@...ux.dev>
Cc: kernel@...cinc.com
Subject: Re: [PATCH net-next v3] net: Re-use and set mono_delivery_time bit
 for userspace tstamp packets

Abhishek Chauhan wrote:
> The bridge driver today does not support forwarding packets that carry
> a userspace timestamp and ends up resetting the timestamp. The ETF qdisc
> then finds the timestamp of such packets to be 0 and drops these
> time-sensitive packets. These changes allow packets with userspace
> timestamps to be forwarded from the bridge to NIC drivers.
> 
> Setting the same bit (mono_delivery_time) to avoid dropping of
> userspace tstamp packets in the forwarding path.
> 
> Existing functionality of mono_delivery_time remains unaltered here,
> instead just extended with userspace tstamp support for bridge
> forwarding path.
> 
> Signed-off-by: Abhishek Chauhan <quic_abchauha@...cinc.com>
> ---
> Changes since v2
> - Updated the commit subject and message. 
> - Took care of few comments from Willem to re-use mono_delivery_time
>   with comments and documentations in the header and source file.
> - Took care of comment from Andrew on the typo in the comment.
> - Existing self-test test cases are executed to make sure existing 
>   implementation is not impacted as stated by Paolo.(so_txtime.sh). 
> - Internal validation of UDP packets using iperf/so_priority/so_txtime
>   with MQPRIO + ETF offload is executed as well.
> - Test case is included below
> 
> Test 1 :- FQ + ETF (SW path)
> 
> [root@...ldauto-lvarm04-lnx ~]# ./so_txtime.sh
> [  280.640551] q->last time is 1707955476143297550
> [  283.338947] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [  284.078429] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> 
> SO_TXTIME ipv4 clock monotonic
> payload:a delay:109 expected:0 (us)
> 
> SO_TXTIME ipv6 clock monotonic
> payload:a delay:140 expected:0 (us)
> 
> SO_TXTIME ipv6 clock monotonic
> payload:a delay:12739 expected:10000 (us)
> 
> SO_TXTIME ipv4 clock monotonic
> payload:a delay:10054 expected:10000 (us)
> payload:b delay:20043 expected:20000 (us)
> 
> SO_TXTIME ipv6 clock monotonic
> payload:b delay:20078 expected:20000 (us)
> payload:a delay:20177 expected:20000 (us)
> 
> SO_TXTIME ipv4 clock tai
> send: pkt a at -1707955482913ms dropped: invalid txtime
> [  287.070504] now is set to 1707955482913404839
> [  287.070509] tx time from SKB is 0
> ./so_txtime: recv: timeout: Resource temporarily unavailable
> 
> SO_TXTIME ipv6 clock tai
> send: pkt a at 0ms dropped: invalid txtime
> [  287.070510] q->last time is 0
> [  287.420590] now is set to 1707955483263491298
> [  287.420596] tx time from SKB is 1707955483263454527
> ./so_txtime: recv: timeout: Resource temporarily unavailable
> 
> SO_TXTIME ipv6 clock tai
> [  287.420597] q->last time is 0
> [  287.700598] now is set to 1707955483543498954
> [  287.700604] tx time from SKB is 1707955483553463173
> payload:a delay:9655 expected:10000 (us)
> 
> SO_TXTIME ipv4 clock tai
> [  287.700605] q->last time is 0
> [  288.100532] now is set to 1707955483943432391
> [  288.100537] tx time from SKB is 1707955483953413016
> payload:a delay:9668 expected:10000 (us)[  288.100538] q->last time is 1707955483553463173
> 
> [  288.100546] now is set to 1707955483943446975
> [  288.100547] tx time from SKB is 1707955483963413016
> payload:b delay:20484 expected:20000 (us)
> 
> SO_TXTIME ipv6 clock tai
> [  288.100547] q->last time is 1707955483553463173
> [  288.440582] now is set to 1707955484283482495
> [  288.440587] tx time from SKB is 1707955484303452808
> payload:b delay:9648 expected:10000 (us)[  288.440588] q->last time is 1707955483963413016
> 
> [  288.440598] now is set to 1707955484283499370
> payload:a delay:22037 expected:20000 (us)
> [  288.440599] tx time from SKB is 1707955484293452808
> OK. All tests passed
> 
> 
> Test case 2 (MQPRIO + ETF HW offload)
> 
> [root@...ldauto-lvarm04-lnx ~]# tc qdisc add dev eth0 handle 100: parent root mqprio num_tc 4 \
>             map 0 2 1 3 3 2 2 2 2 2 2 2 2 2 2 2 \
>             queues 1@0 1@1 1@2 1@3\
>             hw 0
> [root@...ldauto-lvarm04-lnx ~]#
> tc qdisc replace dev eth0 parent 100:4 etf \
>             clockid CLOCK_TAI delta 40000  offload skip_sock_check
> [   89.145838] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue test log 3, number of queues 4, qopt enable 1, tbs queue bit 1
> [   89.145846] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue 3
> 
> 
> [root@...ldauto-lvarm04-lnx ~]# ./a.out -4 -c tai -S 192.168.1.1 -D 192.168.1.2 a,1,b,2
> 
> SO_TXTIME ipv4 clock tai
> 
>  glob_tstat = 1707955395256170394
> [  199.623650] now is set to 1707955395256215810
> [  199.623655] tx time from SKB is 1707955395257170394
> [  199.623656] q->last time is 0
> [  199.623663] now is set to 1707955395256230029
> [  199.623664] tx time from SKB is 1707955395258170394
> [  199.623665] q->last time is 0
> [  199.624589] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 257170394 nsec
> [  199.625573] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 258170394 nsec
> 
> Changes since v1 
> - Changed the commit subject, as I am replacing the mono_delivery_time
>   bit with clockid_delivery_time.
> - Took care of suggestion mentioned by Willem to use the same bit for 
>   userspace delivery time as there are no conflicts between TCP and 
>   SCM_TXTIME, because explicit cmsg makes no sense for TCP and only
>   RAW and DGRAM sockets interprets it. 
> - A clear explanation of why this is needed is given below; this
>   extends the work done by Martin for mono_delivery_time
>   https://patchwork.kernel.org/project/netdevbpf/patch/20220302195525.3480280-1-kafai@fb.com/
> - Version 1 patch can be referenced with below link which states 
>   the exact problem with tc-etf and discussions which took place
>   https://lore.kernel.org/all/20240215215632.2899370-1-quic_abchauha@quicinc.com/
> 
>  include/linux/skbuff.h | 4 ++++
>  net/ipv4/ip_output.c   | 7 +++++++
>  net/ipv4/raw.c         | 7 +++++++
>  net/ipv6/ip6_output.c  | 8 +++++++-
>  net/ipv6/raw.c         | 8 +++++++-
>  net/packet/af_packet.c | 8 +++++++-
>  6 files changed, 39 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2dde34c29203..58586d56b19f 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -820,6 +820,10 @@ typedef unsigned char *sk_buff_data_t;
>   *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
>   *		skb->tstamp has the (rcv) timestamp at ingress and
>   *		delivery_time at egress.
> + *		This bit is also set for tstamp coming from userspace which
> + *		acts as an information in the bridge forwarding path to avoid
> + *		resetting the tstamp value when user sets the timestamp using
> + *		SO_TXTIME sockopts.

There are multiple applications of this information aside from
bridging. I'd drop that and instead rewrite the existing text. Something
like:

"delivery_time in mono clock base (i.e., EDT) or a clock base chosen
by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at
ingress."
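
To illustrate why the bit matters beyond bridging, here is a rough
userspace sketch of the check that forwarding paths rely on. It is
modeled on skb_clear_tstamp(), but struct toy_skb and toy_clear_tstamp()
are made-up stand-ins for illustration, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for struct sk_buff: only the two fields discussed here. */
struct toy_skb {
	int64_t tstamp;                      /* nanoseconds, like ktime_t */
	unsigned int mono_delivery_time : 1; /* set: tstamp is a delivery time */
};

/* Hypothetical model of the forwarding-path reset: a receive timestamp
 * is cleared, a delivery time (bit set) is left untouched.
 */
static void toy_clear_tstamp(struct toy_skb *skb)
{
	assert(skb);
	if (skb->mono_delivery_time)
		return;
	skb->tstamp = 0;
}
```

With the bit left at zero, a txtime set through SO_TXTIME is wiped by
this kind of reset and ETF later sees 0. With the bit set, the value
survives to the qdisc.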

>   *	@napi_id: id of the NAPI struct this skb came from
>   *	@sender_cpu: (aka @napi_id) source CPU in XPS
>   *	@alloc_cpu: CPU which did the skb allocation.
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 5b5a0adb927f..4ae6aea8f8d6 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1455,6 +1455,13 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>  	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>  	skb->mark = cork->mark;
>  	skb->tstamp = cork->transmit_time;
> +	/* Timestamp coming from userspace using CMSG is stored as part
> +	 * of transmit_time as part of cork. To ensure bridge does not
> +	 * drop the tstamp in the forwarding path.We are reusing bit
> +	 * mono_delivery_time to avoid reset of tstamp in bridge
> +	 * forwarding path.
> +	 */
> +	skb->mono_delivery_time = !!skb->tstamp;

This patch adds too much verbose commentary, repeated multiple times,
for such a small change. Keep only the comment in skbuff.h.

>  	/*
>  	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
>  	 * on dst refcount
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index aea89326c697..6e67c0203be8 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -353,6 +353,13 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
>  	skb->priority = READ_ONCE(sk->sk_priority);
>  	skb->mark = sockc->mark;
>  	skb->tstamp = sockc->transmit_time;
> +	/* Timestamp coming from userspace using CMSG is stored as part
> +	 * of transmit_time as part of sockcmcookie. To ensure bridge does not
> +	 * drop the tstamp in the forwarding path. We are reusing bit
> +	 * mono_delivery_time to avoid reset of tstamp in bridge
> +	 * forwarding path.
> +	 */
> +	skb->mono_delivery_time = !!skb->tstamp;
>  	skb_dst_set(skb, &rt->dst);
>  	*rtp = NULL;
>  
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index a722a43dd668..f5b5e13a920f 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1922,7 +1922,13 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
>  	skb->priority = READ_ONCE(sk->sk_priority);
>  	skb->mark = cork->base.mark;
>  	skb->tstamp = cork->base.transmit_time;
> -
> +	/* Timestamp coming from userspace using CMSG is stored as part
> +	 * of transmit_time as part of cork. To ensure bridge does not
> +	 * drop the tstamp in the forwarding path. We are reusing bit
> +	 * mono_delivery_time to avoid reset of tstamp in bridge
> +	 * forwarding path.
> +	 */
> +	skb->mono_delivery_time = !!skb->tstamp;
>  	ip6_cork_steal_dst(skb, cork);
>  	IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
>  	if (proto == IPPROTO_ICMPV6) {
> diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
> index 03dbb874c363..d2e2a1ec3de4 100644
> --- a/net/ipv6/raw.c
> +++ b/net/ipv6/raw.c
> @@ -616,7 +616,13 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
>  	skb->priority = READ_ONCE(sk->sk_priority);
>  	skb->mark = sockc->mark;
>  	skb->tstamp = sockc->transmit_time;
> -
> +	/* Timestamp coming from userspace using CMSG is stored as part
> +	 * of transmit_time as part of sockcmcookie. To ensure bridge does not
> +	 * drop the tstamp in the forwarding path.We are reusing bit
> +	 * mono_delivery_time to avoid reset of tstamp in bridge
> +	 * forwarding path.
> +	 */
> +	skb->mono_delivery_time = !!skb->tstamp;
>  	skb_put(skb, length);
>  	skb_reset_network_header(skb);
>  	iph = ipv6_hdr(skb);
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index c9bbc2686690..949e936b5786 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2057,7 +2057,13 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
>  	skb->priority = READ_ONCE(sk->sk_priority);
>  	skb->mark = READ_ONCE(sk->sk_mark);
>  	skb->tstamp = sockc.transmit_time;
> -
> +	/* Timestamp coming from userspace using CMSG is stored as part
> +	 * of transmit_time as part of sockcmcookie. To ensure bridge does not
> +	 * drop the tstamp in the forwarding path. We are reusing bit
> +	 * mono_delivery_time to avoid reset of tstamp in bridge
> +	 * forwarding path.
> +	 */
> +	skb->mono_delivery_time = !!skb->tstamp;

Search for all occurrences of skb->tstamp getting initialized from
sockc.transmit_time. af_packet.c has three such cases.
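
One possible way to avoid missing a call site is to pair the two
assignments in a single helper. A minimal sketch, again on a toy
struct; toy_set_delivery_time() is a hypothetical name for
illustration and not something this patch or the kernel defines:

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for struct sk_buff: only the two fields discussed here. */
struct toy_skb {
	int64_t tstamp;                      /* nanoseconds, like ktime_t */
	unsigned int mono_delivery_time : 1; /* set: tstamp is a delivery time */
};

/* Hypothetical helper: store the txtime and derive the bit from it in
 * one place, so that every site copying transmit_time into tstamp
 * behaves the same way. A zero txtime leaves the bit clear.
 */
static void toy_set_delivery_time(struct toy_skb *skb, int64_t transmit_time)
{
	assert(skb);
	skb->tstamp = transmit_time;
	skb->mono_delivery_time = !!skb->tstamp;
}
```

Each place that currently assigns transmit_time to skb->tstamp would
then call the helper instead of open-coding both lines.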

>  	skb_setup_tx_timestamp(skb, sockc.tsflags);
>  
>  	if (unlikely(extra_len == 4))
> -- 
> 2.25.1
> 


