[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACSApvYNXtJOkCXv+rs_O_VKzqu8xuRKzK3sKVcA3GFpwU0qDg@mail.gmail.com>
Date: Mon, 5 Dec 2022 19:34:51 -0500
From: Soheil Hassas Yeganeh <soheil@...gle.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org,
edumazet@...gle.com, pabeni@...hat.com,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH net-next] net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP
On Mon, Dec 5, 2022 at 6:09 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> From: Willem de Bruijn <willemb@...gle.com>
>
> Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from
> write_seq sockets instead of snd_una.
>
> Intuitively the contract is that the counter is zero after the
> setsockopt, so that the next write N results in a notification for
> last byte N - 1.
>
> On idle sockets snd_una == write_seq so this holds for both. But on
> sockets with data in transmission, snd_una depends on the ACK response
> from the peer. A process cannot learn this in a race free manner
> (ioctl SIOCOUTQ is one racy approach).
>
> write_seq is a better starting point because based on the seqno of
> data written by the process only.
>
> But the existing behavior may already be relied upon. So make the new
> behavior optional behind a flag.
>
> The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
> Move the field in struct sock to avoid growing the socket (for some
> common CONFIG variants). The UAPI interface so_timestamping.flags is
> already int, so 32 bits wide.
>
> Reported-by: Jakub Kicinski <kuba@...nel.org>
> Signed-off-by: Willem de Bruijn <willemb@...gle.com>
Acked-by: Soheil Hassas Yeganeh <soheil@...gle.com>
Thanks for the fix!
> ---
>
> Alternative solutions are
>
> * make the change unconditionally: a one line change.
> * make the condition a (per netns) sysctl instead of flag
> * make SOF_TIMESTAMPING_OPT_ID_TCP not a modifier of, but alternative
> to SOF_TIMESTAMPING_OPT_ID. That requires also updating all existing
> code that now tests OPT_ID to test a new OPT_ID_MASK.
>
> Weighing the variants, this seemed the best option to me.
> ---
> Documentation/networking/timestamping.rst | 19 +++++++++++++++++++
> include/net/sock.h | 6 +++---
> include/uapi/linux/net_tstamp.h | 3 ++-
> net/core/sock.c | 9 ++++++++-
> net/ethtool/common.c | 1 +
> 5 files changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> index be4eb1242057..578f24731be5 100644
> --- a/Documentation/networking/timestamping.rst
> +++ b/Documentation/networking/timestamping.rst
> @@ -192,6 +192,25 @@ SOF_TIMESTAMPING_OPT_ID:
> among all possibly concurrently outstanding timestamp requests for
> that socket.
>
> +SOF_TIMESTAMPING_OPT_ID_TCP:
> + Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
> + timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
> + counter increments for stream sockets, but its starting point is
> + not entirely trivial. This modifier option changes that point.
> +
> + A reasonable expectation is that the counter is reset to zero with
> + the system call, so that a subsequent write() of N bytes generates
> + a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP
> + implements this behavior under all conditions.
> +
> + SOF_TIMESTAMPING_OPT_ID without modifier often reports the same,
> + especially when the socket option is set when no data is in
> + transmission. If data is being transmitted, it may be off by the
> + length of the output queue (SIOCOUTQ) due to being based on snd_una
> + rather than write_seq. That offset depends on factors outside of
> + process control, including network RTT and peer response time. The
> + difference is subtle and unlikely to be noticed when confiugred at
> + initial socket creation. But .._OPT_ID behavior is more predictable.
>
> SOF_TIMESTAMPING_OPT_CMSG:
> Support recv() cmsg for all timestamped packets. Control messages
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 6d207e7c4ad0..ecea3dcc2217 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -503,10 +503,10 @@ struct sock {
> #if BITS_PER_LONG==32
> seqlock_t sk_stamp_seq;
> #endif
> - u16 sk_tsflags;
> - u8 sk_shutdown;
> atomic_t sk_tskey;
> atomic_t sk_zckey;
> + u32 sk_tsflags;
> + u8 sk_shutdown;
>
> u8 sk_clockid;
> u8 sk_txtime_deadline_mode : 1,
> @@ -1899,7 +1899,7 @@ static inline void sock_replace_proto(struct sock *sk, struct proto *proto)
> struct sockcm_cookie {
> u64 transmit_time;
> u32 mark;
> - u16 tsflags;
> + u32 tsflags;
> };
>
> static inline void sockcm_init(struct sockcm_cookie *sockc,
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index 55501e5e7ac8..a2c66b3d7f0f 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -31,8 +31,9 @@ enum {
> SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13),
> SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
> SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
> + SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
>
> - SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_BIND_PHC,
> + SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_TCP,
> SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
> SOF_TIMESTAMPING_LAST
> };
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 4571914a4aa8..b0ab841e0aed 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -901,13 +901,20 @@ int sock_set_timestamping(struct sock *sk, int optname,
> if (val & ~SOF_TIMESTAMPING_MASK)
> return -EINVAL;
>
> + if (val & SOF_TIMESTAMPING_OPT_ID_TCP &&
> + !(val & SOF_TIMESTAMPING_OPT_ID))
> + return -EINVAL;
> +
> if (val & SOF_TIMESTAMPING_OPT_ID &&
> !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
> if (sk_is_tcp(sk)) {
> if ((1 << sk->sk_state) &
> (TCPF_CLOSE | TCPF_LISTEN))
> return -EINVAL;
> - atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
> + if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
> + atomic_set(&sk->sk_tskey, tcp_sk(sk)->write_seq);
> + else
> + atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
> } else {
> atomic_set(&sk->sk_tskey, 0);
> }
> diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> index 21cfe8557205..6f399afc2ff2 100644
> --- a/net/ethtool/common.c
> +++ b/net/ethtool/common.c
> @@ -417,6 +417,7 @@ const char sof_timestamping_names[][ETH_GSTRING_LEN] = {
> [const_ilog2(SOF_TIMESTAMPING_OPT_PKTINFO)] = "option-pktinfo",
> [const_ilog2(SOF_TIMESTAMPING_OPT_TX_SWHW)] = "option-tx-swhw",
> [const_ilog2(SOF_TIMESTAMPING_BIND_PHC)] = "bind-phc",
> + [const_ilog2(SOF_TIMESTAMPING_OPT_ID_TCP)] = "option-id-tcp",
> };
> static_assert(ARRAY_SIZE(sof_timestamping_names) == __SOF_TIMESTAMPING_CNT);
>
> --
> 2.39.0.rc0.267.gcb52ba06e7-goog
>
Powered by blists - more mailing lists