lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACSApvYNXtJOkCXv+rs_O_VKzqu8xuRKzK3sKVcA3GFpwU0qDg@mail.gmail.com>
Date:   Mon, 5 Dec 2022 19:34:51 -0500
From:   Soheil Hassas Yeganeh <soheil@...gle.com>
To:     Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc:     netdev@...r.kernel.org, davem@...emloft.net, kuba@...nel.org,
        edumazet@...gle.com, pabeni@...hat.com,
        Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH net-next] net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP

On Mon, Dec 5, 2022 at 6:09 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> From: Willem de Bruijn <willemb@...gle.com>
>
> Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from
> write_seq sockets instead of snd_una.
>
> Intuitively the contract is that the counter is zero after the
> setsockopt, so that the next write N results in a notification for
> last byte N - 1.
>
> On idle sockets snd_una == write_seq so this holds for both. But on
> sockets with data in transmission, snd_una depends on the ACK response
> from the peer. A process cannot learn this in a race free manner
> (ioctl SIOCOUTQ is one racy approach).
>
> write_seq is a better starting point because based on the seqno of
> data written by the process only.
>
> But the existing behavior may already be relied upon. So make the new
> behavior optional behind a flag.
>
> The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
> Move the field in struct sock to avoid growing the socket (for some
> common CONFIG variants). The UAPI interface so_timestamping.flags is
> already int, so 32 bits wide.
>
> Reported-by: Jakub Kicinski <kuba@...nel.org>
> Signed-off-by: Willem de Bruijn <willemb@...gle.com>

Acked-by: Soheil Hassas Yeganeh <soheil@...gle.com>

Thanks for the fix!

> ---
>
> Alternative solutions are
>
> * make the change unconditionally: a one line change.
> * make the condition a (per netns) sysctl instead of flag
> * make SOF_TIMESTAMPING_OPT_ID_TCP not a modifier of, but alternative
>   to SOF_TIMESTAMPING_OPT_ID. That requires also updating all existing
>   code that now tests OPT_ID to test a new OPT_ID_MASK.
>
> Weighing the variants, this seemed the best option to me.
> ---
>  Documentation/networking/timestamping.rst | 19 +++++++++++++++++++
>  include/net/sock.h                        |  6 +++---
>  include/uapi/linux/net_tstamp.h           |  3 ++-
>  net/core/sock.c                           |  9 ++++++++-
>  net/ethtool/common.c                      |  1 +
>  5 files changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> index be4eb1242057..578f24731be5 100644
> --- a/Documentation/networking/timestamping.rst
> +++ b/Documentation/networking/timestamping.rst
> @@ -192,6 +192,25 @@ SOF_TIMESTAMPING_OPT_ID:
>    among all possibly concurrently outstanding timestamp requests for
>    that socket.
>
> +SOF_TIMESTAMPING_OPT_ID_TCP:
> +  Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
> +  timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
> +  counter increments for stream sockets, but its starting point is
> +  not entirely trivial. This modifier option changes that point.
> +
> +  A reasonable expectation is that the counter is reset to zero with
> +  the system call, so that a subsequent write() of N bytes generates
> +  a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP
> +  implements this behavior under all conditions.
> +
> +  SOF_TIMESTAMPING_OPT_ID without modifier often reports the same,
> +  especially when the socket option is set when no data is in
> +  transmission. If data is being transmitted, it may be off by the
> +  length of the output queue (SIOCOUTQ) due to being based on snd_una
> +  rather than write_seq. That offset depends on factors outside of
> +  process control, including network RTT and peer response time. The
> +  difference is subtle and unlikely to be noticed when confiugred at
> +  initial socket creation. But .._OPT_ID behavior is more predictable.
>
>  SOF_TIMESTAMPING_OPT_CMSG:
>    Support recv() cmsg for all timestamped packets. Control messages
> diff --git a/include/net/sock.h b/include/net/sock.h
> index 6d207e7c4ad0..ecea3dcc2217 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -503,10 +503,10 @@ struct sock {
>  #if BITS_PER_LONG==32
>         seqlock_t               sk_stamp_seq;
>  #endif
> -       u16                     sk_tsflags;
> -       u8                      sk_shutdown;
>         atomic_t                sk_tskey;
>         atomic_t                sk_zckey;
> +       u32                     sk_tsflags;
> +       u8                      sk_shutdown;
>
>         u8                      sk_clockid;
>         u8                      sk_txtime_deadline_mode : 1,
> @@ -1899,7 +1899,7 @@ static inline void sock_replace_proto(struct sock *sk, struct proto *proto)
>  struct sockcm_cookie {
>         u64 transmit_time;
>         u32 mark;
> -       u16 tsflags;
> +       u32 tsflags;
>  };
>
>  static inline void sockcm_init(struct sockcm_cookie *sockc,
> diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
> index 55501e5e7ac8..a2c66b3d7f0f 100644
> --- a/include/uapi/linux/net_tstamp.h
> +++ b/include/uapi/linux/net_tstamp.h
> @@ -31,8 +31,9 @@ enum {
>         SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13),
>         SOF_TIMESTAMPING_OPT_TX_SWHW = (1<<14),
>         SOF_TIMESTAMPING_BIND_PHC = (1 << 15),
> +       SOF_TIMESTAMPING_OPT_ID_TCP = (1 << 16),
>
> -       SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_BIND_PHC,
> +       SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_ID_TCP,
>         SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
>                                  SOF_TIMESTAMPING_LAST
>  };
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 4571914a4aa8..b0ab841e0aed 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -901,13 +901,20 @@ int sock_set_timestamping(struct sock *sk, int optname,
>         if (val & ~SOF_TIMESTAMPING_MASK)
>                 return -EINVAL;
>
> +       if (val & SOF_TIMESTAMPING_OPT_ID_TCP &&
> +           !(val & SOF_TIMESTAMPING_OPT_ID))
> +               return -EINVAL;
> +
>         if (val & SOF_TIMESTAMPING_OPT_ID &&
>             !(sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
>                 if (sk_is_tcp(sk)) {
>                         if ((1 << sk->sk_state) &
>                             (TCPF_CLOSE | TCPF_LISTEN))
>                                 return -EINVAL;
> -                       atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
> +                       if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
> +                               atomic_set(&sk->sk_tskey, tcp_sk(sk)->write_seq);
> +                       else
> +                               atomic_set(&sk->sk_tskey, tcp_sk(sk)->snd_una);
>                 } else {
>                         atomic_set(&sk->sk_tskey, 0);
>                 }
> diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> index 21cfe8557205..6f399afc2ff2 100644
> --- a/net/ethtool/common.c
> +++ b/net/ethtool/common.c
> @@ -417,6 +417,7 @@ const char sof_timestamping_names[][ETH_GSTRING_LEN] = {
>         [const_ilog2(SOF_TIMESTAMPING_OPT_PKTINFO)]  = "option-pktinfo",
>         [const_ilog2(SOF_TIMESTAMPING_OPT_TX_SWHW)]  = "option-tx-swhw",
>         [const_ilog2(SOF_TIMESTAMPING_BIND_PHC)]     = "bind-phc",
> +       [const_ilog2(SOF_TIMESTAMPING_OPT_ID_TCP)]   = "option-id-tcp",
>  };
>  static_assert(ARRAY_SIZE(sof_timestamping_names) == __SOF_TIMESTAMPING_CNT);
>
> --
> 2.39.0.rc0.267.gcb52ba06e7-goog
>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ