Message-ID: <CA+FuTScpBNEDy6D+dBaj3avMzXCQBRMUQib_gkono4V5k+Ke9w@mail.gmail.com>
Date: Tue, 6 Dec 2022 15:46:25 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: Willem de Bruijn <willemdebruijn.kernel@...il.com>,
netdev@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com,
pabeni@...hat.com, soheil@...gle.com
Subject: Re: [PATCH net-next] net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP

On Tue, Dec 6, 2022 at 3:22 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Mon, 5 Dec 2022 18:09:25 -0500 Willem de Bruijn wrote:
> > Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP sockets
> > from write_seq instead of snd_una.
> >
> > Intuitively the contract is that the counter is zero after the
> > setsockopt, so that the next write N results in a notification for
> > last byte N - 1.
> >
> > On idle sockets snd_una == write_seq so this holds for both. But on
> > sockets with data in transmission, snd_una depends on the ACK response
> > from the peer. A process cannot learn this in a race free manner
> > (ioctl SIOCOUTQ is one racy approach).
>
> We can't just copy back the value of
>
> tcp_sk(sk)->snd_una - tcp_sk(sk)->write_seq
>
> to the user if the input of setsockopt is large enough (i.e. extend the
> struct, if len >= sizeof(new struct) -> user is asking to get this)?
> Or even add a bit somewhere that requests a copy back?

We could, but indeed then we first need a way to signal that the
kernel is new enough to actually write something meaningful back that
is safe to read.

And if we change the kernel API and applications, I find this a
somewhat hacky approach: why program the slightly wrong thing and
return the offset to patch it up in userspace, if we can just program
the right thing to begin with?

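
For illustration only, a rough userspace sketch of "programming the right
thing to begin with". It assumes the new flag ends up in
linux/net_tstamp.h with value (1 << 16), which is not stated in this
thread. The helper names tx_ack_id_flags() and enable_tx_timestamp_ids()
are made up for the example. As far as I know, SO_TIMESTAMPING rejects
unknown flag bits with EINVAL, so a kernel without the flag would fail
the call, and that failure is itself the "is the kernel new enough"
signal.

```c
#include <linux/net_tstamp.h>
#include <sys/socket.h>

/*
 * ASSUMPTION: value taken to be (1 << 16); the uapi hunk is not quoted
 * in this thread. The uapi header declares SOF_TIMESTAMPING_* as enum
 * constants, so this #ifndef is always true and the macro simply
 * shadows the enumerator (if present) with the same assumed value.
 */
#ifndef SOF_TIMESTAMPING_OPT_ID_TCP
#define SOF_TIMESTAMPING_OPT_ID_TCP (1 << 16)
#endif

/* Hypothetical helper: flags for software ACK timestamps with byte IDs. */
static unsigned int tx_ack_id_flags(void)
{
	return SOF_TIMESTAMPING_TX_ACK |
	       SOF_TIMESTAMPING_SOFTWARE |
	       SOF_TIMESTAMPING_OPT_ID |
	       SOF_TIMESTAMPING_OPT_ID_TCP;	/* modifier, not a replacement */
}

/*
 * Hypothetical helper: request the flags on a TCP socket.
 * Returns 0 on success, -1 with errno set on failure. A caller could
 * retry without SOF_TIMESTAMPING_OPT_ID_TCP if errno is EINVAL.
 */
static int enable_tx_timestamp_ids(int fd)
{
	unsigned int flags = tx_ack_id_flags();

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING,
			  &flags, sizeof(flags));
}
```

With this, the process never needs to learn snd_una or the output queue
length: the next write of N bytes is expected to report ID N - 1.
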
> Highly unlikely to break anything, I reckon? But whether setsockopt()
> can copy back is not 100% clear to me...
>
> > write_seq is a better starting point because based on the seqno of
> > data written by the process only.
> >
> > But the existing behavior may already be relied upon. So make the new
> > behavior optional behind a flag.
> >
> > The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
> > Move the field in struct sock to avoid growing the socket (for some
> > common CONFIG variants). The UAPI interface so_timestamping.flags is
> > already int, so 32 bits wide.
> >
> > Reported-by: Jakub Kicinski <kuba@...nel.org>
>
> Reported-by: Sotirios Delimanolis <sotodel@...a.com>
>
> I'm just a bad human information router.
>
> > Signed-off-by: Willem de Bruijn <willemb@...gle.com>
> >
> > ---
> >
> > Alternative solutions are
> >
> > * make the change unconditionally: a one line change.
> > * make the condition a (per netns) sysctl instead of flag
> > * make SOF_TIMESTAMPING_OPT_ID_TCP not a modifier of, but alternative
> > to SOF_TIMESTAMPING_OPT_ID. That requires also updating all existing
> > code that now tests OPT_ID to test a new OPT_ID_MASK.
>
> * copy back the SIOCOUTQ
>
> ;)
>
> > Weighing the variants, this seemed the best option to me.
> > ---
> > Documentation/networking/timestamping.rst | 19 +++++++++++++++++++
> > include/net/sock.h | 6 +++---
> > include/uapi/linux/net_tstamp.h | 3 ++-
> > net/core/sock.c | 9 ++++++++-
> > net/ethtool/common.c | 1 +
> > 5 files changed, 33 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> > index be4eb1242057..578f24731be5 100644
> > --- a/Documentation/networking/timestamping.rst
> > +++ b/Documentation/networking/timestamping.rst
> > @@ -192,6 +192,25 @@ SOF_TIMESTAMPING_OPT_ID:
> > among all possibly concurrently outstanding timestamp requests for
> > that socket.
> >
> > +SOF_TIMESTAMPING_OPT_ID_TCP:
> > + Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
> > + timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
> > + counter increments for stream sockets, but its starting point is
> > + not entirely trivial. This modifier option changes that point.
> > +
> > + A reasonable expectation is that the counter is reset to zero with
> > + the system call, so that a subsequent write() of N bytes generates
> > + a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP
> > + implements this behavior under all conditions.
> > +
> > + SOF_TIMESTAMPING_OPT_ID without modifier often reports the same,
> > + especially when the socket option is set when no data is in
> > + transmission. If data is being transmitted, it may be off by the
> > + length of the output queue (SIOCOUTQ) due to being based on snd_una
> > + rather than write_seq. That offset depends on factors outside of
> > + process control, including network RTT and peer response time. The
> > + difference is subtle and unlikely to be noticed when confiugred at
note to self: confiugred -> configured
> > + initial socket creation. But .._OPT_ID_TCP behavior is more predictable.
>
> I reckon this needs to be more informative. Say how exactly they differ
> (written vs queued for transmission). And I'd add to
> SOF_TIMESTAMPING_OPT_ID docs a note to "see also .._OPT_ID_TCP version".

Will do. Assuming we're good with this approach.
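
To make the "how exactly they differ" part concrete, a toy model of the
two starting points. This is not kernel code: struct tcp_model,
id_base() and id_for_write() are invented names, and the model only
assumes what the patch description says, namely that the counter base is
snd_una without the modifier and write_seq with it, and that a write
reports the offset of its last byte from that base.

```c
#include <stdint.h>

/* Toy stand-in for the two TCP sequence numbers discussed above. */
struct tcp_model {
	uint32_t snd_una;	/* oldest byte not yet ACKed by the peer */
	uint32_t write_seq;	/* next byte the process will write */
};

/* Counter base recorded at setsockopt() time. */
static uint32_t id_base(const struct tcp_model *tp, int opt_id_tcp)
{
	return opt_id_tcp ? tp->write_seq : tp->snd_una;
}

/*
 * ID reported for a write of len bytes starting at write_seq:
 * the sequence number of its last byte, relative to the base.
 * Unsigned arithmetic keeps this correct across seqno wraparound.
 */
static uint32_t id_for_write(const struct tcp_model *tp, uint32_t base,
			     uint32_t len)
{
	return tp->write_seq + len - 1 - base;
}
```

Example: with 3000 bytes still unacknowledged at setsockopt time
(snd_una = 1000, write_seq = 4000), a 1000 byte write reports ID 999
with the modifier and 3999 without it. The 3000 byte gap is the output
queue length at that moment, which the process cannot observe without a
race.
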