netdev - Re: [PATCH net-next] net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP

Message-ID: <CA+FuTScpBNEDy6D+dBaj3avMzXCQBRMUQib_gkono4V5k+Ke9w@mail.gmail.com>
Date:   Tue, 6 Dec 2022 15:46:25 -0500
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        netdev@...r.kernel.org, davem@...emloft.net, edumazet@...gle.com,
        pabeni@...hat.com, soheil@...gle.com
Subject: Re: [PATCH net-next] net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCP

On Tue, Dec 6, 2022 at 3:22 PM Jakub Kicinski <kuba@...nel.org> wrote:
>
> On Mon,  5 Dec 2022 18:09:25 -0500 Willem de Bruijn wrote:
> > Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from
> > write_seq sockets instead of snd_una.
> >
> > Intuitively the contract is that the counter is zero after the
> > setsockopt, so that the next write N results in a notification for
> > last byte N - 1.
> >
> > On idle sockets snd_una == write_seq so this holds for both. But on
> > sockets with data in transmission, snd_una depends on the ACK response
> > from the peer. A process cannot learn this in a race free manner
> > (ioctl SIOCOUTQ is one racy approach).
>
> We can't just copy back the value of
>
>         tcp_sk(sk)->snd_una - tcp_sk(sk)->write_seq
>
> to the user if the input of setsockopt is large enough (ie. extend the
> struct, if len >= sizeof(new struct) -> user is asking to get this?
> Or even add a bit somewhere that requests a copy back?

We could, but indeed then we first need a way to signal that the
kernel is new enough to actually write something meaningful back that
is safe to read.

And if we are changing both the kernel API and applications anyway,
I find this a somewhat hacky approach: why program the slightly wrong
thing and return the offset to patch it up in userspace, if we can
just program the right thing to begin with?
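
To make the difference concrete, here is a minimal userspace model of
the two starting points. It is only a sketch, not the kernel code:
struct model_tcp, model_tskey_base() and model_write() are hypothetical
names, and it assumes the reported ID is the sequence number of the
last byte of a write minus the base chosen at setsockopt time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model of the two TCP sequence counters
 * discussed above; this is not kernel code. */
struct model_tcp {
	uint32_t snd_una;   /* first byte not yet ACKed by the peer */
	uint32_t write_seq; /* one past the last byte written by the process */
};

/* Base recorded at setsockopt() time (assumption for this sketch):
 * snd_una without the modifier, write_seq with
 * SOF_TIMESTAMPING_OPT_ID_TCP. */
static uint32_t model_tskey_base(const struct model_tcp *tp, int opt_id_tcp)
{
	return opt_id_tcp ? tp->write_seq : tp->snd_una;
}

/* The process writes len bytes. Returns the ID reported for the last
 * byte of that write, relative to the recorded base. Unsigned
 * arithmetic keeps this correct across sequence number wraparound. */
static uint32_t model_write(struct model_tcp *tp, uint32_t base, uint32_t len)
{
	assert(len > 0);
	tp->write_seq += len;
	return tp->write_seq - 1 - base;
}
```

On an idle socket (snd_una == write_seq) both bases give N - 1 for a
write of N bytes. With 500 unACKed bytes in flight when the option is
set, a 100 byte write reports 599 with the snd_una base and 99 with the
write_seq base.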

> Highly unlikely to break anything, I reckon? But whether setsockopt()
> can copy back is not 100% clear to me...
>
> > write_seq is a better starting point because based on the seqno of
> > data written by the process only.
> >
> > But the existing behavior may already be relied upon. So make the new
> > behavior optional behind a flag.
> >
> > The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
> > Move the field in struct sock to avoid growing the socket (for some
> > common CONFIG variants). The UAPI interface so_timestamping.flags is
> > already int, so 32 bits wide.
> >
> > Reported-by: Jakub Kicinski <kuba@...nel.org>
>
> Reported-by: Sotirios Delimanolis <sotodel@...a.com>
>
> I'm just a bad human information router.
>
> > Signed-off-by: Willem de Bruijn <willemb@...gle.com>
> >
> > ---
> >
> > Alternative solutions are
> >
> > * make the change unconditionally: a one line change.
> > * make the condition a (per netns) sysctl instead of flag
> > * make SOF_TIMESTAMPING_OPT_ID_TCP not a modifier of, but alternative
> >   to SOF_TIMESTAMPING_OPT_ID. That requires also updating all existing
> >   code that now tests OPT_ID to test a new OPT_ID_MASK.
>
>  * copy back the SIOCOUTQ
>
> ;)
>
> > Weighing the variants, this seemed the best option to me.
> > ---
> >  Documentation/networking/timestamping.rst | 19 +++++++++++++++++++
> >  include/net/sock.h                        |  6 +++---
> >  include/uapi/linux/net_tstamp.h           |  3 ++-
> >  net/core/sock.c                           |  9 ++++++++-
> >  net/ethtool/common.c                      |  1 +
> >  5 files changed, 33 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/networking/timestamping.rst b/Documentation/networking/timestamping.rst
> > index be4eb1242057..578f24731be5 100644
> > --- a/Documentation/networking/timestamping.rst
> > +++ b/Documentation/networking/timestamping.rst
> > @@ -192,6 +192,25 @@ SOF_TIMESTAMPING_OPT_ID:
> >    among all possibly concurrently outstanding timestamp requests for
> >    that socket.
> >
> > +SOF_TIMESTAMPING_OPT_ID_TCP:
> > +  Pass this modifier along with SOF_TIMESTAMPING_OPT_ID for new TCP
> > +  timestamping applications. SOF_TIMESTAMPING_OPT_ID defines how the
> > +  counter increments for stream sockets, but its starting point is
> > +  not entirely trivial. This modifier option changes that point.
> > +
> > +  A reasonable expectation is that the counter is reset to zero with
> > +  the system call, so that a subsequent write() of N bytes generates
> > +  a timestamp with counter N-1. SOF_TIMESTAMPING_OPT_ID_TCP
> > +  implements this behavior under all conditions.
> > +
> > +  SOF_TIMESTAMPING_OPT_ID without modifier often reports the same,
> > +  especially when the socket option is set when no data is in
> > +  transmission. If data is being transmitted, it may be off by the
> > +  length of the output queue (SIOCOUTQ) due to being based on snd_una
> > +  rather than write_seq. That offset depends on factors outside of
> > +  process control, including network RTT and peer response time. The
> > +  difference is subtle and unlikely to be noticed when confiugred at

note to self: confiugred -> configured

> > +  initial socket creation. But .._OPT_ID behavior is more predictable.
>
> I reckon this needs to be more informative. Say how exactly they differ
> (written vs queued for transmission). And I'd add to
> SOF_TIMESTAMPING_OPT_ID docs a note to "see also .._OPT_ID_TCP version".

Will do. Assuming we're good with this approach.