[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <65c54cc9ea70c_1cb6bf29492@willemb.c.googlers.com.notmuch>
Date: Thu, 08 Feb 2024 16:51:05 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Andy Lutomirski <luto@...capital.net>,
Vadim Fedorenko <vadim.fedorenko@...ux.dev>
Cc: Willem de Bruijn <willemb@...gle.com>,
"David S. Miller" <davem@...emloft.net>,
Network Development <netdev@...r.kernel.org>,
Jakub Kicinski <kuba@...nel.org>
Subject: Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails
Andy Lutomirski wrote:
> On Thu, Feb 8, 2024 at 11:55 AM Vadim Fedorenko
> <vadim.fedorenko@...ux.dev> wrote:
> >
> > On 08/02/2024 18:02, Andy Lutomirski wrote:
> > > I’ve been using OPT_ID-style timestamping for years, but for some
> > > reason this issue only bit me last week: if sendmsg() fails on a UDP
> > > or ping socket, sk_tskey is poorly. It may or may not get incremented
> > > by the failed sendmsg().
The intent is indeed to only increment on a successful send.
The implementation is complicated a bit by (1) being a socket level
option, thus also supporting SOCK_RAW and (2) MSG_MORE using multiple
send calls to only produce a single datagram and (3) fragmentation
producing multiple skbs for a single datagram.
If only SOCK_DGRAM, conceivably we could move this to udp_send_skb,
after the skb is created and after the usual error exit paths.
An alternative is in error paths to decrement the counter. This is
what we do for MSG_ZEROCOPY references. Unfortunately, with the
lockless UDP path, other threads could come inbetween the inc and dec,
so this is not really workable.
> > Well, there are several error paths, for sure. For the sockets you
> > mention the increment of tskey happens at __ip{,6}_append_data. There
> > are 2 different types of failures which can happen after the increment.
> > The first is MTU check fail, another one is memory allocation failures.
> > I believe we can move increment to a later position, after MTU check in
> > both functions to avoid first type of problem.
>
> For reasons that I still haven't deciphered, I'm sporadically getting
> EHOSTUNREACH after the increment. I can't find anything in the code
> that would cause that, and every time I try to instrument it, it stops
> happening :( I sendmsg to the same destination several times in rapid
> succession, and at most one of them will get EHOSTUNREACH.
UDP might fail on ICMP responses. Try sending to a closed port. A few
send calls will succeed, but eventually the send call will refuse to
send. The cause is in the IP layer.
> >
> > > I can think of at least three ways to improve this:
> > >
> > > 1. Make it so that the sequence number is genuinely only incremented
> > > on success. This may be tedious to implement and may be nearly
> > > impossible if there are multiple concurrent sendmsg() calls on the
> > > same socket.
> >
> > Multiple concurrent sendmsg() should bring a lot of problems on user-
> > space side. With current implementation the application has to track the
> > value of tskey to check incoming TX timestamp later. But with parallel
> > sendmsg() the app cannot be sure which value is assigned to which call
> > even in case of proper track value synchronization. That brings us to
> > the other solutions if we consider having parallel threads working with
> > same socket. Or we can simply pretend that it's impossible and then fix
> > error path to decrement tskey value.
> > >
> > > 2. Allow the user program to specify an explicit ID. cmsg values are
> > > variable length, so for datagram sockets, extending the
> > > SO_TIMESTAMPING cmsg with 64 bits of sequence number to be used for
> > > the TX timestamp on that particular packet might be a nice solution.
> > >
> >
> > This option can be really useful in case of really parallel work with
> > sockets.
>
> I personally like this one the best. Some care would be needed to
> allow programs to detect the new functionality. Any preferred way to
> handle it?
Regardless of whether we can fix the existing behavior, I also think
this is a worthwhile cmsg. As timestamping is a SOL_SOCKET option, the
cmsg should likely also be that, processed in __sock_cmsg_send.
Powered by blists - more mailing lists