netdev - Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails

Message-ID: <65c54cc9ea70c_1cb6bf29492@willemb.c.googlers.com.notmuch>
Date: Thu, 08 Feb 2024 16:51:05 -0500
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Andy Lutomirski <luto@...capital.net>, 
 Vadim Fedorenko <vadim.fedorenko@...ux.dev>
Cc: Willem de Bruijn <willemb@...gle.com>, 
 "David S. Miller" <davem@...emloft.net>, 
 Network Development <netdev@...r.kernel.org>, 
 Jakub Kicinski <kuba@...nel.org>
Subject: Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails

Andy Lutomirski wrote:
> On Thu, Feb 8, 2024 at 11:55 AM Vadim Fedorenko
> <vadim.fedorenko@...ux.dev> wrote:
> >
> > On 08/02/2024 18:02, Andy Lutomirski wrote:
> > > I’ve been using OPT_ID-style timestamping for years, but for some
> > > reason this issue only bit me last week: if sendmsg() fails on a UDP
> > > or ping socket, sk_tskey is poorly.  It may or may not get incremented
> > > by the failed sendmsg().

The intent is indeed to only increment on a successful send.

The implementation is complicated a bit by (1) this being a socket-level
option, thus also supporting SOCK_RAW, (2) MSG_MORE using multiple
send calls to produce only a single datagram, and (3) fragmentation
producing multiple skbs for a single datagram.

If this only had to cover SOCK_DGRAM, we could conceivably move the
increment to udp_send_skb, after the skb is created and after the usual
error exit paths.

An alternative is to decrement the counter in error paths. This is
what we do for MSG_ZEROCOPY references. Unfortunately, with the
lockless UDP path, other threads could come in between the inc and dec,
so this is not really workable.
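
To make the inc/dec problem concrete, here is a minimal user-space
sketch of one bad interleaving. It is not kernel code: struct model_sock
and the model_* helpers are hypothetical names invented for this
illustration, and the counter is modelled as a C11 atomic. The two
"senders" are replayed sequentially in the order a race could produce.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the per-socket key counter. */
struct model_sock {
	atomic_uint tskey;
};

/* Take a key the way an "increment first" send path would. */
static unsigned int model_take_key(struct model_sock *sk)
{
	return atomic_fetch_add(&sk->tskey, 1);
}

/* Roll the counter back on error, as the decrement alternative would. */
static void model_undo_key(struct model_sock *sk)
{
	atomic_fetch_sub(&sk->tskey, 1);
}

/*
 * Replay one interleaving of two senders, A and B, on the same socket.
 * Returns true if the send that follows is handed a key that B already
 * used for a successfully sent datagram.
 */
static bool model_inc_dec_race_duplicates_key(void)
{
	struct model_sock sk;
	unsigned int key_a, key_b, key_c;

	atomic_init(&sk.tskey, 0);

	key_a = model_take_key(&sk);	/* A takes key 0, counter is 1 */
	key_b = model_take_key(&sk);	/* B takes key 1 and succeeds, counter is 2 */
	(void)key_a;
	model_undo_key(&sk);		/* A fails and decrements, counter is 1 */
	key_c = model_take_key(&sk);	/* next send takes key 1 again */

	return key_c == key_b;
}
```

Each individual operation is atomic, yet the rollback still hands key 1
to two different datagrams, because the decrement cannot tell which
increment it is undoing.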

> > Well, there are several error paths, for sure. For the sockets you
> > mention the increment of tskey happens at __ip{,6}_append_data. There
> > are 2 different types of failures which can happen after the increment.
> > The first is MTU check fail, another one is memory allocation failures.
> > I believe we can move increment to a later position, after MTU check in
> > both functions to avoid first type of problem.
> 
> For reasons that I still haven't deciphered, I'm sporadically getting
> EHOSTUNREACH after the increment.  I can't find anything in the code
> that would cause that, and every time I try to instrument it, it stops
> happening :(  I sendmsg to the same destination several times in rapid
> succession, and at most one of them will get EHOSTUNREACH.

UDP might fail on ICMP responses. Try sending to a closed port. A few
send calls will succeed, but eventually the send call will refuse to
send. The cause is in the IP layer.
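
A rough way to try the closed-port experiment from user space, assuming
a connected UDP socket on loopback and a port with no listener. The
helper name sends_before_error is made up for this sketch, and the 1 ms
pause is only there to give an ICMP error time to be processed. Which
send fails, and with which errno, depends on timing and on the system;
on Linux a connected socket typically reports ECONNREFUSED.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Send up to max_sends one-byte datagrams on a connected UDP socket to
 * 127.0.0.1:port. Returns the number of sends that succeeded before the
 * first failure (max_sends if none failed), or -1 if the socket could
 * not be set up. If a send failed, *send_errno holds its errno;
 * otherwise *send_errno is 0.
 */
static int sends_before_error(uint16_t port, int max_sends, int *send_errno)
{
	struct sockaddr_in dst;
	int fd, i;

	*send_errno = 0;

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&dst, 0, sizeof(dst));
	dst.sin_family = AF_INET;
	dst.sin_port = htons(port);
	dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	/* Connecting lets a pending ICMP error surface on a later send. */
	if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return -1;
	}

	for (i = 0; i < max_sends; i++) {
		if (send(fd, "x", 1, 0) < 0) {
			*send_errno = errno;
			break;
		}
		poll(NULL, 0, 1);	/* pause about 1 ms between sends */
	}

	close(fd);
	return i;
}
```

Calling sends_before_error(port, 5, &err) with a port that nothing is
bound to should show whether a send is refused and after how many
successful sends.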

> >
> > > I can think of at least three ways to improve this:
> > >
> > > 1. Make it so that the sequence number is genuinely only incremented
> > > on success. This may be tedious to implement and may be nearly
> > > impossible if there are multiple concurrent sendmsg() calls on the
> > > same socket.
> >
> > Multiple concurrent sendmsg() should bring a lot of problems on user-
> > space side. With current implementation the application has to track the
> > value of tskey to check incoming TX timestamp later. But with parallel
> > sendmsg() the app cannot be sure which value is assigned to which call
> > even in case of proper track value synchronization. That brings us to
> > the other solutions if we consider having parallel threads working with
> > same socket. Or we can simply pretend that it's impossible and then fix
> > error path to decrement tskey value.
> > >
> > > 2. Allow the user program to specify an explicit ID.  cmsg values are
> > > variable length, so for datagram sockets, extending the
> > > SO_TIMESTAMPING cmsg with 64 bits of sequence number to be used for
> > > the TX timestamp on that particular packet might be a nice solution.
> > >
> >
> > This option can be really useful in case of really parallel work with
> > sockets.
> 
> I personally like this one the best.  Some care would be needed to
> allow programs to detect the new functionality.  Any preferred way to
> handle it?

Regardless of whether we can fix the existing behavior, I also think
this is a worthwhile cmsg. As timestamping is a SOL_SOCKET option, the
cmsg should likely also be that, processed in __sock_cmsg_send.
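
For reference, this is roughly how user space attaches the existing
SOL_SOCKET/SO_TIMESTAMPING cmsg, which carries a 32-bit word of tx
recording flags, to a single send call. The helper name
fill_tx_timestamp_cmsg is hypothetical. The explicit-ID extension
discussed above is deliberately not shown, since its layout is not
defined in this thread.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/net_tstamp.h>

/*
 * Point msg at buf and fill it with one SOL_SOCKET/SO_TIMESTAMPING cmsg
 * carrying tx_flags (for example SOF_TIMESTAMPING_TX_SOFTWARE).
 * buf should be suitably aligned for struct cmsghdr.
 * Returns 0 on success, -1 if buf is too small.
 */
static int fill_tx_timestamp_cmsg(struct msghdr *msg, void *buf,
				  size_t buflen, uint32_t tx_flags)
{
	struct cmsghdr *cmsg;

	if (buflen < CMSG_SPACE(sizeof(tx_flags)))
		return -1;

	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(tx_flags));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_SOCKET;		/* socket-level option */
	cmsg->cmsg_type = SO_TIMESTAMPING;
	cmsg->cmsg_len = CMSG_LEN(sizeof(tx_flags));
	memcpy(CMSG_DATA(cmsg), &tx_flags, sizeof(tx_flags));

	return 0;
}
```

The caller still sets msg_name and msg_iov as usual before passing the
msghdr to sendmsg().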