netdev - Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails

Message-ID: <CALCETrV3S1hj5tdQ1oCgm6ytgUOY8M3t9OSn0WcRLNYn3ZBURg@mail.gmail.com>
Date: Thu, 8 Feb 2024 14:40:20 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: Vadim Fedorenko <vadim.fedorenko@...ux.dev>, Willem de Bruijn <willemb@...gle.com>, 
	"David S. Miller" <davem@...emloft.net>, Network Development <netdev@...r.kernel.org>, 
	Jakub Kicinski <kuba@...nel.org>
Subject: Re: SOF_TIMESTAMPING_OPT_ID is unreliable when sendmsg fails

On Thu, Feb 8, 2024 at 1:51 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> Andy Lutomirski wrote:
> > On Thu, Feb 8, 2024 at 11:55 AM Vadim Fedorenko
> > <vadim.fedorenko@...ux.dev> wrote:
> > >
> > > On 08/02/2024 18:02, Andy Lutomirski wrote:
> > > > I’ve been using OPT_ID-style timestamping for years, but for some
> > > > reason this issue only bit me last week: if sendmsg() fails on a UDP
> > > > or ping socket, sk_tskey is poorly defined.  It may or may not get incremented
> > > > by the failed sendmsg().
>
> The intent is indeed to only increment on a successful send.
>
> The implementation is complicated a bit by (1) being a socket level
> option, thus also supporting SOCK_RAW and (2) MSG_MORE using multiple
> send calls to only produce a single datagram and (3) fragmentation
> producing multiple skbs for a single datagram.
>
> If only SOCK_DGRAM, conceivably we could move this to udp_send_skb,
> after the skb is created and after the usual error exit paths.
>
> An alternative is in error paths to decrement the counter. This is
> what we do for MSG_ZEROCOPY references. Unfortunately, with the
> lockless UDP path, other threads could come in between the inc and dec,
> so this is not really workable.
>
> > > Well, there are several error paths, for sure. For the sockets you
> > > mention the increment of tskey happens at __ip{,6}_append_data. There
> > > are 2 different types of failures which can happen after the increment.
> > > The first is MTU check fail, another one is memory allocation failures.
> > > I believe we can move increment to a later position, after MTU check in
> > > both functions to avoid first type of problem.
> >
> > For reasons that I still haven't deciphered, I'm sporadically getting
> > EHOSTUNREACH after the increment.  I can't find anything in the code
> > that would cause that, and every time I try to instrument it, it stops
> > happening :(  I sendmsg to the same destination several times in rapid
> > succession, and at most one of them will get EHOSTUNREACH.
>
> UDP might fail on ICMP responses. Try sending to a closed port. A few
> send calls will succeed, but eventually the send call will refuse to
> send. The cause is in the IP layer.
>
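
Willem's objection to decrementing in the error path can be made concrete with a toy replay of one interleaving. This is a hypothetical sketch, not kernel code: tskey stands in for sk_tskey, and reserve_key(), unreserve_key() and replay_interleaving() are made-up helpers built on C11 atomics. The steps run sequentially in one thread to show what three lockless senders could observe.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-socket OPT_ID counter (sk_tskey). */
static atomic_uint tskey;

/* Hypothetical helper: take the next ID, returning the pre-increment value. */
static uint32_t reserve_key(void)
{
	return atomic_fetch_add(&tskey, 1);
}

/* Hypothetical helper: "give back" an ID on an error path. */
static void unreserve_key(void)
{
	atomic_fetch_sub(&tskey, 1);
}

/*
 * Replay, in one thread, an interleaving of three lockless senders:
 *   A reserves, B reserves, A hits an error and decrements, C reserves.
 * keys[0..2] receive the IDs that A, B and C were handed.
 */
static void replay_interleaving(uint32_t keys[3])
{
	atomic_store(&tskey, 0);

	keys[0] = reserve_key();	/* A: 0 */
	keys[1] = reserve_key();	/* B: 1, and B's send succeeds */
	unreserve_key();		/* A fails: counter goes from 2 back to 1 */
	keys[2] = reserve_key();	/* C: 1 again, a duplicate of B */
}
```

After the replay, B and C both hold key 1 and A's key 0 is never reported, so the decrement hands out a duplicate instead of closing the gap. Making the undo safe would require serializing the reserve and the undo against other senders, which is exactly what the lockless path avoids.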

I tracked down the code, finally.

But I maintain that this behavior is absurd.  Sure, if I do:

connect(fd, some address);
send(fd, ...);
<-- ICMP error because the port was closed
send(fd, ...) = -ECONNREFUSED;

then this makes a little bit of sense.  But that's about as far as it
makes sense to me.  This variant is a bit different:

connect(fd, some address);
send(fd, ...);
<-- ICMP error because the port was closed
send(fd, ...) = -ECONNREFUSED;
send(fd, ...) succeeds  <-- now it works again by magic!
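
The first two sequences above can be reproduced on loopback with a short C program. This is a sketch under assumptions: it assumes Linux with a working 127.0.0.1, finds a currently unused port by binding and then closing a probe socket, and uses a 50 ms pause as a guess at how long the ICMP port-unreachable error needs to come back. run_refused_demo(), free_loopback_port() and struct refused_demo are illustrative names only.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Outcome of the three send() calls; 0 means that send succeeded. */
struct refused_demo {
	int setup_ok;	/* 0 if sockets could not be set up (e.g. no loopback) */
	int err1, err2, err3;
};

/* Fill *addr with 127.0.0.1 and a port that was free a moment ago. */
static int free_loopback_port(struct sockaddr_in *addr)
{
	socklen_t len = sizeof(*addr);
	int probe = socket(AF_INET, SOCK_DGRAM, 0);
	int ret = -1;

	if (probe < 0)
		return -1;
	memset(addr, 0, sizeof(*addr));	/* port 0: let the kernel pick */
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(probe, (struct sockaddr *)addr, sizeof(*addr)) == 0 &&
	    getsockname(probe, (struct sockaddr *)addr, &len) == 0)
		ret = 0;
	close(probe);	/* after this, nothing is bound to the port */
	return ret;
}

static int send_errno(int fd)
{
	return send(fd, "x", 1, 0) < 0 ? errno : 0;
}

static struct refused_demo run_refused_demo(void)
{
	struct refused_demo r = {0};
	struct sockaddr_in dst;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return r;
	if (free_loopback_port(&dst) < 0 ||
	    connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return r;
	}
	r.setup_ok = 1;

	r.err1 = send_errno(fd);	/* goes out; ICMP port unreachable comes back */
	usleep(50 * 1000);		/* assumed long enough for the ICMP to arrive */
	r.err2 = send_errno(fd);	/* reports the pending error */
	r.err3 = send_errno(fd);	/* error already consumed, so this one works */
	close(fd);
	return r;
}
```

On a typical Linux host err1 and err3 come back as 0 and err2 as ECONNREFUSED: the refusal is reported once, by the send that follows the ICMP error, and then cleared.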

Okay, maybe I can stretch my imagination so this makes sense.  But
then this comes out of left field:

connect(fd, some address);
sendto(fd, ..., willem:1);
<-- ICMP error because the port was closed
sendto(fd, ..., andy:2) = -ECONNREFUSED;

Excuse me?  And setting IP_RECVERR broadens the set of errors that
cause this IMO rather silly behavior, presumably motivated by the
logic in the ip(7) manpage:

   When the user receives an error from a socket operation, the errors
   can be received by calling recvmsg(2) with the MSG_ERRQUEUE flag set.

Isn't that what POLLERR is for?
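
For comparison, the error-queue route that ip(7) describes looks roughly like the sketch below. It assumes Linux, <linux/errqueue.h> and a usable loopback: IP_RECVERR is set on an unconnected UDP socket, poll() is asked for no events (POLLERR is reported regardless), and the queued error is read with recvmsg(MSG_ERRQUEUE) as a struct sock_extended_err. run_errqueue_demo(), free_loopback_port() and struct errqueue_demo are made-up names for the example.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <linux/errqueue.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* What the error queue reported; errno/origin stay 0 if nothing arrived. */
struct errqueue_demo {
	int setup_ok;		/* 0 if the socket or the first sendto() failed */
	int saw_pollerr;
	int queued_errno;	/* ee_errno of the queued sock_extended_err */
	int queued_origin;	/* ee_origin, e.g. SO_EE_ORIGIN_ICMP */
};

/* Fill *addr with 127.0.0.1 and a port that was free a moment ago. */
static int free_loopback_port(struct sockaddr_in *addr)
{
	socklen_t len = sizeof(*addr);
	int probe = socket(AF_INET, SOCK_DGRAM, 0);
	int ret = -1;

	if (probe < 0)
		return -1;
	memset(addr, 0, sizeof(*addr));	/* port 0: let the kernel pick */
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(probe, (struct sockaddr *)addr, sizeof(*addr)) == 0 &&
	    getsockname(probe, (struct sockaddr *)addr, &len) == 0)
		ret = 0;
	close(probe);	/* after this, nothing is bound to the port */
	return ret;
}

static struct errqueue_demo run_errqueue_demo(void)
{
	struct errqueue_demo r = {0};
	struct sockaddr_in dst;
	int one = 1;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);

	if (fd < 0)
		return r;
	if (setsockopt(fd, IPPROTO_IP, IP_RECVERR, &one, sizeof(one)) < 0 ||
	    free_loopback_port(&dst) < 0 ||
	    sendto(fd, "x", 1, 0, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
		close(fd);
		return r;
	}
	r.setup_ok = 1;

	/* Ask for no events: POLLERR is reported in revents regardless. */
	struct pollfd pfd = { .fd = fd, .events = 0 };
	r.saw_pollerr = poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLERR);

	/* Read one entry from the error queue; MSG_ERRQUEUE does not block. */
	char data[64];
	union { char buf[256]; struct cmsghdr align; } control;
	struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = control.buf, .msg_controllen = sizeof(control.buf),
	};
	if (recvmsg(fd, &msg, MSG_ERRQUEUE) >= 0) {
		for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm;
		     cm = CMSG_NXTHDR(&msg, cm)) {
			if (cm->cmsg_level != IPPROTO_IP || cm->cmsg_type != IP_RECVERR)
				continue;
			struct sock_extended_err ee;
			memcpy(&ee, CMSG_DATA(cm), sizeof(ee));
			r.queued_errno = (int)ee.ee_errno;
			r.queued_origin = ee.ee_origin;
		}
	}
	close(fd);
	return r;
}
```

If the ICMP port-unreachable arrives, poll() reports POLLERR and the queued entry carries ee_errno == ECONNREFUSED with ee_origin == SO_EE_ORIGIN_ICMP, so an application watching for POLLERR can pick the error up from the queue directly.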

And somehow the implementation of this logic for send(), etc. makes it
most of the way through the code before checking sock_error() at all.
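
The ordering problem discussed in this thread (the key is bumped in __ip{,6}_append_data, while the pending socket error is only noticed later) can be shown with a small single-threaded model. Everything here is hypothetical: struct toy_sock, toy_sock_error() and the two toy_send_*() functions only imitate the ordering and are not kernel APIs. The model also ignores concurrency, which is the harder part on the lockless UDP path.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical toy socket: a pending error and an OPT_ID-style counter. */
struct toy_sock {
	int pending_err;	/* positive errno left behind, e.g. by an ICMP error */
	uint32_t tskey;		/* next key to hand out */
};

/* Return and clear the pending error as a negative errno, like sock_error(). */
static int toy_sock_error(struct toy_sock *sk)
{
	int err = sk->pending_err;

	sk->pending_err = 0;
	return -err;
}

/* Ordering as described above: the key is taken before the error check. */
static int toy_send_key_first(struct toy_sock *sk, uint32_t *key)
{
	*key = sk->tskey++;
	return toy_sock_error(sk);
}

/* Alternative ordering: bail out before touching the counter. */
static int toy_send_check_first(struct toy_sock *sk, uint32_t *key)
{
	int err = toy_sock_error(sk);

	if (err)
		return err;
	*key = sk->tskey++;
	return 0;
}
```

With one ECONNREFUSED pending between two good sends, toy_send_key_first() hands out keys 0 and 2, leaving a hole at 1, while toy_send_check_first() hands out 0 and 1. Only the second matches the intent that the counter advances on successful sends; MTU and allocation failures, which cannot all be checked up front, are not modeled here.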

Anyway, I'll continue contemplating.