
Message-ID: <CALCETrXTOQRaGf650+fdyH1yKJLFY-WTpXWkThakacV0GKA=eg@mail.gmail.com>
Date: Wed, 28 Feb 2024 12:00:48 -0800
From: Andy Lutomirski <luto@...capital.net>
To: Paolo Abeni <pabeni@...hat.com>, "David S. Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>
Cc: Network Development <netdev@...r.kernel.org>, Linux API <linux-api@...r.kernel.org>
Subject: Re: The sk_err mechanism is infuriating in userspace

On Tue, Feb 6, 2024 at 9:24 AM Andy Lutomirski <luto@...capital.net> wrote:
>
> On Tue, Feb 6, 2024 at 12:43 AM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > What about 'destination/port unreachable' and many other similar errors
> > reported by sk_err? Which specific errors reported by sk_err does not
> > indicate that anything is wrong with the socket ?

I started writing a series to improve this in a backwards-compatible
way, but now I'm wondering whether the current behavior may be
partially a regression and not actually something well-enshrined in
history.

The nasty behavior in question is that, if a UDP or ping (or
presumably TCP, but that case is not necessarily a problem) socket
enables IP_RECVERR, then an ICMP error will asynchronously cause the
next sendmsg() to fail.  The code that causes this seems to be ancient
(I think it's sock_wait_for_wmem, which predates git, but I won't
swear to that).
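
A minimal sketch of how userspace might cope with this, assuming the
errno returned by the failing send describes an earlier datagram rather
than the one being sent.  The helper names, the retry limit and the
errno set are illustrative assumptions, not an established API; the
numeric fallbacks are the Linux values of IP_RECVERR and MSG_ERRQUEUE
for Python builds that do not export those names.

```python
import errno
import socket

# Linux values, used only if this Python build does not export the names.
IP_RECVERR = getattr(socket, "IP_RECVERR", 11)
MSG_ERRQUEUE = getattr(socket, "MSG_ERRQUEUE", 0x2000)

# Errnos assumed to come from an earlier ICMP error (not exhaustive).
ASYNC_ERRNOS = {errno.EHOSTUNREACH, errno.ENETUNREACH, errno.ECONNREFUSED}


def open_udp_socket():
    """Create a UDP socket with IP_RECVERR enabled (Linux only)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_RECVERR, 1)
    return sock


def drain_error_queue(sock):
    """Read and discard queued error messages; return how many were read."""
    drained = 0
    while True:
        try:
            sock.recvmsg(1, 512, MSG_ERRQUEUE | socket.MSG_DONTWAIT)
        except BlockingIOError:
            return drained  # error queue is empty
        drained += 1


def send_tolerating_async_errors(sock, data, addr, max_retries=3):
    """Send a datagram, retrying when the failure looks like a stale ICMP error."""
    for attempt in range(max_retries + 1):
        try:
            return sock.sendto(data, addr)
        except OSError as exc:
            # Anything outside the assumed set, or too many retries, is fatal.
            if exc.errno not in ASYNC_ERRNOS or attempt == max_retries:
                raise
            drain_error_queue(sock)
```

A caller would create the socket with open_udp_socket() and use
send_tolerating_async_errors(sock, payload, (host, port)) in place of
sock.sendto().  A real program would probably inspect the drained
messages instead of discarding them, since they say which destination
the ICMP error was about.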

Looking at my own logs, though, a Linux 4.5.2 kernel did not seem to
trigger this regularly, and I'm getting it on a regular basis on 6.2
and some newer kernels.  And, somewhat damningly (with IP addresses
redacted):

$ traceroute -I 10.1.2.3
traceroute to 10.1.2.3 (10.1.2.3), 30 hops max, 60 byte packets
 1  * * *
 2  10.5.6.7 (10.5.6.7)  0.593 ms  0.793 ms  0.988 ms
 3  10.8.9.10 (10.8.9.10)  1.247 ms  1.547 ms  1.881 ms
 4  10.11.12.13 (10.11.12.13)  1.032 ms  1.333 ms  1.679 ms
send: No route to host

Whoops, traceroute is getting a bogus error back when it sends a
packet, causing it to give up.  The real trace should be longer.

So I'm wondering if maybe this behavior should be seen as a bug to be
fixed and not a weird old API that needs to be preserved.

--Andy