netdev - Re: The sk_err mechanism is infuriating in userspace


Message-ID: <852606cd9cbc8da9c6735b4ad6216ba55408b767.camel@redhat.com>
Date: Tue, 06 Feb 2024 09:43:18 +0100
From: Paolo Abeni <pabeni@...hat.com>
To: Andy Lutomirski <luto@...capital.net>, Network Development
	 <netdev@...r.kernel.org>
Cc: Linux API <linux-api@...r.kernel.org>
Subject: Re: The sk_err mechanism is infuriating in userspace

On Mon, 2024-02-05 at 15:03 -0800, Andy Lutomirski wrote:
> Hi all-
> 
> I encounter this issue every couple of years, and it still seems to be
> an issue, and it drives me nuts every time I see it.
> 
> I write software that uses unconnected datagram-style sockets.  Errors
> happen for all kinds of reasons, and my software knows it.  My
> software even handles the errors and moves on with its life.  I use
> MSG_ERRQUEUE to understand the errors.  But the kernel fights back:
> 
> struct sk_buff *__skb_try_recv_datagram(struct sock *sk,
>                                         struct sk_buff_head *queue,
>                                         unsigned int flags, int *off, int *err,
>                                         struct sk_buff **last)
> {
>         struct sk_buff *skb;
>         unsigned long cpu_flags;
>         /*
>          * Caller is allowed not to check sk->sk_err before skb_recv_datagram()
>          */
>         int error = sock_error(sk);
> 
>         if (error)
>                 goto no_packet;
>         ^^^^^^^^^^ <----- EXCUSE ME?
> 
> The kernel even fights back on the *send* path?!?
> 
> static long sock_wait_for_wmem(struct sock *sk, long timeo)
> {
>         DEFINE_WAIT(wait);
> 
>         sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
>         for (;;) {
>                 if (!timeo)
>                         break;
>                 if (signal_pending(current))
>                         break;
>                 set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
>                 ...
>                 if (READ_ONCE(sk->sk_err))
>                         break;  <-- KERNEL HATES UNCONNECTED SOCKETS!
> 
> This is IMO just broken.  I realize it's legacy behavior, but it's
> BROKEN legacy behavior. 

As you noted, this is established behaviour exposed to user space,
and we can't simply change it, regardless of its merit (or possible
lack thereof).

>  sk_err does not (at least for an unconnected
> socket) indicate that anything is wrong with the socket. 

What about 'destination/port unreachable' and the many other similar
errors reported by sk_err? Which specific errors reported by sk_err do
not indicate that anything is wrong with the socket?

I guess that if you really want to ignore socket errors for datagram
sockets at recvmsg()/sendmsg() time, you could implement a new socket
option to conditionally enable such behaviour on a per-socket basis.
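
Absent such an option, a rough user-space sketch of what can be done
today: fetch and clear the pending error with getsockopt(SO_ERROR)
before receiving, and retry when recv() still fails with an errno that
looks like a reported sk_err. The helper names, the retry limit and
the set of errno values treated as "stale" are assumptions made for
illustration, and an error can still arrive between the two calls, so
the retry loop stays necessary:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: fetch and clear the pending socket error.
 * Returns the pending errno value (0 if none), or -1 if getsockopt()
 * itself failed.
 */
static int take_pending_error(int fd)
{
	int err = 0;
	socklen_t len = sizeof(err);

	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -1;
	return err;
}

/* Assumption: these errno values stem from sk_err, not from recv() itself. */
static int looks_like_sk_err(int e)
{
	return e == ECONNREFUSED || e == EHOSTUNREACH || e == ENETUNREACH;
}

/*
 * Hypothetical helper: recv() that skips over reported socket errors,
 * retrying at most max_retries times.
 */
static ssize_t recv_skipping_sk_err(int fd, void *buf, size_t len,
				    int flags, int max_retries)
{
	int i;

	for (i = 0; ; i++) {
		ssize_t n;

		/* Clear any error already pending; its value is ignored here. */
		(void)take_pending_error(fd);

		n = recv(fd, buf, len, flags);
		if (n >= 0 || !looks_like_sk_err(errno) || i >= max_retries)
			return n;
		/* The failed recv() consumed the error report; try again. */
	}
}
```

A caller that cares about the individual errors would read them from
the error queue with MSG_ERRQUEUE instead of discarding them.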

Cheers,

Paolo