netdev - Re: udp sendmsg ENOBUFS clarification

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+FuTSc3O4XQAmtyY5Fwy96nL17ewdCouvwAJ=6DeMUcQUiz8A@mail.gmail.com>
Date:   Wed, 18 Sep 2019 11:35:35 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Josh Hunt <johunt@...mai.com>
Cc:     netdev <netdev@...r.kernel.org>,
        Eric Dumazet <edumazet@...gle.com>,
        David Miller <davem@...emloft.net>
Subject: Re: udp sendmsg ENOBUFS clarification

On Tue, Sep 17, 2019 at 4:20 PM Josh Hunt <johunt@...mai.com> wrote:
>
> I was running some tests recently with the udpgso_bench_tx benchmark in
> selftests and noticed that in some configurations it reported sending
> more than line rate! Looking into it more I found that I was overflowing
> the qdisc queue and so it was sending back NET_XMIT_DROP however this
> error did not propagate back up to the application and so it assumed
> whatever it sent was done successfully. That's when I learned about
> IP_RECVERR and saw that the benchmark isn't using that socket option.
>
> That's all fairly straightforward, but what I was hoping to get
> clarification on is where is the line drawn on when or when not to send
> ENOBUFS back to the application if IP_RECVERR is *not* set? My guess
> based on going through the code is that as long as the packet leaves the
> stack (in this case sent to the qdisc) that's where we stop reporting
> ENOBUFS back to the application, but can someone confirm?

Once a packet is queued the system call may return, so any subsequent
drops after dequeue are not propagated back. The relevant rc is set in
__dev_xmit_skb on q->enqueue. On setups with multiple devices, such as
a tunnel or bonding path, enqueue on the lower device is similar not
propagated.

> For example, we sanitize the error in udp_send_skb():
> send:
>          err = ip_send_skb(sock_net(sk), skb);
>          if (err) {
>                  if (err == -ENOBUFS && !inet->recverr) {
>                          UDP_INC_STATS(sock_net(sk),
>                                        UDP_MIB_SNDBUFERRORS, is_udplite);
>                          err = 0;
>                  }
>          } else
>
>
> but in udp_sendmsg() we don't:
>
>          if (err == -ENOBUFS || test_bit(SOCK_NOSPACE,
> &sk->sk_socket->flags)) {
>                  UDP_INC_STATS(sock_net(sk),
>                                UDP_MIB_SNDBUFERRORS, is_udplite);
>          }
>          return err;

That's interesting. My --incorrect-- understanding until now had been
that IP_RECVERR does nothing but enable optional extra detailed error
reporting on top of system call error codes.

But indeed it enables backpressure being reported as a system call
error that is suppressed otherwise. I don't know why. The behavior
precedes git history.

> In the case above it looks like we may only get ENOBUFS for allocation
> failures inside of the stack in udp_sendmsg() and so that's why we
> propagate the error back up to the application?

Both the udp lockless fast path and the slow corked path go through
udp_send_skb, so the backpressure is suppressed consistently across
both cases.

Indeed the error handling in udp_sendmsg then is not related to
backpressure, but to other causes of ENOBUF, i.e., allocation failure.