netdev - Re: Kernel UDP behavior with missing destinations

Message-ID: <CA+FuTSdcik=QLc=XMjWSFWty=zEm6_0Q3xKMo=1zi2_zNjwjpw@mail.gmail.com>
Date:   Thu, 16 May 2019 23:22:32 -0400
From:   Willem de Bruijn <willemdebruijn.kernel@...il.com>
To:     Adam Urban <adam.urban@...leguru.org>
Cc:     Eric Dumazet <eric.dumazet@...il.com>,
        Willem de Bruijn <willemdebruijn.kernel@...il.com>,
        Network Development <netdev@...r.kernel.org>
Subject: Re: Kernel UDP behavior with missing destinations

On Thu, May 16, 2019 at 8:27 PM Adam Urban <adam.urban@...leguru.org> wrote:
>
> And replying to your earlier comment about TTL, yes I think a TTL on
> arp_queues would be hugely helpful.
>
> In any environment where you are streaming time-sensitive UDP traffic,
> you really want the kernel to be tuned to immediately drop the
> outgoing packet if the destination isn't yet known/in the arp table
> already...

For packets that need to be sent immediately or not at all, you
probably do not want a TTL, but simply for the send call to fail
immediately with EAGAIN instead of queuing the packet for ARP
resolution at all. That can be approximated by setting unres_qlen to 0.
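On the application side, "send now or not at all" can be sketched as a
non-blocking send that treats EAGAIN as a drop instead of retrying. This
is only an illustration: send_or_drop is a hypothetical helper name, and
it assumes the caller is happy to lose the datagram whenever the socket
send budget is exhausted.

```python
import socket


def send_or_drop(sock, payload, dest):
    """Attempt one non-blocking send; return True if accepted, False if dropped.

    Hypothetical helper: a False result means the kernel reported
    EAGAIN/EWOULDBLOCK and the datagram was discarded by the caller.
    """
    try:
        # MSG_DONTWAIT makes this single call non-blocking without
        # changing the blocking mode of the socket itself.
        sock.sendto(payload, socket.MSG_DONTWAIT, dest)
        return True
    except BlockingIOError:
        # Send budget exhausted: drop the stale datagram and move on.
        return False
```

A sender loop would call send_or_drop() once per destination and count
the False results, rather than blocking on a destination that is still
being resolved.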

The relation between unres_qlen_bytes, arp_queue and SO_SNDBUF is
pretty straightforward in principle. Packets can be queued on the arp
queue until the byte limit is reached. Any packets on this queue still
have their memory counted towards their socket send budget. If queuing
a packet would exceed the threshold, older packets are freed and
dropped as needed. Calculating the exact numbers is not as
straightforward, as, for instance, skb->truesize is a kernel
implementation detail.
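The drop-oldest behaviour described above can be illustrated with a toy
model. This is not the kernel code: ToyNeighQueue and the truesize
values are made up for illustration, and the model ignores the socket
send budget accounting entirely.

```python
from collections import deque


class ToyNeighQueue:
    """Toy model of a per-neighbour queue with a byte limit (hypothetical)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes  # stands in for unres_qlen_bytes
        self.queue = deque()            # per-packet sizes, oldest first
        self.queued_bytes = 0
        self.discards = 0               # stands in for unresolved_discards

    def enqueue(self, truesize):
        # Free the oldest packets until the new one fits or the queue is empty.
        while self.queue and self.queued_bytes + truesize > self.limit_bytes:
            self.queued_bytes -= self.queue.popleft()
            self.discards += 1
        self.queue.append(truesize)
        self.queued_bytes += truesize


q = ToyNeighQueue(limit_bytes=1000)
for size in (400, 400, 400):
    q.enqueue(size)
print(q.queued_bytes, q.discards)
```

With a 1000 byte limit and three 400 byte packets, the third enqueue
frees the oldest packet, so two packets (800 bytes) remain queued and
one discard is counted.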

The simple solution is just to overprovision the socket SO_SNDBUF. If
there are few sockets in the system that perform this role, that seems
perfectly fine.
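A minimal sketch of overprovisioning the send buffer follows. The
256 KiB figure is an arbitrary example, not a recommendation, and
set_sndbuf is a hypothetical helper name. On Linux the kernel doubles
the requested value for bookkeeping and caps it at net.core.wmem_max,
so it is worth reading the value back.

```python
import socket


def set_sndbuf(sock, requested_bytes):
    """Request a send buffer size and return what the kernel actually set."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
    # The effective value may differ from the request (doubled or capped).
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print("effective SO_SNDBUF:", set_sndbuf(s, 256 * 1024))
    s.close()
```

If the effective value comes back smaller than expected, the
net.core.wmem_max limit is the likely reason.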

> Doesn't make sense to keep it around while it arps, since
> by the time it has an answer and gets it into the arp table, the UDP
> packet that it queued for sending while waiting on the arp reply is
> likely already out of date. (And if it doesn't get an answer, I
> definitely don't want it filling up buffers with useless/old packets!)
>
> On Thu, May 16, 2019 at 12:05 PM Eric Dumazet <eric.dumazet@...il.com> wrote:
> >
> >
> >
> > On 5/16/19 7:47 AM, Willem de Bruijn wrote:
> > > On Wed, May 15, 2019 at 3:57 PM Adam Urban <adam.urban@...leguru.org> wrote:
> > >>
> > >> We have an application where we are using sendmsg() to send (lots of)
> > >> UDP packets to multiple destinations over a single socket, repeatedly,
> > >> and at a pretty constant rate using IPv4.
> > >>
> > >> In some cases, some of these destinations are no longer present on the
> > >> network, but we continue sending data to them anyways. The missing
> > >> devices are usually a temporary situation, but can last for
> > >> days/weeks/months.
> > >>
> > >> We are seeing an issue where packets sent even to destinations that
> > >> are present on the network are getting dropped while the kernel
> > >> performs arp updates.
> > >>
> > >> We see a -1 EAGAIN (Resource temporarily unavailable) return value
> > >> from the sendmsg() call when this is happening:
> > >>
> > >> sendmsg(72, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1234),
> > >> sin_addr=inet_addr("10.1.2.3")}, msg_iov(1)=[{"\4\1"..., 96}],
> > >> msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = -1 EAGAIN (Resource
> > >> temporarily unavailable)
> > >>
> > >> Looking at packet captures, during this time you see the kernel arping
> > >> for the devices that aren't on the network, timing out, arping again,
> > >> timing out, and then finally arping a 3rd time before setting the
> > >> INCOMPLETE state again (very briefly being in a FAILED state).
> > >>
> > >> "Good" packets don't start going out again until the 3rd timeout
> > >> happens, and then they go out for about 1s until the 3s delay from ARP
> > >> happens again.
> > >>
> > >> Interestingly, this isn't an all or nothing situation. With only a few
> > >> (2-3) devices missing, we don't run into this "blocking" situation and
> > >> data always goes out. But once 4 or more devices are missing, it
> > >> happens. Setting static ARP entries for the missing supplies, even if
> > >> they are bogus, resolves the issue, but of course results in packets
> > >> with a bogus destination going out on the wire instead of getting
> > >> dropped by the kernel.
> > >>
> > >> Can anyone explain why this is happening? I have tried tuning the
> > >> unres_qlen sysctl without effect and will next try to set the
> > >> MSG_DONTWAIT flag on sendmsg() to see if that helps. But I want to
> > >> make sure I understand what is going on.
> > >>
> > >> Are there any parameters we can tune so that UDP packets sent to
> > >> INCOMPLETE destinations are immediately dropped? What's the best way
> > >> to prevent a socket from being unavailable while arp operations are
> > >> happening (assuming arp is the cause)?
> > >
> > > Sounds like hitting SO_SNDBUF limit due to datagrams being held on the
> > > neighbor queue. Especially since the issue occurs only as the number
> > > of unreachable destinations exceeds some threshold. Does
> > > /proc/net/stat/ndisc_cache show unresolved_discards? Increasing
> > > unres_qlen may make matters only worse if more datagrams can get
> > > queued. See also the branch on NUD_INCOMPLETE in __neigh_event_send.
> > >
> >
> > We probably should add a ttl on arp queues.
> >
> > neigh_probe() could do that quite easily.
> >