Message-ID: <CABUuw65R3or9HeHsMT_isVx1f-7B6eCPPdr+bNR6f6wbKPnHOQ@mail.gmail.com>
Date:   Wed, 15 May 2019 15:56:32 -0400
From:   Adam Urban <adam.urban@...leguru.org>
To:     netdev@...r.kernel.org
Subject: Kernel UDP behavior with missing destinations

We have an application where we use sendmsg() to send (lots of) UDP
packets to multiple destinations over a single socket, repeatedly and
at a fairly constant rate, using IPv4.
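For readers less familiar with this pattern, the sending loop described
above (one unconnected UDP socket, with the destination supplied on
every call through msg_name) might look roughly like the sketch below.
It is a minimal illustration, not the poster's actual code: the
function name send_to_all is hypothetical, and the flags and layout
simply mirror the strace output quoted in this message (a 16-byte
sockaddr_in as msg_name, a single iovec, MSG_NOSIGNAL).

```c
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send one datagram carrying `buf` to each of the
 * `n` IPv4 destinations in `dsts`, all over the single UDP socket `fd`.
 * Returns how many sends succeeded; a failure for one destination
 * (e.g. EAGAIN) is not fatal and the loop continues with the next. */
size_t send_to_all(int fd, const struct sockaddr_in *dsts, size_t n,
                   const void *buf, size_t len)
{
    size_t ok = 0;
    for (size_t i = 0; i < n; i++) {
        struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
        struct msghdr msg;
        memset(&msg, 0, sizeof msg);
        msg.msg_name = (void *)&dsts[i];     /* per-datagram destination */
        msg.msg_namelen = sizeof dsts[i];    /* 16 bytes for sockaddr_in */
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        /* MSG_NOSIGNAL matches the flags seen in the strace output. */
        if (sendmsg(fd, &msg, MSG_NOSIGNAL) == (ssize_t)len)
            ok++;
    }
    return ok;
}
```

Because each call carries its own destination, a failed send to one
address is simply not counted, and the remaining destinations are
still attempted.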

In some cases, some of these destinations are no longer present on the
network, but we continue sending data to them anyway. A device going
missing is usually temporary, but the condition can last for days,
weeks, or months.

We are seeing an issue where even packets sent to destinations that
are present on the network are dropped while the kernel performs ARP
updates.

When this happens, sendmsg() returns -1 with EAGAIN (Resource
temporarily unavailable):

sendmsg(72, {msg_name(16)={sa_family=AF_INET, sin_port=htons(1234),
sin_addr=inet_addr("10.1.2.3")}, msg_iov(1)=[{"\4\1"..., 96}],
msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = -1 EAGAIN (Resource
temporarily unavailable)
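A hedged sketch of per-destination EAGAIN handling follows. One
observation, offered as an inference rather than a diagnosis: the
strace line above passes only MSG_NOSIGNAL yet still gets EAGAIN,
which suggests the socket is already non-blocking (O_NONBLOCK) or has
a send timeout (SO_SNDTIMEO), because a plain blocking UDP socket
would normally wait for send-buffer space instead. MSG_DONTWAIT is a
per-call sendmsg() flag rather than a socket option, and it gives the
same non-blocking behaviour for a single call. The try_send function
and send_result enum are hypothetical names; the sketch only
classifies the outcome so the application can count or drop the
datagram instead of treating it as a hard error.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Possible outcomes of one non-blocking datagram send (hypothetical). */
enum send_result { SEND_OK, SEND_WOULD_BLOCK, SEND_ERROR };

/* Try to send one datagram to `dst` without blocking. MSG_DONTWAIT
 * makes this call non-blocking even if the socket itself is blocking;
 * EAGAIN/EWOULDBLOCK is reported separately from other errors. */
enum send_result try_send(int fd, const struct sockaddr_in *dst,
                          const void *buf, size_t len)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    struct msghdr msg;
    memset(&msg, 0, sizeof msg);
    msg.msg_name = (void *)dst;
    msg.msg_namelen = sizeof *dst;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    ssize_t n = sendmsg(fd, &msg, MSG_DONTWAIT | MSG_NOSIGNAL);
    if (n >= 0)
        return SEND_OK;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return SEND_WOULD_BLOCK;    /* send buffer full: drop or count */
    return SEND_ERROR;              /* anything else is a real error */
}
```

EAGAIN and EWOULDBLOCK have the same value on Linux; checking both is
only for portability.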

Packet captures from this period show the kernel sending ARP requests
for the devices that aren't on the network, timing out, retrying,
timing out again, and then retrying a third time, after which the
neighbour entries go very briefly to FAILED before returning to the
INCOMPLETE state.

"Good" packets (those to destinations that are present) don't start
going out again until the third timeout, and then they go out for
about 1 s until the 3 s ARP resolution cycle starts again.
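To see which destinations the kernel currently considers unresolved,
`ip neigh show` reports neighbour states such as INCOMPLETE and FAILED
directly. The legacy /proc/net/arp table exposes less detail, but its
Flags column can still be filtered programmatically, as sketched
below. The sketch assumes the usual six-column layout of that file
and treats any entry without ATF_COM (0x2, from <net/if_arp.h>) as
unresolved; static entries such as the bogus ones mentioned in this
message normally carry ATF_PERM (0x4) together with ATF_COM and so
count as resolved. count_unresolved is a hypothetical helper, and the
mapping of kernel neighbour states onto this flag is an assumption
rather than something checked against every kernel version.

```c
#include <net/if_arp.h>
#include <stdio.h>

/* Hypothetical helper: count entries in a /proc/net/arp-formatted
 * stream whose Flags column lacks ATF_COM, i.e. neighbours with no
 * resolved hardware address, printing each one. Assumes the layout:
 * IP address, HW type, Flags, HW address, Mask, Device. */
int count_unresolved(FILE *f)
{
    char line[256];
    int count = 0;

    /* The first line is the column header; skip it. */
    if (!fgets(line, sizeof line, f))
        return 0;

    while (fgets(line, sizeof line, f)) {
        char ip[64], hw[64], mask[64], dev[64];
        unsigned int hwtype, flags;
        /* %x accepts the "0x" prefix used in this file. */
        if (sscanf(line, "%63s %x %x %63s %63s %63s",
                   ip, &hwtype, &flags, hw, mask, dev) != 6)
            continue;                  /* skip malformed lines */
        if (!(flags & ATF_COM)) {
            printf("unresolved: %s on %s\n", ip, dev);
            count++;
        }
    }
    return count;
}
```

In practice it would be called on the stream returned by
fopen("/proc/net/arp", "r"), for example periodically, to correlate
the number of unresolved neighbours with the point at which good
traffic starts stalling.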

Interestingly, this isn't an all-or-nothing situation. With only a few
(2-3) devices missing, we don't run into this "blocking" situation and
data always goes out. But once 4 or more devices are missing, it
happens. Setting static ARP entries for the missing devices, even if
they are bogus, resolves the issue, but of course results in packets
with a bogus destination going out on the wire instead of being
dropped by the kernel.

Can anyone explain why this is happening? I have tried tuning the
unres_qlen sysctl without effect, and will next try passing the
MSG_DONTWAIT flag to sendmsg() to see if that helps. But I want to
make sure I understand what is going on.
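Before tuning further, it may help to record the current
neighbour-table limits. The kernel's ip-sysctl documentation describes
unres_qlen (a per-address packet count) as deprecated since Linux 3.3
in favour of unres_qlen_bytes (a per-address byte limit), so on recent
kernels it is worth checking both. The sketch below reads them from
/proc/sys/net/ipv4/neigh/<dev>/; read_long_file and read_neigh_param
are hypothetical helpers, and the interface name is an assumption
supplied by the caller ("default" holds the values used for new
interfaces).

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a single integer from a file such as a /proc/sys entry.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_long_file(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char buf[64];
    int ok = fgets(buf, sizeof buf, f) != NULL;
    fclose(f);
    if (!ok)
        return -1;
    char *end;
    long v = strtol(buf, &end, 10);
    if (end == buf)
        return -1;                 /* no digits found */
    *out = v;
    return 0;
}

/* Read a per-interface IPv4 neighbour parameter, e.g. dev = "eth0"
 * (hypothetical) or "default", name = "unres_qlen" or
 * "unres_qlen_bytes". */
int read_neigh_param(const char *dev, const char *name, long *out)
{
    char path[256];
    int n = snprintf(path, sizeof path, "/proc/sys/net/ipv4/neigh/%s/%s",
                     dev, name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                 /* path truncated or encoding error */
    return read_long_file(path, out);
}
```

For example, read_neigh_param("eth0", "unres_qlen_bytes", &v) would
read the byte limit for a hypothetical interface eth0. Changing these
values requires root, typically via sysctl.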

Are there any parameters we can tune so that UDP packets sent to
INCOMPLETE destinations are immediately dropped? What's the best way
to prevent a socket from becoming unavailable while ARP resolution is
in progress (assuming ARP is the cause)?
