Message-ID: <27467ed4-0520-8642-f4c7-6f4aeb54ef2a@pm.me>
Date: Wed, 09 Oct 2019 17:05:02 +0000
From: Nate Sweet <nathanjsweet@...me>
To: netdev@...r.kernel.org
Subject: UDP Statistics Bug?
Hey net devs,
I would like some clarity on a problem I ran into last week. I was
diagnosing a DNS issue and got very sidetracked by how netstat reported
stats to me. My issue was that UDP packets were being dropped by all UDP
sockets on the host, so when I ran `netstat -naus` and it told me that
UdpInErrors
(https://elixir.bootlin.com/linux/v5.4-rc2/source/include/uapi/linux/snmp.h#L156)
was my main problem, I spent a day trying to figure out which
application or mechanism was dropping UDP packets on the host. Based on
the statistic I was seeing, I suspected something like BPF or a security
module. To be fair to me, these two mechanisms do indeed report their
drops in this statistic
(https://elixir.bootlin.com/linux/v5.4-rc2/source/net/ipv4/udp.c#L2051).
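
For anyone chasing the same thing: as far as I know, netstat -s and
nstat take these UDP counters from /proc/net/snmp, where each protocol
gets a pair of lines sharing a prefix such as "Udp:" (counter names,
then values). Below is a minimal sketch, assuming that layout, that puts
InErrors and RcvbufErrors side by side. It also assumes, as on the
receive path discussed here, that every RcvbufErrors drop is also
charged to InErrors. The helper names and the sample numbers are made
up for illustration.

```python
from __future__ import annotations

import os


def parse_snmp(text: str) -> dict[str, dict[str, int]]:
    """Parse /proc/net/snmp-style text into {protocol: {counter: value}}.

    Assumes pairs of lines with the same prefix, e.g.
    "Udp: InDatagrams ..." followed by "Udp: 1000 ...".
    """
    stats: dict[str, dict[str, int]] = {}
    rows = [line.split() for line in text.splitlines() if line.strip()]
    for header, values in zip(rows[0::2], rows[1::2]):
        proto = header[0].rstrip(":")
        if values[0].rstrip(":") != proto:
            raise ValueError(f"unexpected line pair for {proto!r}")
        stats[proto] = {name: int(v) for name, v in zip(header[1:], values[1:])}
    return stats


def udp_error_breakdown(stats: dict[str, dict[str, int]]) -> dict[str, int]:
    """Split UDP InErrors into the part also charged to RcvbufErrors and the rest."""
    udp = stats["Udp"]
    in_errors = udp["InErrors"]
    rcvbuf = udp.get("RcvbufErrors", 0)
    return {
        "InErrors": in_errors,
        "RcvbufErrors": rcvbuf,
        # Drops counted in InErrors but not in RcvbufErrors, e.g. socket
        # filters, security modules, checksum failures, global memory limit.
        "other": in_errors - rcvbuf,
    }


# Made-up sample in the /proc/net/snmp format (numbers are illustrative only).
SAMPLE = (
    "Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors\n"
    "Udp: 1000 5 120 900 20 0\n"
)

if __name__ == "__main__":
    path = "/proc/net/snmp"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE
    print(udp_error_breakdown(parse_snmp(text)))
```

The catch is that "other" lumps filter and security-module drops
together with global memory-limit drops, which is exactly what sent me
looking in the wrong place.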

Imagine my surprise when I discovered that the error actually happening
was that the global UDP memory min was being reached, and all the host's
UDP sockets were, indeed, experiencing buffer errors. The problem is that
within the regular UDP socket datapath, `UDP_MIB_RCVBUFERRORS` only seems
to be incremented here
(https://elixir.bootlin.com/linux/v5.4-rc2/source/net/ipv4/udp.c#L1945)
when the error is `ENOMEM`. However, when `__sk_mem_raise_allocated`
fails
(https://elixir.bootlin.com/linux/v5.4-rc2/source/net/ipv4/udp.c#L1455),
the error returned is `ENOBUFS`, so only UdpInErrors is incremented. The
issue ended up being an application that was not processing its backlog
because it wasn't closing old UDP sockets.
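
To make the accounting concrete, here is a small Python model of how I
read the v5.4 __udp_queue_rcv_skb() error path: when
__udp_enqueue_schedule_skb() returns -ENOMEM (per-socket sk_rcvbuf
exceeded), both counters are bumped, while -ENOBUFS (the
__sk_mem_raise_allocated() failure) only reaches InErrors. This is a
simplification for illustration, not kernel code, and the function name
is made up.

```python
import errno
from collections import Counter


def account_enqueue_failure(mib: Counter, rc: int) -> None:
    """Simplified model (not kernel code) of v5.4 __udp_queue_rcv_skb():
    charge SNMP counters for a failed __udp_enqueue_schedule_skb() call."""
    if rc >= 0:
        return  # packet was queued, nothing to charge
    if rc == -errno.ENOMEM:
        # Per-socket receive buffer (sk_rcvbuf) exceeded: charged twice.
        mib["RcvbufErrors"] += 1
    # Every enqueue failure, including -ENOBUFS from
    # __sk_mem_raise_allocated(), lands in InErrors.
    mib["InErrors"] += 1


if __name__ == "__main__":
    mib = Counter()
    account_enqueue_failure(mib, -errno.ENOMEM)   # full socket buffer
    account_enqueue_failure(mib, -errno.ENOBUFS)  # global UDP memory limit hit
    print(dict(mib))  # {'RcvbufErrors': 1, 'InErrors': 2}
```

In the situation I hit, nearly every drop was of the -ENOBUFS kind, so
RcvbufErrors stayed flat while InErrors kept climbing.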

IMO, I would have reached this diagnosis sooner if `netstat -naus` had
shown me UdpRcvBuffErrors (`UDP_MIB_RCVBUFERRORS`) instead of
UdpInErrors. I realize it is too late to change this error reporting now,
because doing so would break user space, but I think a new counter could
be added to the kernel for UDP, such as UdpRcvBuffGlobalErrors or
something similar, which would be charged in addition to UdpInErrors
(the same way an `ENOMEM` drop is already counted twice). I think this
would be a real time saver for folks, because I really think UdpInErrors
is misleading in this case.
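
Here is the same kind of Python model with the counter I am proposing.
The name RcvbufGlobalErrors is hypothetical (my guess at how
UdpRcvBuffGlobalErrors might appear without the "Udp" prefix); no such
counter exists in the kernel. The point is that InErrors keeps counting
exactly what it does today, so existing user space is unaffected, while
-ENOBUFS drops also get a counter of their own.

```python
import errno
from collections import Counter


def account_enqueue_failure_proposed(mib: Counter, rc: int) -> None:
    """Model of the proposed accounting. "RcvbufGlobalErrors" is a
    hypothetical counter name, not an existing kernel MIB."""
    if rc >= 0:
        return  # packet was queued, nothing to charge
    if rc == -errno.ENOMEM:
        mib["RcvbufErrors"] += 1        # per-socket buffer full (unchanged)
    elif rc == -errno.ENOBUFS:
        mib["RcvbufGlobalErrors"] += 1  # hypothetical: global memory limit hit
    mib["InErrors"] += 1                # unchanged, so user space keeps working


if __name__ == "__main__":
    mib = Counter()
    for rc in (-errno.ENOBUFS, -errno.ENOBUFS, -errno.ENOMEM):
        account_enqueue_failure_proposed(mib, rc)
    print(dict(mib))  # {'RcvbufGlobalErrors': 2, 'InErrors': 3, 'RcvbufErrors': 1}
```
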
Thanks,
Nate Sweet