Date: Fri, 23 Nov 2012 11:45:39 +0400
From: Andrew Savchenko <bircoph@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: netdev@...r.kernel.org
Subject: Re: [BUG] Kernel receives DNS reply, but doesn't deliver it to a
waiting application
Hello,
On Sun, 21 Oct 2012 03:25:43 +0400 Andrew Savchenko wrote:
> > On Sat, 13 Oct 2012 15:44:20 +0200 Eric Dumazet wrote:
[...]
> > > You should investigate and check where the incoming packet is lost
> > >
> > > Tools :
> > >
> > > netstat -s
> > >
> > > drop_monitor module and dropwatch command
> > >
> > > cat /proc/net/udp
> >
> > Thank you for your reply; I updated my kernel to 3.4.14, enabled
> > CONFIG_NET_DROP_MONITOR, and installed the dropwatch utility.
> >
> > I will report back when the bug strikes again.
> > This may take a week or two, however.
>
> This bug is back again on kernel 3.4.14, but this time I was able to
> get debug data and to recover the running kernel without a reboot.
>
> Dropwatch showed that DNS UDP replies are always dropped here:
> 1 drops at __udp_queue_rcv_skb+61 (0xffffffff813bd670)
>
> Other observations:
> - only UDP replies are lost, TCP works fine;
> - if network load is reduced dramatically (ip_forward disabled, most
> network daemons stopped), UDP DNS queries work again; but as the load
> gradually increases, replies first become slow and then cease altogether.
> - CPU load is very low (load average reported by uptime is below 0.05),
> so insufficient computing power should not be the issue.
>
> I found the __udp_queue_rcv_skb function in net/ipv4/udp.c. From the code
> and the observations above it follows that this is likely to be an ENOMEM
> condition leading to packet loss.
[...]
> net.ipv4.udp_mem = 100000 150000 200000
>
> This solved my issue, at least for a while: DNS queries are working
> fine now.
However, this solved the problem only temporarily: after 40 days of uptime
the same problem struck again with the same symptoms. I "solved" it
by increasing UDP memory again:
net.ipv4.udp_mem = 200000 300000 400000
Of course, this is only a temporary workaround. Such behaviour
strengthens my suspicion that there is some kind of memory leak.
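One way to test the leak suspicion is to watch how global UDP memory
accounting grows relative to the udp_mem limit over time. Below is a minimal
sketch in Python, not something I have run on the affected host. It assumes
the usual "UDP: inuse N mem M" line in /proc/net/sockstat and that both that
"mem" value and the udp_mem thresholds are counted in pages. The function
names (parse_udp_pages, parse_udp_mem, report, main) are made up for
illustration.

```python
import os


def parse_udp_pages(sockstat_text):
    """Return the 'mem' value (in pages) from the 'UDP:' line of sockstat."""
    for line in sockstat_text.splitlines():
        if line.startswith("UDP:"):
            fields = line.split()[1:]
            # Fields alternate as key, value: inuse 12 mem 150000
            stats = dict(zip(fields[0::2], map(int, fields[1::2])))
            return stats["mem"]
    raise ValueError("no 'UDP:' line found in sockstat text")


def parse_udp_mem(sysctl_text):
    """Return the three udp_mem thresholds (low, pressure, high) in pages."""
    low, pressure, high = (int(v) for v in sysctl_text.split())
    return low, pressure, high


def report(sockstat_text, sysctl_text, page_size):
    """Compare pages currently accounted to UDP with the udp_mem high limit."""
    used = parse_udp_pages(sockstat_text)
    low, pressure, high = parse_udp_mem(sysctl_text)
    return {
        "used_pages": used,
        "used_bytes": used * page_size,
        "high_pages": high,
        "fraction_of_high": used / high,
    }


def main():
    """Read the live values on a Linux host and print one report line."""
    with open("/proc/net/sockstat") as f:
        sockstat_text = f.read()
    with open("/proc/sys/net/ipv4/udp_mem") as f:
        sysctl_text = f.read()
    print(report(sockstat_text, sysctl_text, os.sysconf("SC_PAGE_SIZE")))
```

Calling main() periodically (for example from cron) and logging the result
would show whether the used page count keeps climbing while traffic stays
roughly constant, which is what a leak in the accounting would look like.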
This host is still on 3.4.14, however: I can't reboot it now due to
workload. I will try the 3.7 branch as soon as possible.
Best regards,
Andrew Savchenko