lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20121003232548.eb6b6b22.bircoph@gmail.com>
Date:	Wed, 3 Oct 2012 23:25:48 +0400
From:	Andrew Savchenko <bircoph@...il.com>
To:	netdev@...r.kernel.org
Subject: Kernel recieves DNS reply, but doesn't deliver it to a waiting
 application

Hello,

I encountered a very weird bug: after a while of uptime kernel stops to deliver
DNS reply to applications. Tcpdump shows that correct reply is recieved, but 
strace shows inquiring application never recieves it and ends with timeout,
epoll_wait() always returns 0:
a slice from: $ host kernel.org 8.8.8.8:

sendmsg(20, {msg_name(16)={sa_family=AF_INET, sin_port=htons(53),
sin_addr=inet_addr("8.8.8.8")}, msg_iov(1)=[{"\266\344\1\0\0\1\0\0\0\0\0\0\6k
ernel\3org\0\0\1\0\1", 28}], msg_controllen=0, msg_flags=0}, 0) = 28            
epoll_wait(3, {}, 64, 0)                = 0                                     
epoll_wait(3, {}, 64, 4999)             = 0

Though tcpdump shows a normal reply:

20:28:44.162897 IP 10.7.74.7.43167 > 8.8.8.8.domain: 46820+ A? kernel.org. (28) 
20:28:44.221308 IP 8.8.8.8.domain > 10.7.74.7.43167: 46820 1/0/0 A 149.20.4.69
(44)

After this bug has occured, it is no longer possible to perform DNS request on
the crippled system. I tried to stop/restart all network-related daemons, to
recreate network interfaces whenever possible (e.g. pppX devices), but with no
help. I use iptables and ebtables on this host, but reseting them (flushing all
chains, removing user chains, setting all policies to ACCEPT) doesn't help. The
only worknig solution is to reboot the system.

This bug happens rarely and randomly (about once in 7-12 days on 24x7 available
production system), but I had it 5 times already. Due to rare and random nature
of the bug I can't bisect it.

This problem occured after I updated vanilla kernel from 2.6.39.4 to 3.4.6.
Afterward I updated kernel to 3.4.10 in the hope that this will fix the
problem, but with no result. (I updated kernel due to commit
2ce42ec4ef551b08d2e5d26775d838ac640f82ad, which describes somewhat similar
issue, though I don't use I/OAT engine due to lack of hardware support.)

More details, attached trace files and kernel configs are available at bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=48081

In a few days I'll try 3.4.12 (I need to rebuild kernel anyway due to unrelated
issue) and will report if this bug will occur again. But please note it may
take several weeks to check this.

Best regards,
Andrew Savchenko

Content of type "application/pgp-signature" skipped

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ