Date:	Mon, 20 Oct 2008 08:53:10 -0700
From:	"David Schwartz" <davids@...master.com>
To:	<swivel@...lls.gnugeneration.com>, <ncannasse@...ion-twin.com>
Cc:	<linux-kernel@...r.kernel.org>
Subject: RE: poll() blocked / packets not received ?


Nick Cannasse wrote:

> Ok, funny thing is that we just found what is occurring...
>
> We had a process that was on a regular basis doing the following :
>
> conntrack -F
>
> This was done in order to prevent the table from growing too big, because we
> were reaching the maximum size as told by :
>
> /proc/sys/net/ipv4/netfilter/ip_conntrack_max
>    and
> /proc/sys/net/ipv4/netfilter/ip_conntrack_count
>
> Seems like when there are active connections, this will break netfilter
> and stop delivering packets to the socket.
>
> At least I will have a nice sleep tonight.

Note that this solved your symptom, not your problem. You actually have two
problems:

1) You rely on TCP to detect a lost connection even on a side that never
transmits any data. TCP simply does not do this. If you are not trying to
send data, you are not assured that a lost connection will be detected.
(You either need a timeout, or you need to send or dribble some data,
depending on the protocol.)
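A minimal sketch of both remedies for problem 1, in C on Linux. The helper names wait_readable and enable_keepalive, and any timeout values you pass them, are hypothetical. poll() with a finite timeout keeps a silent peer from blocking a reader forever. TCP keepalive makes the kernel itself send probes on an idle connection; note that TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific options.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until fd is readable, but never longer than
 * timeout_ms milliseconds. Returns 1 if poll() reported an event (data,
 * hang-up or error; the next read() tells which), 0 on timeout, -1 on error.
 */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    assert(fd >= 0);
    do {
        /* Simplification: on EINTR the full timeout starts over. */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);
    return rc < 0 ? -1 : rc;
}

/*
 * Hypothetical helper: ask the kernel to probe an idle TCP connection.
 * First probe after idle_s seconds of silence, then every intvl_s seconds;
 * the connection is declared dead after cnt unanswered probes.
 * Returns 0 on success, -1 on error (errno set by setsockopt()).
 */
static int enable_keepalive(int fd, int idle_s, int intvl_s, int cnt)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    /* Linux-specific per-socket tuning of the keepalive timers. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_s, sizeof intvl_s) < 0 ||
        setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) < 0)
        return -1;
    return 0;
}
```

A return of 0 from wait_readable means the peer may be gone: the caller can retry, send a protocol-level ping, or close the socket. Keepalive only reports a dead peer after roughly idle_s + intvl_s * cnt seconds, and with the Linux system default of two hours of idle time it is far slower than an application-level timeout.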

2) You hold a lock on a shared resource while you wait for a reply over a
network. If this is a low-level "block and wait indefinitely" lock, many
threads will line up behind a single slow or stuck thread. The right fix
depends on your circumstances, but you need to use a suitable
synchronization primitive. (You need to be able to use multiple connections,
or to defer operations without tying up a thread.)

With both of these bugs, you are vulnerable to precisely the scenario you
observed. The TCP connection close packets were lost (in this case due to
premature expiration of the connection tracking entries, but other things
can cause it, such as the server rebooting), TCP could not detect the lost
connection because you never sent any data, so one thread blocked forever,
and other threads got in line behind it.

DS


