Message-ID: <20101203133903.GG13225@mail.eitzenberger.org>
Date: Fri, 3 Dec 2010 14:39:03 +0100
From: Holger Eitzenberger <holger@...zenberger.org>
To: netfilter-devel <netfilter-devel@...ts.netfilter.org>
Cc: netdev@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>
Subject: ctnetlink loop
Hi,
I see a problem with how ctnetlink GET requests are being
processed in the kernel (2.6.32.24) under high load.
Initially I saw this problem on a large performance testing
system while collecting HTTP proxy performance numbers, but
lately there have been two reports from large customer boxes
(both many-core with 10G NICs).
The symptom is Netlink looping around nfnetlink_rcv_msg(),
simply because netlink_unicast() returned -EAGAIN when trying
to write the newly created Netlink skb to the socket receive
buffer in ctnetlink_get_conntrack(). In this case a (possibly)
infinite loop is entered. In practice it is mostly infinite
when the userland process that should receive those messages
is single threaded: it is stuck in its sendmsg() call and so
cannot read anything.
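To illustrate the loop, here is a small userspace model in C. It is not kernel code: model_unicast(), model_get_handler() and model_rcv_msg() are hypothetical stand-ins for netlink_unicast(), ctnetlink_get_conntrack() and nfnetlink_rcv_msg(), and the receive buffer is reduced to a byte counter. It assumes, as described above, that the dispatcher replays the request on -EAGAIN and that the reader never drains its queue; the max_replays cap exists only so the model terminates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a Netlink socket receive buffer (not kernel code). */
struct model_sock {
	size_t rcvbuf;   /* capacity in bytes */
	size_t queued;   /* bytes queued and not yet read by userland */
};

/* Stand-in for a non-blocking unicast: fails with -EAGAIN when the
 * reply does not fit into the receive buffer. */
static int model_unicast(struct model_sock *sk, size_t len)
{
	if (sk->queued + len > sk->rcvbuf)
		return -EAGAIN;
	sk->queued += len;
	return (int)len;
}

/* Stand-in for the GET handler: build a reply and try to deliver it,
 * passing any error straight back to the dispatcher. */
static int model_get_handler(struct model_sock *sk, size_t reply_len)
{
	int err = model_unicast(sk, reply_len);
	return err < 0 ? err : 0;
}

/* Stand-in for the dispatcher: replay the request on -EAGAIN.
 * max_replays exists only so this simulation terminates; the loop
 * described in the mail has no such bound. Returns the replay count. */
static long model_rcv_msg(struct model_sock *sk, size_t reply_len,
			  long max_replays, int *result)
{
	long replays = 0;
	int err;

	assert(max_replays >= 0);
	for (;;) {
		err = model_get_handler(sk, reply_len);
		if (err != -EAGAIN || replays == max_replays)
			break;
		replays++;   /* reader never drains, so nothing changes */
	}
	*result = err;
	return replays;
}
```

With a full buffer and a reader that never reads, every replay fails in exactly the same way, so only the artificial cap ends the loop.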
I tried to reproduce it several times; a few times the loop
disappeared and the box proceeded normally after some time.
I have no explanation for this.
The attached patch tries to solve this by simply not retrying
netlink_unicast() for the reply skb and failing with -ENOBUFS
instead. The reasoning is that once a Netlink overrun has been
observed, it seems counterintuitive to insist on sending one
more Netlink message.
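A similar hypothetical model of the proposed behaviour (not the attached patch itself, whose contents are not reproduced here): the handler reports a full receive queue as -ENOBUFS, so a dispatcher that still replays on -EAGAIN gives up after a single attempt. The model_* names and the byte-counter buffer are illustrative assumptions only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical receive-queue model (not kernel code). */
struct model_sock {
	size_t rcvbuf;   /* capacity in bytes */
	size_t queued;   /* bytes queued and not yet read */
};

static int model_unicast(struct model_sock *sk, size_t len)
{
	if (sk->queued + len > sk->rcvbuf)
		return -EAGAIN;   /* full queue, non-blocking send */
	sk->queued += len;
	return (int)len;
}

/* Handler with the proposed behaviour: a full receive queue is reported
 * as -ENOBUFS, so the dispatcher's -EAGAIN replay is never triggered. */
static int model_get_handler_fixed(struct model_sock *sk, size_t reply_len)
{
	int err = model_unicast(sk, reply_len);
	if (err < 0)
		return err == -EAGAIN ? -ENOBUFS : err;
	return 0;
}

/* Dispatcher left unchanged: it still replays on -EAGAIN.
 * *attempts counts how many times the handler ran. */
static int model_rcv_msg(struct model_sock *sk, size_t reply_len,
			 int *attempts)
{
	int err;

	assert(attempts != NULL);
	*attempts = 0;
	do {
		err = model_get_handler_fixed(sk, reply_len);
		(*attempts)++;
	} while (err == -EAGAIN);
	return err;
}
```

-ENOBUFS is also the error a Netlink reader already gets from recvmsg() after its receive buffer has overrun, so the failure is reported in a form userland already has to handle.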
I checked for possible side effects on other Netlink requests;
please double-check.
The patch applies to net-next-2.6.
Feedback appreciated.
/holger
View attachment "nfnl-fix.diff" of type "text/x-diff" (1697 bytes)