nfnetlink: avoid unbounded loop on busy Netlink socket

I see a problem with how ctnetlink GET requests are processed in the
kernel (2.6.32.24) under high load.

The symptom is the kernel looping in nfnetlink_rcv_msg(). This happens
because netlink_unicast() returned -EAGAIN when trying to queue the
newly created Netlink skb on the socket receive buffer in
ctnetlink_get_conntrack(). In this case a (possibly) infinite loop is
entered.

I think the loop is effectively infinite when the userland party that
should receive those messages is itself stuck in the sendmsg() call: if
it is single-threaded, it cannot read anything while it is blocked
there.

I tried to reproduce this several times. A few times the loop
disappeared and the box proceeded normally after some minutes. I have
no explanation for this.

The attached patch tries to solve it by simply not retrying
netlink_unicast() for the reply skb and failing with -ENOBUFS instead.
The reasoning is that once a Netlink overrun has been detected, it
seems counterintuitive to insist on sending one more Netlink message.

Signed-off-by: Holger Eitzenberger

Index: net-next-2.6/net/netfilter/nfnetlink.c
===================================================================
--- net-next-2.6.orig/net/netfilter/nfnetlink.c	2010-12-03 14:33:32.000000000 +0100
+++ net-next-2.6/net/netfilter/nfnetlink.c	2010-12-03 14:34:21.000000000 +0100
@@ -138,7 +138,6 @@
 		return 0;
 
 	type = nlh->nlmsg_type;
-replay:
 	ss = nfnetlink_get_subsys(type);
 	if (!ss) {
 #ifdef CONFIG_MODULES
@@ -169,7 +168,7 @@
 		err = nc->call(net->nfnl, skb, nlh,
 			       (const struct nlattr **)cda);
 		if (err == -EAGAIN)
-			goto replay;
+			err = -ENOBUFS;
 		return err;
 	}
 }
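
For illustration only, here is a small user-space model of the two behaviours described above. It is a sketch under stated assumptions, not kernel code: struct rx_queue, fake_unicast(), RCVBUF_SLOTS and MAX_REPLAYS are hypothetical names made up for this example. The retry cap exists only so the demo terminates; the loop removed by the patch has no such cap.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define RCVBUF_SLOTS 2    /* hypothetical receive-buffer capacity */
#define MAX_REPLAYS  1000 /* demo-only cap; the kernel loop has none */

/* Hypothetical stand-in for a socket receive buffer that nobody drains. */
struct rx_queue {
	int used;
};

/* Models a non-blocking unicast: fails with -EAGAIN when the buffer is full. */
static int fake_unicast(struct rx_queue *q)
{
	assert(q != NULL);
	if (q->used >= RCVBUF_SLOTS)
		return -EAGAIN;
	q->used++;
	return 0;
}

/* Old behaviour: replay on -EAGAIN. Spins as long as the reader is stuck. */
static int handle_with_replay(struct rx_queue *q, int *attempts)
{
	int err;

	*attempts = 0;
	do {
		err = fake_unicast(q);
		(*attempts)++;
	} while (err == -EAGAIN && *attempts < MAX_REPLAYS);
	return err;
}

/* Patched behaviour: a single attempt, with -EAGAIN mapped to -ENOBUFS. */
static int handle_fail_fast(struct rx_queue *q, int *attempts)
{
	int err = fake_unicast(q);

	*attempts = 1;
	if (err == -EAGAIN)
		err = -ENOBUFS;
	return err;
}

/* Runs both variants against a full queue and prints the attempt counts. */
static void run_demo(void)
{
	struct rx_queue q = { .used = RCVBUF_SLOTS };
	int attempts, err;

	err = handle_with_replay(&q, &attempts);
	printf("replay:    err=%d attempts=%d\n", err, attempts);
	err = handle_fail_fast(&q, &attempts);
	printf("fail-fast: err=%d attempts=%d\n", err, attempts);
}
```

Calling run_demo() from a main() shows the replay variant using up every allowed attempt on a full queue, while the fail-fast variant returns after one attempt and leaves recovery to the caller.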