netdev - never disappearing neighbors with netlink arp

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <4A26C256.9060606@iki.fi>
Date:	Wed, 03 Jun 2009 21:35:02 +0300
From:	Timo Teräs <timo.teras@....fi>
To:	netdev@...r.kernel.org
Subject: never disappearing neighbors with netlink arp

Hi,

I found a very peculiar problem related to the neighbor cache when using the
netlink arp api. I never noticed it until recently, when one of the nodes
with a lot of traffic started logging "Neighbour table overflow" messages.

I made my opennhrp daemon reply immediately with NUD_INVALID if the address is
known to be unreachable, which sounds like the proper thing to do.
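
For reference, a userland reply like this would typically be an RTM_NEWNEIGH request carrying a struct ndmsg plus an NDA_DST attribute. The sketch below only builds such a message in a buffer. It does not open or send on a NETLINK_ROUTE socket. build_neigh_failed() is a hypothetical helper, not opennhrp code. The state is set to NUD_FAILED because <linux/neighbour.h> does not appear to define a NUD_INVALID constant. Which value the daemon actually sends, and the choice of NLM_F_REPLACE, are assumptions.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/neighbour.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical helper: fill buf with an RTM_NEWNEIGH request marking the
 * IPv4 neighbor `ipv4` on interface `ifindex` as NUD_FAILED (assumed state).
 * Returns the message length, or 0 if the address does not parse or buf
 * is too small. */
size_t build_neigh_failed(void *buf, size_t buflen, int ifindex,
                          const char *ipv4)
{
    struct in_addr dst;
    const size_t hdr_len = NLMSG_LENGTH(sizeof(struct ndmsg));
    const size_t total = NLMSG_ALIGN(hdr_len) + RTA_SPACE(sizeof(dst));

    assert(buf != NULL && ipv4 != NULL);
    if (inet_pton(AF_INET, ipv4, &dst) != 1)
        return 0;
    if (buflen < total)
        return 0;
    memset(buf, 0, total);

    /* Netlink header: a request that replaces an existing entry. */
    struct nlmsghdr *nlh = buf;
    nlh->nlmsg_len = total;
    nlh->nlmsg_type = RTM_NEWNEIGH;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_REPLACE;

    /* Neighbor header: family, interface and the new NUD state. */
    struct ndmsg *ndm = NLMSG_DATA(nlh);
    ndm->ndm_family = AF_INET;
    ndm->ndm_ifindex = ifindex;
    ndm->ndm_state = NUD_FAILED; /* assumption: see lead-in */

    /* NDA_DST attribute: the protocol address being resolved. */
    struct rtattr *rta = (struct rtattr *)((char *)buf + NLMSG_ALIGN(hdr_len));
    rta->rta_type = NDA_DST;
    rta->rta_len = RTA_LENGTH(sizeof(dst));
    memcpy(RTA_DATA(rta), &dst, sizeof(dst));
    return total;
}
```

Building the message separately from sending it keeps the sketch testable without privileges. A real daemon would pass the buffer to sendto() on its netlink socket.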

However, after some tedious reading of the sources, it looks like this is what
happens:
1. A packet triggers a new neighbor solicitation: the entry goes to
   NUD_INCOMPLETE, the skb gets queued and, based on my neightable config, the
   first solicit is sent directly via netlink.
2. Userland receives it and immediately sends back an update to NUD_INVALID.
3. It looks like net/core/neighbour.c:neigh_update() first checks for
   !(new & NUD_VALID). This matches, so it does the state transition, but the
   queued skbs are not dequeued or error-reported, which leaves references to
   the neigh entry.
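
A toy userspace model of the three steps above makes the refcount problem concrete. It is not kernel code. struct toy_neigh and the toy_* functions are hypothetical. Each queued packet is assumed to hold one reference on the entry, as step 3 describes. The NUD_* values and the NUD_VALID mask are copied from the kernel headers for illustration. The invalid update is modeled as NUD_FAILED, one state with none of the NUD_VALID bits set. The early-exit branch is modeled as cancelling the resolution timer, which is what neigh_del_timer() appears to do in that branch. "Collectable" is loosely modeled as the entry having a refcount of 1, meaning only the table still references it.

```c
#include <assert.h>
#include <stdbool.h>

/* NUD_* bits as in <linux/neighbour.h>; NUD_VALID as in include/net/neighbour.h. */
#define NUD_INCOMPLETE 0x01
#define NUD_REACHABLE  0x02
#define NUD_STALE      0x04
#define NUD_DELAY      0x08
#define NUD_PROBE      0x10
#define NUD_FAILED     0x20
#define NUD_NOARP      0x40
#define NUD_PERMANENT  0x80
#define NUD_VALID (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE | \
                   NUD_PROBE | NUD_STALE | NUD_DELAY)

/* Hypothetical, heavily simplified neighbor entry. refcnt starts at 1 for
 * the table's own reference; each queued packet adds one more. */
struct toy_neigh {
    unsigned int state;
    int refcnt;
    int queue_len;
    bool timer_armed;
};

/* Step 1: a packet misses the cache, the entry becomes INCOMPLETE, the
 * packet is queued (taking a reference) and the resolution timer is armed. */
void toy_resolve(struct toy_neigh *n)
{
    assert(n->refcnt >= 1); /* the table's reference must exist */
    n->state = NUD_INCOMPLETE;
    n->queue_len++;
    n->refcnt++;
    n->timer_armed = true;
}

/* Step 3: the early-exit branch for a state with no NUD_VALID bits. The
 * timer is cancelled and the state written, but the queue is untouched. */
void toy_update_buggy(struct toy_neigh *n, unsigned int new_state)
{
    if (!(new_state & NUD_VALID)) {
        n->timer_armed = false; /* modeled on neigh_del_timer() */
        n->state = new_state;
        return;                 /* queued packets keep their references */
    }
    n->state = new_state;
}

/* Only entries held solely by the table (refcnt == 1) can be reclaimed. */
bool toy_collectable(const struct toy_neigh *n)
{
    return n->refcnt == 1;
}
```

Suppose one packet is queued and userland then reports the neighbor as failed. The entry ends up in NUD_FAILED with its timer disarmed, but queue_len stays at 1 and refcnt at 2, so toy_collectable() returns false. In this model, nothing is left that would ever drain the queue.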

What happens after this is still a bit unclear to me, but it looks like the
entry never gets garbage collected.

I can probably work around this from userland by simply not replying at all
for non-existent neighbors. But what would be the proper fix? It seems bad
if userland can flood the kernel with never-expiring entries. Would a simple
skb queue flush plus error reporting be enough? Do we need to update the
timestamps too, or do something additional?
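
The simplest option raised above, a queue flush with error reporting, can be sketched on a toy userspace model of the entry. All toy_* names are hypothetical. Each queued packet is assumed to pin one reference. errors_reported only counts the per-packet error reports a real fix would issue. This is one possible shape, not a proposed patch. It does not touch timestamps, and it flushes on every transition to a non-NUD_VALID state. Whether the flush should be narrowed, for example to INCOMPLETE to failed only, is left open.

```c
#include <assert.h>
#include <stdbool.h>

/* NUD_* bits as in <linux/neighbour.h>; NUD_VALID as in include/net/neighbour.h. */
#define NUD_INCOMPLETE 0x01
#define NUD_REACHABLE  0x02
#define NUD_STALE      0x04
#define NUD_DELAY      0x08
#define NUD_PROBE      0x10
#define NUD_FAILED     0x20
#define NUD_NOARP      0x40
#define NUD_PERMANENT  0x80
#define NUD_VALID (NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE | \
                   NUD_PROBE | NUD_STALE | NUD_DELAY)

/* Hypothetical, heavily simplified neighbor entry. */
struct toy_neigh {
    unsigned int state;
    int refcnt;          /* 1 for the table, +1 per queued packet */
    int queue_len;
    int errors_reported; /* stands in for per-packet error reports */
    bool timer_armed;
};

/* A packet misses the cache and is queued while the entry is INCOMPLETE. */
void toy_resolve(struct toy_neigh *n)
{
    assert(n->refcnt >= 1);
    n->state = NUD_INCOMPLETE;
    n->queue_len++;
    n->refcnt++;
    n->timer_armed = true;
}

/* Drop one reference; the table's own reference must survive. */
static void toy_release(struct toy_neigh *n)
{
    assert(n->refcnt > 1);
    n->refcnt--;
}

/* Dequeue every pending packet, report an error for it and drop the
 * reference it held, instead of leaving the queue for a cancelled timer. */
void toy_flush_queue(struct toy_neigh *n)
{
    while (n->queue_len > 0) {
        n->queue_len--;
        n->errors_reported++;
        toy_release(n);
    }
}

/* Update path with the candidate flush added to the early-exit branch. */
void toy_update_fixed(struct toy_neigh *n, unsigned int new_state)
{
    if (!(new_state & NUD_VALID)) {
        n->timer_armed = false;
        n->state = new_state;
        toy_flush_queue(n);
        return;
    }
    n->state = new_state;
}

/* Only entries held solely by the table (refcnt == 1) can be reclaimed. */
bool toy_collectable(const struct toy_neigh *n)
{
    return n->refcnt == 1;
}
```

Suppose two packets are queued and userland then declares the neighbor failed. In this model the queue ends up empty, two errors are reported and only the table's reference remains, so the entry becomes collectable again.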

Cheers,
  Timo