netdev - Re: [PATCH v4] net: neighbour: introduce EVICT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211019145359.30465322@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
Date:   Tue, 19 Oct 2021 14:53:59 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     James Prestwood <prestwoj@...il.com>
Cc:     netdev@...r.kernel.org, "David S . Miller" <davem@...emloft.net>,
        Jonathan Corbet <corbet@....net>,
        Hideaki YOSHIFUJI <yoshfuji@...ux-ipv6.org>,
        David Ahern <dsahern@...nel.org>,
        Roopa Prabhu <roopa@...dia.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Vladimir Oltean <vladimir.oltean@....com>,
        Ido Schimmel <idosch@...dia.com>,
        Nikolay Aleksandrov <nikolay@...dia.com>,
        Yajun Deng <yajun.deng@...ux.dev>,
        Tong Zhu <zhutong@...zon.com>,
        Johannes Berg <johannes@...solutions.net>,
        Jouni Malinen <jouni@...eaurora.org>
Subject: Re: [PATCH v4] net: neighbour: introduce EVICT_NOCARRIER table
 option

On Mon, 18 Oct 2021 12:26:57 -0700 James Prestwood wrote:
> This adds an option to ARP/NDISC tables that clears the table on
> NOCARRIER events. The default option (1) maintains existing
> behavior.
> 
> Clearing the ARP cache on NOCARRIER is relatively new, introduced by:
> 
> commit 859bd2ef1fc1110a8031b967ee656c53a6260a76
> Author: David Ahern <dsahern@...il.com>
> Date:   Thu Oct 11 20:33:49 2018 -0700
> 
>     net: Evict neighbor entries on carrier down
> 
> The reason for this change is to allow userspace to control when
> the ARP/NDISC cache is cleared. In most cases the cache should be
> cleared on NOCARRIER, but in the context of a wireless roams the
> cache should not be cleared since the underlying network has not
> changed. Clearing the cache on a wireless roam can introduce delays
> in sending out packets (waiting for ARP/neighbor discovery) and
> this has been reported by a user here:
> 
> https://lore.kernel.org/linux-wireless/CACsRnHWa47zpx3D1oDq9JYnZWniS8yBwW1h0WAVZ6vrbwL_S0w@mail.gmail.com/
> 
> After some investigation it was found that the kernel was holding onto
> packets until ARP finished which resulted in this 1 second delay. It
> was also found that the first ARP who-has was never responded to,
> which is actually what causes the delay. This change is more or less
> working around this behavior, but again, there is no reason to clear
> the cache on a roam anyways.
> 
> As for the unanswered who-has, we know the packet made it OTA since
> it was seen while monitoring. Why it never received a response is
> unknown. In any case, since this is a problem on the AP side of things
> all that can be done is to work around it until it is solved.
> 
> Some background on testing/reproducing the packet delay:
> 
> Hardware:
>  - 2 access points configured for Fast BSS Transition (Though I don't
>    see why regular reassociation wouldn't have the same behavior)
>  - Wireless station running IWD as supplicant
>  - A device on network able to respond to pings (I used one of the APs)
> 
> Procedure:
>  - Connect to first AP
>  - Ping once to establish an ARP entry
>  - Start a tcpdump
>  - Roam to second AP
>  - Wait for operstate UP event, and note the timestamp
>  - Start pinging
> 
> Results:
> 
> Below is the tcpdump after UP. It was recorded the interface went UP at
> 10:42:01.432875.
> 
> 10:42:01.461871 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
> 10:42:02.497976 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
> 10:42:02.507162 ARP, Reply 192.168.254.1 is-at ac:86:74:55:b0:20, length 46
> 10:42:02.507185 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 1, length 64
> 10:42:02.507205 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 2, length 64
> 10:42:02.507212 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 3, length 64
> 10:42:02.507219 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 4, length 64
> 10:42:02.507225 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 5, length 64
> 10:42:02.507232 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 6, length 64
> 10:42:02.515373 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 1, length 64
> 10:42:02.521399 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 2, length 64
> 10:42:02.521612 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 3, length 64
> 10:42:02.521941 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 4, length 64
> 10:42:02.522419 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 5, length 64
> 10:42:02.523085 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 6, length 64
> 
> You can see the first ARP who-has went out very quickly after UP, but
> was never responded to. Nearly a second later the kernel retries and
> gets a response. Only then do the ping packets go out. If an ARP entry
> is manually added prior to UP (after the cache is cleared) it is seen
> that the first ping is never responded to, so its not only an issue with
> ARP but with data packets in general.
> 
> As mentioned prior, the wireless interface was also monitored to verify
> the ping/ARP packet made it OTA which was observed to be true.
> 
> Signed-off-by: James Prestwood <prestwoj@...il.com>

>  static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
> -			    bool skip_perm)
> +			    bool nocarrier)

If I'm reading this right nocarrier being false means this 
is a interface configuration even (ifconfig down). This reads
strange. Can we invert the logic and call the parameter ifdown 
or carrier_only?

>  {
>  	int i;
>  	struct neigh_hash_table *nht;
> @@ -336,7 +336,8 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
>  				np = &n->next;
>  				continue;
>  			}
> -			if (skip_perm && n->nud_state & NUD_PERMANENT) {
> +			if (nocarrier && ((n->nud_state & NUD_PERMANENT) ||
> +			    !NEIGH_VAR(n->parms, EVICT_NOCARRIER))) {

This is misaligned, please align the continuation line under the
innermost bracket it belongs to.

>  				np = &n->next;
>  				continue;
>  			}

> @@ -2249,6 +2252,7 @@ static const struct nla_policy nl_ntbl_parm_policy[NDTPA_MAX+1] = {
>  	[NDTPA_ANYCAST_DELAY]		= { .type = NLA_U64 },
>  	[NDTPA_PROXY_DELAY]		= { .type = NLA_U64 },
>  	[NDTPA_LOCKTIME]		= { .type = NLA_U64 },
> +	[NDTPA_EVICT_NOCARRIER]		= { .type = NLA_U8 },

Input needs to be validated, right? NLA_POLICY_MAX(NLA_U8, 1) ?

>  };

> @@ -3679,6 +3686,7 @@ static struct neigh_sysctl_table {
>  		NEIGH_SYSCTL_UNRES_QLEN_REUSED_ENTRY(QUEUE_LEN, QUEUE_LEN_BYTES, "unres_qlen"),
>  		NEIGH_SYSCTL_MS_JIFFIES_REUSED_ENTRY(RETRANS_TIME_MS, RETRANS_TIME, "retrans_time_ms"),
>  		NEIGH_SYSCTL_MS_JIFFIES_REUSED_ENTRY(BASE_REACHABLE_TIME_MS, BASE_REACHABLE_TIME, "base_reachable_time_ms"),
> +		NEIGH_SYSCTL_ZERO_INTMAX_ENTRY(EVICT_NOCARRIER, "evict_nocarrier"),

Same here, we need to accept only the meaningful values (0 and 1).

>  		[NEIGH_VAR_GC_INTERVAL] = {
>  			.procname	= "gc_interval",
>  			.maxlen		= sizeof(int),