Message-ID: <05fd4fe6-da80-6e1f-2122-d38c6b58675c@gmail.com>
Date: Mon, 1 Nov 2021 19:37:14 -0600
From: David Ahern <dsahern@...il.com>
To: James Prestwood <prestwoj@...il.com>, netdev@...r.kernel.org
Cc: davem@...emloft.net, kuba@...nel.org, corbet@....net,
yoshfuji@...ux-ipv6.org, dsahern@...nel.org, roopa@...dia.com,
daniel@...earbox.net, vladimir.oltean@....com, idosch@...dia.com,
nikolay@...dia.com, yajun.deng@...ux.dev, zhutong@...zon.com,
johannes@...solutions.net, jouni@...eaurora.org
Subject: Re: [PATCH 1/3] net: arp: introduce arp_evict_nocarrier sysctl parameter
On 11/1/21 11:36 AM, James Prestwood wrote:
> This change introduces a new sysctl parameter, arp_evict_nocarrier.
> When set (default) the ARP cache will be cleared on a NOCARRIER event.
> This new option has been defaulted to '1' which maintains existing
> behavior.
>
> Clearing the ARP cache on NOCARRIER is relatively new, introduced by:
>
> commit 859bd2ef1fc1110a8031b967ee656c53a6260a76
> Author: David Ahern <dsahern@...il.com>
> Date: Thu Oct 11 20:33:49 2018 -0700
>
> net: Evict neighbor entries on carrier down
>
> The reason for this change is to prevent the ARP cache from being
> cleared when a wireless device roams. Specifically for wireless roams
> the ARP cache should not be cleared because the underlying network has not
> changed. Clearing the ARP cache in this case can introduce significant
> delays sending out packets after a roam.
>
> A user reported such a situation here:
>
> https://lore.kernel.org/linux-wireless/CACsRnHWa47zpx3D1oDq9JYnZWniS8yBwW1h0WAVZ6vrbwL_S0w@mail.gmail.com/
>
> After some investigation it was found that the kernel was holding onto
> packets until ARP finished which resulted in this 1 second delay. It
> was also found that the first ARP who-has was never responded to,
> which is actually what causes the delay. This change is more or less
> working around this behavior, but again, there is no reason to clear
> the cache on a roam anyways.
>
> As for the unanswered who-has, we know the packet made it OTA since
> it was seen while monitoring. Why it never received a response is
> unknown. In any case, since this is a problem on the AP side of things
> all that can be done is to work around it until it is solved.
>
> Some background on testing/reproducing the packet delay:
>
> Hardware:
> - 2 access points configured for Fast BSS Transition (Though I don't
> see why regular reassociation wouldn't have the same behavior)
> - Wireless station running IWD as supplicant
> - A device on network able to respond to pings (I used one of the APs)
>
> Procedure:
> - Connect to first AP
> - Ping once to establish an ARP entry
> - Start a tcpdump
> - Roam to second AP
> - Wait for operstate UP event, and note the timestamp
> - Start pinging
>
> Results:
>
> Below is the tcpdump after UP. The interface was recorded as going UP at
> 10:42:01.432875.
>
> 10:42:01.461871 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
> 10:42:02.497976 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28
> 10:42:02.507162 ARP, Reply 192.168.254.1 is-at ac:86:74:55:b0:20, length 46
> 10:42:02.507185 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 1, length 64
> 10:42:02.507205 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 2, length 64
> 10:42:02.507212 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 3, length 64
> 10:42:02.507219 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 4, length 64
> 10:42:02.507225 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 5, length 64
> 10:42:02.507232 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 6, length 64
> 10:42:02.515373 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 1, length 64
> 10:42:02.521399 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 2, length 64
> 10:42:02.521612 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 3, length 64
> 10:42:02.521941 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 4, length 64
> 10:42:02.522419 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 5, length 64
> 10:42:02.523085 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 6, length 64
>
> You can see the first ARP who-has went out very quickly after UP, but
> was never responded to. Nearly a second later the kernel retries and
> gets a response. Only then do the ping packets go out. If an ARP entry
> is manually added prior to UP (after the cache is cleared) it is seen
> that the first ping is never responded to, so it's not only an issue with
> ARP but with data packets in general.
>
> As mentioned prior, the wireless interface was also monitored to verify
> the ping/ARP packet made it OTA which was observed to be true.
>
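The delays described in the quoted results can be checked directly from the capture timestamps. The following is only a rough sketch in Python: the helper name seconds_between and the dictionary labels are made up for illustration, and the timestamps are copied from the tcpdump output quoted above.

```python
from datetime import datetime

# tcpdump's default timestamp layout: hours:minutes:seconds.microseconds
FMT = "%H:%M:%S.%f"


def seconds_between(start: str, end: str) -> float:
    """Return end - start in seconds for two same-day tcpdump timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Timestamps copied from the quoted capture.
LINK_UP = "10:42:01.432875"
FIRST_WHO_HAS = "10:42:01.461871"
SECOND_WHO_HAS = "10:42:02.497976"
ARP_REPLY = "10:42:02.507162"
FIRST_ECHO_REQUEST = "10:42:02.507185"

delays = {
    "UP -> first who-has": seconds_between(LINK_UP, FIRST_WHO_HAS),
    "first who-has -> retry": seconds_between(FIRST_WHO_HAS, SECOND_WHO_HAS),
    "retry -> ARP reply": seconds_between(SECOND_WHO_HAS, ARP_REPLY),
    "UP -> first echo request": seconds_between(LINK_UP, FIRST_ECHO_REQUEST),
}

for label, value in delays.items():
    print(f"{label}: {value * 1000:.3f} ms")
```

With these inputs the first who-has leaves about 29 ms after UP, the retry follows roughly 1.036 s later, the reply arrives about 9 ms after the retry, and the first echo request goes out about 1.074 s after UP. That is consistent with the "nearly a second" of delay being spent waiting for the ARP retry.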
> Signed-off-by: James Prestwood <prestwoj@...il.com>
> ---
> Documentation/networking/ip-sysctl.rst | 9 +++++++++
> include/linux/inetdevice.h | 2 ++
> include/uapi/linux/ip.h | 1 +
> include/uapi/linux/sysctl.h | 1 +
> net/ipv4/arp.c | 11 ++++++++++-
> net/ipv4/devinet.c | 4 ++++
> 6 files changed, 27 insertions(+), 1 deletion(-)
>
Reviewed-by: David Ahern <dsahern@...nel.org>
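
For anyone wanting to experiment with the knob from the quoted patch, a minimal sketch of reading and toggling it from userspace might look like the following. It assumes the parameter is exposed per interface at /proc/sys/net/ipv4/conf/<ifname>/arp_evict_nocarrier, as other IPv4 conf entries are; the helper names and the "wlan0" interface name are hypothetical, and writing the value needs root.

```python
from pathlib import Path

# Assumed location of the per-interface IPv4 conf entries under procfs.
PROC_IPV4_CONF = Path("/proc/sys/net/ipv4/conf")


def knob_path(ifname: str, root: Path = PROC_IPV4_CONF) -> Path:
    """Build the (assumed) path of arp_evict_nocarrier for one interface."""
    return root / ifname / "arp_evict_nocarrier"


def get_evict_nocarrier(ifname: str, root: Path = PROC_IPV4_CONF) -> bool:
    """Return True if the ARP cache is evicted on NOCARRIER (the default)."""
    return knob_path(ifname, root).read_text().strip() != "0"


def set_evict_nocarrier(ifname: str, enabled: bool,
                        root: Path = PROC_IPV4_CONF) -> None:
    """Write 1 to keep the existing eviction behavior, 0 to keep the cache."""
    knob_path(ifname, root).write_text("1\n" if enabled else "0\n")
```

A supplicant that knows it is about to roam within the same network could, under these assumptions, call set_evict_nocarrier("wlan0", False) before the roam and restore the previous value afterwards. The root argument exists only so the helpers can be pointed at a scratch directory for testing.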