lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <10e8c5b435c3d19d062581b31e34de8e8511f75d.camel@gmail.com>
Date:   Thu, 14 Oct 2021 09:29:05 -0700
From:   James Prestwood <prestwoj@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     netdev@...r.kernel.org
Subject: Re: [PATCH 1/4] net: arp: introduce arp_evict_nocarrier sysctl
 parameter

Hi Jakub,

On Wed, 2021-10-13 at 16:37 -0700, Jakub Kicinski wrote:
> On Wed, 13 Oct 2021 15:27:07 -0700 James Prestwood wrote:
> > This change introduces a new sysctl parameter, arp_evict_nocarrier.
> > When set (default) the ARP cache will be cleared on a NOCARRIER
> > event.
> > This new option has been defaulted to '1' which maintains existing
> > behavior.
> > 
> > Clearing the ARP cache on NOCARRIER is relatively new, introduced
> > by:
> > 
> > commit 859bd2ef1fc1110a8031b967ee656c53a6260a76
> > Author: David Ahern <dsahern@...il.com>
> > Date:   Thu Oct 11 20:33:49 2018 -0700
> > 
> >     net: Evict neighbor entries on carrier down
> > 
> > The reason for this changes is to prevent the ARP cache from being
> > cleared when a wireless device roams. Specifically for wireless
> > roams
> > the ARP cache should not be cleared because the underlying network
> > has not
> > changed. Clearing the ARP cache in this case can introduce
> > significant
> > delays sending out packets after a roam.
> > 
> > A user reported such a situation here:
> > 
> > https://lore.kernel.org/linux-wireless/CACsRnHWa47zpx3D1oDq9JYnZWniS8yBwW1h0WAVZ6vrbwL_S0w@mail.gmail.com/
> > 
> > After some investigation it was found that the kernel was holding
> > onto
> > packets until ARP finished which resulted in this 1 second delay.
> > It
> > was also found that the first ARP who-has was never responded to,
> > which is actually what caues the delay. This change is more or less
> > working around this behavior, but again, there is no reason to
> > clear
> > the cache on a roam anyways.
> > 
> > As for the unanswered who-has, we know the packet made it OTA since
> > it was seen while monitoring. Why it never received a response is
> > unknown.
> > 
> > Signed-off-by: James Prestwood <prestwoj@...il.com>
> 
> Seems sensible at a glance, some quick feedback.
> 
> Please make sure you run ./scripts/get_maintainers.pl on the patch
> and add appropriate folks to CC.
> 
> Please rebase the code on top of this tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/
> 
> > diff --git a/include/linux/inetdevice.h
> > b/include/linux/inetdevice.h
> > index 53aa0343bf69..63180170fdbd 100644
> > --- a/include/linux/inetdevice.h
> > +++ b/include/linux/inetdevice.h
> > @@ -133,6 +133,7 @@ static inline void ipv4_devconf_setall(struct
> > in_device *in_dev)
> >  #define IN_DEV_ARP_ANNOUNCE(in_dev)    IN_DEV_MAXCONF((in_dev),
> > ARP_ANNOUNCE)
> >  #define IN_DEV_ARP_IGNORE(in_dev)      IN_DEV_MAXCONF((in_dev),
> > ARP_IGNORE)
> >  #define IN_DEV_ARP_NOTIFY(in_dev)      IN_DEV_MAXCONF((in_dev),
> > ARP_NOTIFY)
> > +#define IN_DEV_ARP_EVICT_NOCARRIER(in_dev)
> > IN_DEV_CONF_GET((in_dev), ARP_EVICT_NOCARRIER)
> 
> IN_DEV_ANDCONF() makes most sense, I'd think.

So given we want '1' as the default as well as the ability to toggle
this option per-netdev I thought this was more appropriate. One caviat
is this would not work for setting ALL netdev's, but I don't think this
is a valid use case; or at least I can't imagine why you'd want to ever
do this.

This is a whole new area to me though, so if I'm understanding these
macros wrong please educate me :)

> 
> >  struct in_ifaddr {
> >         struct hlist_node       hash;
> 
> > diff --git a/net/ipv4/arp.c b/net/ipv4/arp.c
> > index 922dd73e5740..50cfe4f37089 100644
> > --- a/net/ipv4/arp.c
> > +++ b/net/ipv4/arp.c
> > @@ -1247,6 +1247,7 @@ static int arp_netdev_event(struct
> > notifier_block *this, unsigned long event,
> >  {
> >         struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> >         struct netdev_notifier_change_info *change_info;
> > +       struct in_device *in_dev = __in_dev_get_rcu(dev);
> 
> Don't we need to hold the RCU lock to call this?

I was wondering that as well. It seems to be used in some places and
not others. Maybe the caller locked prior in places where there was no
lock, I'll look further into it.

> 
> >         switch (event) {
> >         case NETDEV_CHANGEADDR:
> > @@ -1257,7 +1258,8 @@ static int arp_netdev_event(struct
> > notifier_block *this, unsigned long event,
> >                 change_info = ptr;
> >                 if (change_info->flags_changed & IFF_NOARP)
> >                         neigh_changeaddr(&arp_tbl, dev);
> > -               if (!netif_carrier_ok(dev))
> > +               if (IN_DEV_ARP_EVICT_NOCARRIER(in_dev) &&
> > +                   !netif_carrier_ok(dev))
> >                         neigh_carrier_down(&arp_tbl, dev);
> >                 break;
> >         default:


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ