netdev - Re: 2nd RTM_NEWLINK notification with operstate down is always 1 second delayed


Message-ID: <20240417153350.629168f8@hermes.local>
Date: Wed, 17 Apr 2024 15:33:50 -0700
From: Stephen Hemminger <stephen@...workplumber.org>
To: "Tom, Deepak Abraham" <deepak-abraham.tom@....com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: 2nd RTM_NEWLINK notification with operstate down is always 1
 second delayed

On Wed, 17 Apr 2024 17:37:40 +0000
"Tom, Deepak Abraham" <deepak-abraham.tom@....com> wrote:

> Hi!
> 
> I have a system configured with 2 physical eth interfaces connected to a switch.
> When I reboot the switch, I see that the userspace RTM_NEWLINK notifications for the interfaces are always 1 second apart although both links actually go down almost simultaneously!
> The subsequent RTM_NEWLINK notifications when the switch comes back up are however only delayed by a few microseconds between each other, which is as expected.
> 
> Turns out this delay is intentionally introduced by the Linux kernel networking code in net/core/link_watch.c, last modified 17 years ago in commit 294cc44:
>          /*
>           * Limit the number of linkwatch events to one
>           * per second so that a runaway driver does not
>           * cause a storm of messages on the netlink
>           * socket.  This limit does not apply to up events
>           * while the device qdisc is down.
>           */
> 
> 
> On modern high-performance systems, limiting the number of down events to just one per second has far-reaching consequences.
> I was wondering if it would be advisable to reduce this delay to something smaller, say 5ms (so 5ms+scheduling delay practically):

The reason is that on systems connected to the Internet and running routing daemons,
the impact of a link state change is huge. A single link transition may keep FRR (née Quagga)
busy for several seconds as it linearly evaluates 3 million route entries. Maybe more recent
versions of FRR have gotten smarter. The limit also keeps the routing daemon from propagating
lots of changes, a.k.a. route flapping.
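
To make the behavior described in the quoted comment concrete, here is a rough
userspace model of that kind of limit. It is only a sketch and is not the code in
net/core/link_watch.c: the struct, the function name lw_schedule() and the
millisecond clock are all made up for illustration, and it assumes that urgent
events (such as an up event while the qdisc is down) simply bypass the limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a "one event per interval" limiter.
 * Not the kernel implementation; names and units are assumptions. */
struct lw_limiter {
	uint64_t next_allowed_ms;	/* earliest time a non-urgent event may go out */
	uint64_t interval_ms;		/* minimum spacing between non-urgent events */
};

/* Return the time (in ms) at which an event that arrives at now_ms
 * would be delivered to userspace in this model. */
static uint64_t lw_schedule(struct lw_limiter *l, uint64_t now_ms, bool urgent)
{
	uint64_t fire = now_ms;

	/* Assumption: urgent events are delivered at once and do not
	 * consume the rate-limit budget. */
	if (urgent)
		return now_ms;

	/* Non-urgent events wait until the limiter allows the next one. */
	if (fire < l->next_allowed_ms)
		fire = l->next_allowed_ms;

	/* The following non-urgent event must wait a full interval. */
	l->next_allowed_ms = fire + l->interval_ms;
	return fire;
}
```

With interval_ms set to 1000, two down events arriving 1 ms apart are delivered
1 second apart in this model, which matches what was reported. With the proposed
5 ms, the second one would follow 5 ms after the first, and a routing daemon
would see both changes almost back to back.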