netdev - 2nd RTM_NEWLINK notification with operstate down is always 1 second delayed

Message-ID: <DS7PR84MB303940368E1CC7CE98A49E96D70F2@DS7PR84MB3039.NAMPRD84.PROD.OUTLOOK.COM>
Date: Wed, 17 Apr 2024 17:37:40 +0000
From: "Tom, Deepak Abraham" <deepak-abraham.tom@....com>
To: "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: 2nd RTM_NEWLINK notification with operstate down is always 1 second
 delayed

Hi!

I have a system configured with two physical Ethernet interfaces connected to a switch.
When I reboot the switch, the userspace RTM_NEWLINK notifications for the two interfaces are always 1 second apart, although both links actually go down almost simultaneously.
The subsequent RTM_NEWLINK notifications when the switch comes back up, however, are only a few microseconds apart, which is as expected.

It turns out this delay is intentionally introduced by the Linux kernel networking code in net/core/link_watch.c, last modified 17 years ago in commit 294cc44:
         /*
          * Limit the number of linkwatch events to one
          * per second so that a runaway driver does not
          * cause a storm of messages on the netlink
          * socket.  This limit does not apply to up events
          * while the device qdisc is down.
          */


On modern high-performance systems, limiting the number of down events to just one per second has far-reaching consequences.
I was wondering if it would be advisable to reduce this delay to something smaller, say 5 ms (so 5 ms plus scheduling delay in practice):
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -130,8 +130,8 @@ static void linkwatch_schedule_work(int urgent)
                delay = 0;
        }

-       /* If we wrap around we'll delay it by at most HZ. */
-       if (delay > HZ)
+       /* If we wrap around we'll delay it by at most HZ/200. */
+       if (delay > (HZ/200))
                delay = 0;

        /*
@@ -187,15 +187,15 @@ static void __linkwatch_run_queue(int urgent_only)

        /*
         * Limit the number of linkwatch events to one
-        * per second so that a runaway driver does not
+        * per 5 milliseconds so that a runaway driver does not
         * cause a storm of messages on the netlink
         * socket.  This limit does not apply to up events
         * while the device qdisc is down.
         */
        if (!urgent_only)
-               linkwatch_nextevent = jiffies + HZ;
+               linkwatch_nextevent = jiffies + (HZ/200);
        /* Limit wrap-around effect on delay. */
-       else if (time_after(linkwatch_nextevent, jiffies + HZ))
+       else if (time_after(linkwatch_nextevent, jiffies + (HZ/200)))
                linkwatch_nextevent = jiffies;

        clear_bit(LW_URGENT, &linkwatch_flags);


I have tested this change in my environment, and it works as expected. I don't see any new issues popping up because of this.

Are there any concerns with making this change today? Hoping to get some feedback.


Thank you,
Deepak Abraham Tom