Message-ID: <5655f8ba-7b4f-44a8-ac4a-a028b50e01c8@gmail.com>
Date: Tue, 9 Jul 2024 09:46:26 -0600
From: David Ahern <dsahern@...il.com>
To: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@...losecurity.com>,
netdev@...r.kernel.org
Cc: adrian.oliver@...losecurity.com,
Nicolas Dichtel <nicolas.dichtel@...nd.com>
Subject: Re: [PATCH v2] net/ipv6: Fix soft lockups in fib6_select_path under
high next hop churn
[ cc Nicolas - author of legacy IPv6 multipath code ]
On 7/9/24 9:37 AM, Omid Ehtemam-Haghighi wrote:
> Soft lockups have been observed on a cluster of Linux-based edge routers
> located in a highly dynamic environment. Using the `bird` service, these
> routers continuously update BGP-advertised routes due to frequently
> changing nexthop destinations, while also managing significant IPv6
> traffic. The lockups occur during the traversal of the multipath
> circular linked-list in the `fib6_select_path` function, particularly
> while iterating through the siblings in the list. The issue typically
> arises when the nodes of the linked list are unexpectedly deleted
> concurrently on a different core, as indicated by their 'next' and
> 'previous' pointers pointing back to the node itself and their reference
> count dropping to zero. This results in an infinite loop, leading to a
> soft lockup that triggers a system panic via the watchdog timer.
>
I will review the patch when I get some time (traveling this week), but
bird really should be converted to the new separate nexthop API. It
makes route updates much faster, is already RCU-based for updates, and
should avoid problems like this under high rates of change.