Message-ID: <5655f8ba-7b4f-44a8-ac4a-a028b50e01c8@gmail.com>
Date: Tue, 9 Jul 2024 09:46:26 -0600
From: David Ahern <dsahern@...il.com>
To: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@...losecurity.com>,
 netdev@...r.kernel.org
Cc: adrian.oliver@...losecurity.com,
 Nicolas Dichtel <nicolas.dichtel@...nd.com>
Subject: Re: [PATCH v2] net/ipv6: Fix soft lockups in fib6_select_path under
 high next hop churn

[ cc Nicolas - author of legacy IPv6 multipath code ]

On 7/9/24 9:37 AM, Omid Ehtemam-Haghighi wrote:
> Soft lockups have been observed on a cluster of Linux-based edge routers
> located in a highly dynamic environment. Using the `bird` service, these
> routers continuously update BGP-advertised routes due to frequently
> changing nexthop destinations, while also managing significant IPv6
> traffic. The lockups occur during the traversal of the multipath
> circular linked-list in the `fib6_select_path` function, particularly
> while iterating through the siblings in the list. The issue typically
> arises when the nodes of the linked list are unexpectedly deleted
> concurrently on a different core—indicated by their 'next' and
> 'previous' elements pointing back to the node itself and their reference
> count dropping to zero. This results in an infinite loop, leading to a
> soft lockup that triggers a system panic via the watchdog timer.
> 

I will review the patch when I get some time (I am traveling this week),
but bird really should be converted to the new, separate nexthop API. It
makes route updates much faster, is already RCU-based for updates, and
should avoid problems like this at high rates of change.
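
For readers unfamiliar with the failure mode in the quoted report, here is a minimal userspace sketch of it. This is not the kernel code: `struct node`, `node_del_init`, and `walk_to_head` are hypothetical names loosely modeled on the kernel's `list_head` helpers, and the step limit exists only so that the demonstration terminates instead of spinning forever.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for a circular doubly linked list node. */
struct node {
    struct node *next;
    struct node *prev;
};

/* An empty or unlinked node points at itself in both directions. */
static void node_init(struct node *n)
{
    n->next = n;
    n->prev = n;
}

/* Insert n just before head, i.e. at the tail of the circular list. */
static void node_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n from its neighbours and leave it pointing at itself. */
static void node_del_init(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);
}

/*
 * Follow next pointers from cursor until head is reached.
 * Returns the number of steps taken, or -1 if max_steps was
 * exhausted first (the stand-in for a soft lockup).
 */
static int walk_to_head(const struct node *cursor,
                        const struct node *head, int max_steps)
{
    int steps = 0;

    assert(max_steps >= 0);
    while (cursor != head) {
        if (steps == max_steps)
            return -1;
        cursor = cursor->next;
        steps++;
    }
    return steps;
}
```

With a list a -> b -> c -> a, walking from b back to a takes two steps. If b is unlinked while a reader still holds it as its cursor, b's next pointer refers to b itself, so the walk can never reach a and only the artificial limit stops it. The list seen from a stays consistent (a -> c -> a); only the reader holding the stale node is stuck.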
