netdev - Re: [PATCH net v5] ipv6: Fix soft lockups in fib6_select

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZxThkZZBCAnuJf8Y@shredder.mtl.com>
Date: Sun, 20 Oct 2024 13:55:13 +0300
From: Ido Schimmel <idosch@...sch.org>
To: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@...losecurity.com>
Cc: netdev@...r.kernel.org, adrian.oliver@...losecurity.com,
	Adrian Oliver <kernel@...iver.ca>, David Ahern <dsahern@...il.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Kuniyuki Iwashima <kuniyu@...zon.com>,
	Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net v5] ipv6: Fix soft lockups in fib6_select_path under
 high next hop churn

On Fri, Oct 18, 2024 at 08:49:58PM -0700, Omid Ehtemam-Haghighi wrote:
> Soft lockups have been observed on a cluster of Linux-based edge routers
> located in a highly dynamic environment. Using the `bird` service, these
> routers continuously update BGP-advertised routes due to frequently
> changing nexthop destinations, while also managing significant IPv6
> traffic. The lockups occur during the traversal of the multipath
> circular linked-list in the `fib6_select_path` function, particularly
> while iterating through the siblings in the list. The issue typically
> arises when the nodes of the linked list are unexpectedly deleted
> concurrently on a different core—indicated by their 'next' and
> 'previous' elements pointing back to the node itself and their reference
> count dropping to zero. This results in an infinite loop, leading to a
> soft lockup that triggers a system panic via the watchdog timer.
> 
> Apply RCU primitives in the problematic code sections to resolve the
> issue. Where necessary, update the references to fib6_siblings to
> annotate or use the RCU APIs.
> 
> Include a test script that reproduces the issue. The script
> periodically updates the routing table while generating a heavy load
> of outgoing IPv6 traffic through multiple iperf3 clients. It
> consistently induces infinite soft lockups within a couple of minutes.

[...]

> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
> Reported-by: Adrian Oliver <kernel@...iver.ca>
> Tested-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@...losecurity.com>
> Signed-off-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@...losecurity.com>

You can drop your Tested-by given you authored the patch. It's implicit.

> Cc: David Ahern <dsahern@...il.com>
> Cc: Eric Dumazet <edumazet@...gle.com>
> Cc: Ido Schimmel <idosch@...sch.org>
> Cc: Kuniyuki Iwashima <kuniyu@...zon.com>
> Cc: Paolo Abeni <pabeni@...hat.com>

Please use scripts/get_maintainer.pl to get the full list of people to
copy.

[...]

> @@ -5195,14 +5195,18 @@ static void ip6_route_mpath_notify(struct fib6_info *rt,
>  	 * nexthop. Since sibling routes are always added at the end of
>  	 * the list, find the first sibling of the last route appended
>  	 */
> +	rcu_read_lock();
> +
>  	if ((nlflags & NLM_F_APPEND) && rt_last && rt_last->fib6_nsiblings) {
> -		rt = list_first_entry(&rt_last->fib6_siblings,
> -				      struct fib6_info,
> -				      fib6_siblings);
> +		rt = list_first_or_null_rcu(&rt_last->fib6_siblings,
> +					    struct fib6_info,
> +					    fib6_siblings);
>  	}
>  
>  	if (rt)
>  		inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);

Ran your test with a debug config [1] and got the following splat [2].
Fixed by [3]. Beside ip6_route_mpath_notify() all the callers of
inet6_rt_notify() already call it from an atomic context.

> +
> +	rcu_read_unlock();
>  }

[1] https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style

[2]
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 3730, name: ip
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by ip/3730:
 #0: ffffffff86fc7c30 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60
 #1: ffffffff8655b3e0 (rcu_read_lock){....}-{1:2}, at: ip6_route_mpath_notify+0x75/0x330
CPU: 5 UID: 0 PID: 3730 Comm: ip Tainted: G        W          6.12.0-rc3-custom-virtme-g38827a50efa3 #134
Tainted: [W]=WARN
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0xba/0x110
 __might_resched.cold+0x1ed/0x253
 kmem_cache_alloc_node_noprof+0x2ab/0x310
 __alloc_skb+0x2da/0x3a0
 inet6_rt_notify+0xf6/0x2a0
 ip6_route_mpath_notify+0x12c/0x330
 ip6_route_multipath_add+0xcc7/0x1f70
 inet6_rtm_newroute+0xfb/0x170
 rtnetlink_rcv_msg+0x3cc/0xf60
 netlink_rcv_skb+0x171/0x450
 netlink_unicast+0x539/0x7f0
 netlink_sendmsg+0x8c1/0xd80
 ____sys_sendmsg+0x8f9/0xc20
 ___sys_sendmsg+0x197/0x1e0
 __sys_sendmsg+0x122/0x1f0
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

[3]
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d1065a0edc19..b9b986bda943 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6192,7 +6192,7 @@ void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
        err = -ENOBUFS;
        seq = info->nlh ? info->nlh->nlmsg_seq : 0;
 
-       skb = nlmsg_new(rt6_nlmsg_size(rt), gfp_any());
+       skb = nlmsg_new(rt6_nlmsg_size(rt), GFP_ATOMIC);
        if (!skb)
                goto errout;
 
@@ -6205,7 +6205,7 @@ void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
                goto errout;
        }
        rtnl_notify(skb, net, info->portid, RTNLGRP_IPV6_ROUTE,
-                   info->nlh, gfp_any());
+                   info->nlh, GFP_ATOMIC);
        return;
 errout:
        rtnl_set_sk_err(net, RTNLGRP_IPV6_ROUTE, err);

[...]

> diff --git a/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh
> new file mode 100755
> index 000000000000..d257fbf0b0a3
> --- /dev/null
> +++ b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh

[...]

> +	# Check if any iperf3 instances are still running. This could occur if a core has entered an infinite loop and
> +	# the timeout for the soft lockup to be detected has not expired yet, but either the test interval has already
> +	# elapsed or the test is terminated manually (Ctrl+C)

I didn't thoroughly review the test and only ran it to make sure it's
passing, but I suggest wrapping some of the comments there to 80
characters to avoid checkpatch warnings.