[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZxThkZZBCAnuJf8Y@shredder.mtl.com>
Date: Sun, 20 Oct 2024 13:55:13 +0300
From: Ido Schimmel <idosch@...sch.org>
To: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@...losecurity.com>
Cc: netdev@...r.kernel.org, adrian.oliver@...losecurity.com,
Adrian Oliver <kernel@...iver.ca>, David Ahern <dsahern@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
Kuniyuki Iwashima <kuniyu@...zon.com>,
Paolo Abeni <pabeni@...hat.com>
Subject: Re: [PATCH net v5] ipv6: Fix soft lockups in fib6_select_path under
high next hop churn
On Fri, Oct 18, 2024 at 08:49:58PM -0700, Omid Ehtemam-Haghighi wrote:
> Soft lockups have been observed on a cluster of Linux-based edge routers
> located in a highly dynamic environment. Using the `bird` service, these
> routers continuously update BGP-advertised routes due to frequently
> changing nexthop destinations, while also managing significant IPv6
> traffic. The lockups occur during the traversal of the multipath
> circular linked-list in the `fib6_select_path` function, particularly
> while iterating through the siblings in the list. The issue typically
> arises when the nodes of the linked list are unexpectedly deleted
> concurrently on a different core—indicated by their 'next' and
> 'previous' elements pointing back to the node itself and their reference
> count dropping to zero. This results in an infinite loop, leading to a
> soft lockup that triggers a system panic via the watchdog timer.
>
> Apply RCU primitives in the problematic code sections to resolve the
> issue. Where necessary, update the references to fib6_siblings to
> annotate or use the RCU APIs.
>
> Include a test script that reproduces the issue. The script
> periodically updates the routing table while generating a heavy load
> of outgoing IPv6 traffic through multiple iperf3 clients. It
> consistently induces infinite soft lockups within a couple of minutes.
[...]
> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
> Reported-by: Adrian Oliver <kernel@...iver.ca>
> Tested-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@...losecurity.com>
> Signed-off-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@...losecurity.com>
You can drop your Tested-by given you authored the patch. It's implicit.
> Cc: David Ahern <dsahern@...il.com>
> Cc: Eric Dumazet <edumazet@...gle.com>
> Cc: Ido Schimmel <idosch@...sch.org>
> Cc: Kuniyuki Iwashima <kuniyu@...zon.com>
> Cc: Paolo Abeni <pabeni@...hat.com>
Please use scripts/get_maintainer.pl to get the full list of people to
copy.
[...]
> @@ -5195,14 +5195,18 @@ static void ip6_route_mpath_notify(struct fib6_info *rt,
> * nexthop. Since sibling routes are always added at the end of
> * the list, find the first sibling of the last route appended
> */
> + rcu_read_lock();
> +
> if ((nlflags & NLM_F_APPEND) && rt_last && rt_last->fib6_nsiblings) {
> - rt = list_first_entry(&rt_last->fib6_siblings,
> - struct fib6_info,
> - fib6_siblings);
> + rt = list_first_or_null_rcu(&rt_last->fib6_siblings,
> + struct fib6_info,
> + fib6_siblings);
> }
>
> if (rt)
> inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
Ran your test with a debug config [1] and got the following splat [2].
Fixed by [3]. Beside ip6_route_mpath_notify() all the callers of
inet6_rt_notify() already call it from an atomic context.
> +
> + rcu_read_unlock();
> }
[1] https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
[2]
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 3730, name: ip
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by ip/3730:
#0: ffffffff86fc7c30 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60
#1: ffffffff8655b3e0 (rcu_read_lock){....}-{1:2}, at: ip6_route_mpath_notify+0x75/0x330
CPU: 5 UID: 0 PID: 3730 Comm: ip Tainted: G W 6.12.0-rc3-custom-virtme-g38827a50efa3 #134
Tainted: [W]=WARN
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0xba/0x110
__might_resched.cold+0x1ed/0x253
kmem_cache_alloc_node_noprof+0x2ab/0x310
__alloc_skb+0x2da/0x3a0
inet6_rt_notify+0xf6/0x2a0
ip6_route_mpath_notify+0x12c/0x330
ip6_route_multipath_add+0xcc7/0x1f70
inet6_rtm_newroute+0xfb/0x170
rtnetlink_rcv_msg+0x3cc/0xf60
netlink_rcv_skb+0x171/0x450
netlink_unicast+0x539/0x7f0
netlink_sendmsg+0x8c1/0xd80
____sys_sendmsg+0x8f9/0xc20
___sys_sendmsg+0x197/0x1e0
__sys_sendmsg+0x122/0x1f0
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[3]
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d1065a0edc19..b9b986bda943 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6192,7 +6192,7 @@ void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
err = -ENOBUFS;
seq = info->nlh ? info->nlh->nlmsg_seq : 0;
- skb = nlmsg_new(rt6_nlmsg_size(rt), gfp_any());
+ skb = nlmsg_new(rt6_nlmsg_size(rt), GFP_ATOMIC);
if (!skb)
goto errout;
@@ -6205,7 +6205,7 @@ void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
goto errout;
}
rtnl_notify(skb, net, info->portid, RTNLGRP_IPV6_ROUTE,
- info->nlh, gfp_any());
+ info->nlh, GFP_ATOMIC);
return;
errout:
rtnl_set_sk_err(net, RTNLGRP_IPV6_ROUTE, err);
[...]
> diff --git a/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh
> new file mode 100755
> index 000000000000..d257fbf0b0a3
> --- /dev/null
> +++ b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh
[...]
> + # Check if any iperf3 instances are still running. This could occur if a core has entered an infinite loop and
> + # the timeout for the soft lockup to be detected has not expired yet, but either the test interval has already
> + # elapsed or the test is terminated manually (Ctrl+C)
I didn't thoroughly review the test and only ran it to make sure it's
passing, but I suggest wrapping some of the comments there to 80
characters to avoid checkpatch warnings.
Powered by blists - more mailing lists