lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89i+r1cGacVC_6n3-A-WSkAa_Nr+pmxJ7Gt+oP-P9by2aGw@mail.gmail.com>
Date: Sun, 4 May 2025 02:16:13 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>
Cc: "David S. Miller" <davem@...emloft.net>, David Ahern <dsahern@...nel.org>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, Simon Horman <horms@...nel.org>, 
	Kuniyuki Iwashima <kuni1840@...il.com>, netdev@...r.kernel.org
Subject: Re: [PATCH v3 net-next 15/15] ipv6: Get rid of RTNL for SIOCADDRT and RTM_NEWROUTE.

On Thu, Apr 17, 2025 at 5:10 PM Kuniyuki Iwashima <kuniyu@...zon.com> wrote:
>
> Now we are ready to remove RTNL from SIOCADDRT and RTM_NEWROUTE.
>
> The remaining things to do are
>
>   1. pass false to lwtunnel_valid_encap_type_attr()
>   2. use rcu_dereference_rtnl() in fib6_check_nexthop()
>   3. place rcu_read_lock() before ip6_route_info_create_nh().
>
> Let's complete the RTNL-free conversion.
>
> When each CPU-X adds 100000 routes on table-X in a batch
> concurrently on c7a.metal-48xl EC2 instance with 192 CPUs,
>
> without this series:
>
>   $ sudo ./route_test.sh
>   ...
>   added 19200000 routes (100000 routes * 192 tables).
>   time elapsed: 191577 milliseconds.
>
> with this series:
>
>   $ sudo ./route_test.sh
>   ...
>   added 19200000 routes (100000 routes * 192 tables).
>   time elapsed: 62854 milliseconds.
>
> I changed the number of routes in each table (1000 ~ 100000)
> and consistently saw it finish 3x faster with this series.
>
> Note that now every caller of lwtunnel_valid_encap_type() passes
> false as the last argument, and this can be removed later.
>
> Signed-off-by: Kuniyuki Iwashima <kuniyu@...zon.com>
> ---
>  net/ipv4/nexthop.c |  4 ++--
>  net/ipv6/route.c   | 18 ++++++++++++------
>  2 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> index 6ba6cb1340c1..823e4a783d2b 100644
> --- a/net/ipv4/nexthop.c
> +++ b/net/ipv4/nexthop.c
> @@ -1556,12 +1556,12 @@ int fib6_check_nexthop(struct nexthop *nh, struct fib6_config *cfg,
>         if (nh->is_group) {
>                 struct nh_group *nhg;
>
> -               nhg = rtnl_dereference(nh->nh_grp);
> +               nhg = rcu_dereference_rtnl(nh->nh_grp);

Or add a condition about table lock being held ?

>                 if (nhg->has_v4)
>                         goto no_v4_nh;
>                 is_fdb_nh = nhg->fdb_nh;
>         } else {
> -               nhi = rtnl_dereference(nh->nh_info);
> +               nhi = rcu_dereference_rtnl(nh->nh_info);
>                 if (nhi->family == AF_INET)
>                         goto no_v4_nh;
>                 is_fdb_nh = nhi->fdb_nh;
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index c8c1c75268e3..bb46e724db73 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -3902,12 +3902,16 @@ int ip6_route_add(struct fib6_config *cfg, gfp_t gfp_flags,
>         if (IS_ERR(rt))
>                 return PTR_ERR(rt);
>
> +       rcu_read_lock();

This looks bogus to me (and syzbot)

Holding rcu_read_lock() from writers is almost always wrong.

In this case, this prevents any GFP_KERNEL allocations to happen
(among other things)

[ BUG: Invalid wait context ]
6.15.0-rc4-syzkaller-00746-g836b313a14a3 #0 Tainted: G W
-----------------------------
syz-executor234/5832 is trying to lock:
ffffffff8e021688 (pcpu_alloc_mutex){+.+.}-{4:4}, at:
pcpu_alloc_noprof+0x284/0x16b0 mm/percpu.c:1782
other info that might help us debug this:
context-{5:5}
1 lock held by syz-executor234/5832:
#0: ffffffff8df3b860 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire
include/linux/rcupdate.h:331 [inline]
#0: ffffffff8df3b860 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock
include/linux/rcupdate.h:841 [inline]
#0: ffffffff8df3b860 (rcu_read_lock){....}-{1:3}, at:
ip6_route_add+0x4d/0x2f0 net/ipv6/route.c:3913
stack backtrace:
CPU: 0 UID: 0 PID: 5832 Comm: syz-executor234 Tainted: G W
6.15.0-rc4-syzkaller-00746-g836b313a14a3 #0 PREEMPT(full)
Tainted: [W]=WARN
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 04/29/2025
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_lock_invalid_wait_context kernel/locking/lockdep.c:4831 [inline]
check_wait_context kernel/locking/lockdep.c:4903 [inline]
__lock_acquire+0xbcf/0xd20 kernel/locking/lockdep.c:5185
lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5866
__mutex_lock_common kernel/locking/mutex.c:601 [inline]
__mutex_lock+0x182/0xe80 kernel/locking/mutex.c:746
pcpu_alloc_noprof+0x284/0x16b0 mm/percpu.c:1782
dst_cache_init+0x37/0xc0 net/core/dst_cache.c:145
ip_tun_build_state+0x193/0x6b0 net/ipv4/ip_tunnel_core.c:687
lwtunnel_build_state+0x381/0x4c0 net/core/lwtunnel.c:137
fib_nh_common_init+0x129/0x460 net/ipv4/fib_semantics.c:635
fib6_nh_init+0x15e4/0x2030 net/ipv6/route.c:3669
ip6_route_info_create_nh+0x139/0x870 net/ipv6/route.c:3866
ip6_route_add+0xf6/0x2f0 net/ipv6/route.c:3915
inet6_rtm_newroute+0x284/0x1c50 net/ipv6/route.c:5732
rtnetlink_rcv_msg+0x7cc/0xb70 net/core/rtnetlink.c:6955
netlink_rcv_skb+0x219/0x490 net/netlink/af_netlink.c:2534
netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
netlink_unicast+0x758/0x8d0 net/netlink/af_netlink.c:1339
netlink_sendmsg+0x805/0xb30 net/netlink/af_netlink.c:1883
sock_sendmsg_nosec net/socket.c:712 [inline]
__sock_sendmsg+0x219/0x270 net/socket.c:727
____sys_sendmsg+0x505/0x830 net/socket.c:2566
___sys_sendmsg+0x21f/0x2a0 net/socket.c:2620
__sys_sendmsg net/socket.c:2652 [inline]
__do_sys_sendmsg net/socket.c:2657 [inline]
__se_sys_sendmsg net/socket.c:2655 [inline]
__x64_sys_sendmsg+0x19b/0x260 net/socket.c:2655
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xf6/0x210 arch/x86/entry/syscall_64.c:94

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ