Message-ID: <20250515164731.48991-1-kuniyu@amazon.com>
Date: Thu, 15 May 2025 09:46:13 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <pabeni@...hat.com>
CC: <davem@...emloft.net>, <dsahern@...nel.org>, <edumazet@...gle.com>,
<horms@...nel.org>, <kuba@...nel.org>, <kuni1840@...il.com>,
<kuniyu@...zon.com>, <netdev@...r.kernel.org>
Subject: Re: [PATCH v1 net-next 0/7] ipv6: Follow up for RTNL-free RTM_NEWROUTE series.

From: Paolo Abeni <pabeni@...hat.com>
Date: Thu, 15 May 2025 11:02:34 +0200
> On 5/15/25 4:05 AM, Kuniyuki Iwashima wrote:
> > From: Jakub Kicinski <kuba@...nel.org>
> > Date: Wed, 14 May 2025 18:45:02 -0700
> >> On Wed, 14 May 2025 13:18:53 -0700 Kuniyuki Iwashima wrote:
> >>> Patch 1 removes rcu_read_lock() in fib6_get_table().
> >>> Patch 2 removes the rtnl_is_held arg from lwtunnel_valid_encap_type(), which
> >>> was a short-term fix and is no longer used.
> >>> Patch 3 fixes the RCU vs GFP_KERNEL report from syzkaller.
> >>> Patches 4~7 revert GFP_ATOMIC uses back to GFP_KERNEL.
> >>
> >> Hi! Something in the following set of patches is making our CI time out.
> >> The problem seems to be:
> >>
> >> [ 0.751266] virtme-init: waiting for udev to settle
> >> Timed out for waiting the udev queue being empty.
> >> [ 120.826428] virtme-init: udev is done
> >>
> >> +team: grab team lock during team_change_rx_flags
> >> +net: mana: Add handler for hardware servicing events
> >> +ipv6: Revert two per-cpu var allocation for RTM_NEWROUTE.
> >> +ipv6: Pass gfp_flags down to ip6_route_info_create_nh().
> >> +Revert "ipv6: Factorise ip6_route_multipath_add()."
> >> +Revert "ipv6: sr: switch to GFP_ATOMIC flag to allocate memory during seg6local LWT setup"
> >> +ipv6: Narrow down RCU critical section in inet6_rtm_newroute().
> >> +inet: Remove rtnl_is_held arg of lwtunnel_valid_encap_type(_attr)?().
> >> +ipv6: Remove rcu_read_lock() in fib6_get_table().
> >> +net/mlx5e: Reuse per-RQ XDP buffer to avoid stack zeroing overhead
> >> amd-xgbe: read link status twice to avoid inconsistencies
> >> +net: phy: fixed_phy: remove fixed_phy_register_with_gpiod
> >> drivers: net: mvpp2: attempt to refill rx before allocating skb
> >> +selftest: af_unix: Test SO_PASSRIGHTS.
> >> +af_unix: Introduce SO_PASSRIGHTS.
> >> +af_unix: Inherit sk_flags at connect().
> >> +af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
> >> +net: Restrict SO_PASS{CRED,PIDFD,SEC} to AF_{UNIX,NETLINK,BLUETOOTH}.
> >> +tcp: Restrict SO_TXREHASH to TCP socket.
> >> +scm: Move scm_recv() from scm.h to scm.c.
> >> +af_unix: Don't pass struct socket to maybe_add_creds().
> >> +af_unix: Factorise test_bit() for SOCK_PASSCRED and SOCK_PASSPIDFD.
> >>
> >> I haven't dug into it; gotta review / apply other patches :(
> >> Maybe you can try to repro?
> >
> > I think I was able to reproduce it with the SO_PASSRIGHTS series
> > with virtme-ng (but not with plain QEMU and an AL2023 rootfs).
> >
> > After 2 minutes, virtme-ng showed the console.
> >
> > [ 1.461450] virtme-ng-init: triggering udev coldplug
> > [ 1.533147] virtme-ng-init: waiting for udev to settle
> > [ 121.588624] virtme-ng-init: Timed out for waiting the udev queue being empty.
> > [ 121.588710] virtme-ng-init: udev is done
> > [ 121.593214] virtme-ng-init: initialization done
> > [virtme-ng ASCII-art banner]
> > kernel version: 6.15.0-rc4-virtme-00071-gceba111cf5e7 x86_64
> > (CTRL+d to exit)
> >
> >
> > Will investigate the cause.
> >
> > Sorry, but please drop the series and kick the CI again.
>
> FTR, I think some CI iterations survived the boot and hit the following
> in several forwarding tests (e.g. router-multipath-sh):
Oh, thanks!
I learnt that "make TARGETS=net run_tests" doesn't run the forwarding tests.
Will fix in v2.
>
> [ 922.307796][ T6194] =============================
> [ 922.308069][ T6194] WARNING: suspicious RCU usage
> [ 922.308339][ T6194] 6.15.0-rc5-virtme #1 Not tainted
> [ 922.308596][ T6194] -----------------------------
> [ 922.308860][ T6194] ./include/net/addrconf.h:347 suspicious rcu_dereference_check() usage!
> [ 922.309352][ T6194]
> [ 922.309352][ T6194] other info that might help us debug this:
> [ 922.309352][ T6194]
> [ 922.310105][ T6194]
> [ 922.310105][ T6194] rcu_scheduler_active = 2, debug_locks = 1
> [ 922.310501][ T6194] 1 lock held by ip/6194:
> [ 922.310704][ T6194] #0: ffff888012942630 (&tb->tb6_lock){+...}-{3:3}, at: ip6_route_multipath_add+0x743/0x1450
> [ 922.311255][ T6194]
> [ 922.311255][ T6194] stack backtrace:
> [ 922.311577][ T6194] CPU: 1 UID: 0 PID: 6194 Comm: ip Not tainted 6.15.0-rc5-virtme #1 PREEMPT(full)
> [ 922.311583][ T6194] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 922.311585][ T6194] Call Trace:
> [ 922.311589][ T6194] <TASK>
> [ 922.311591][ T6194] dump_stack_lvl+0xb0/0xd0
> [ 922.311605][ T6194] lockdep_rcu_suspicious+0x166/0x270
> [ 922.311619][ T6194] rt6_multipath_rebalance.part.0+0x70c/0x8a0
> [ 922.311628][ T6194] fib6_add_rt2node+0xa36/0x2c00
> [ 922.311668][ T6194] fib6_add+0x38d/0xec0
> [ 922.311699][ T6194] ip6_route_multipath_add+0x75b/0x1450
> [ 922.311753][ T6194] inet6_rtm_newroute+0xb2/0x120
> [ 922.311795][ T6194] rtnetlink_rcv_msg+0x710/0xc00
> [ 922.311819][ T6194] netlink_rcv_skb+0x12f/0x360
> [ 922.311869][ T6194] netlink_unicast+0x449/0x710
> [ 922.311891][ T6194] netlink_sendmsg+0x721/0xbe0
> [ 922.311922][ T6194] ____sys_sendmsg+0x7aa/0xa10
> [ 922.311954][ T6194] ___sys_sendmsg+0xed/0x170
> [ 922.312031][ T6194] __sys_sendmsg+0x108/0x1a0
> [ 922.312061][ T6194] do_syscall_64+0xc1/0x1d0
> [ 922.312069][ T6194] entry_SYSCALL_64_after_hwframe+0x77/0x7f
> [ 922.312074][ T6194] RIP: 0033:0x7f8e77c649a7
> [ 922.312078][ T6194] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
> [ 922.312081][ T6194] RSP: 002b:00007ffd73480708 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
> [ 922.312086][ T6194] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f8e77c649a7
> [ 922.312088][ T6194] RDX: 0000000000000000 RSI: 00007ffd73480770 RDI: 0000000000000005
> [ 922.312090][ T6194] RBP: 00007ffd73480abc R08: 0000000000000038 R09: 0000000000000000
> [ 922.312092][ T6194] R10: 000000000b9c6910 R11: 0000000000000246 R12: 00007ffd73481a80
> [ 922.312094][ T6194] R13: 00000000682562aa R14: 0000000000498600 R15: 00007ffd7348499b
> [ 922.312108][ T6194] </TASK>
>
> see:
>
> https://netdev.bots.linux.dev/contest.html?branch=net-next-2025-05-15--03-00&executor=vmksft-forwarding-dbg&pw-n=0&pass=0
>
> Thanks,
>
> Paolo