Message-ID: <1d9bd2b1-9438-4605-b74a-8bab84bd95f5@redhat.com>
Date: Thu, 15 May 2025 11:02:34 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Kuniyuki Iwashima <kuniyu@...zon.com>, kuba@...nel.org
Cc: davem@...emloft.net, dsahern@...nel.org, edumazet@...gle.com,
 horms@...nel.org, kuni1840@...il.com, netdev@...r.kernel.org
Subject: Re: [PATCH v1 net-next 0/7] ipv6: Follow up for RTNL-free
 RTM_NEWROUTE series.

On 5/15/25 4:05 AM, Kuniyuki Iwashima wrote:
> From: Jakub Kicinski <kuba@...nel.org>
> Date: Wed, 14 May 2025 18:45:02 -0700
>> On Wed, 14 May 2025 13:18:53 -0700 Kuniyuki Iwashima wrote:
>>> Patch 1 removes rcu_read_lock() in fib6_get_table().
>>> Patch 2 removes rtnl_is_held arg for lwtunnel_valid_encap_type(), which
>>>  was short-term fix and is no longer used.
>>> Patch 3 fixes RCU vs GFP_KERNEL report by syzkaller.
>>> Patch 4~7 reverts GFP_ATOMIC uses to GFP_KERNEL.
>>
>> Hi! Something in the following set of patches is making our CI time out.
>> The problem seems to be:
>>
>> [    0.751266] virtme-init: waiting for udev to settle
>> Timed out for waiting the udev queue being empty.
>> [  120.826428] virtme-init: udev is done
>>
>> +team: grab team lock during team_change_rx_flags
>> +net: mana: Add handler for hardware servicing events
>> +ipv6: Revert two per-cpu var allocation for RTM_NEWROUTE.
>> +ipv6: Pass gfp_flags down to ip6_route_info_create_nh().
>> +Revert "ipv6: Factorise ip6_route_multipath_add()."
>> +Revert "ipv6: sr: switch to GFP_ATOMIC flag to allocate memory during seg6local LWT setup"
>> +ipv6: Narrow down RCU critical section in inet6_rtm_newroute().
>> +inet: Remove rtnl_is_held arg of lwtunnel_valid_encap_type(_attr)?().
>> +ipv6: Remove rcu_read_lock() in fib6_get_table().
>> +net/mlx5e: Reuse per-RQ XDP buffer to avoid stack zeroing overhead
>>  amd-xgbe: read link status twice to avoid inconsistencies
>> +net: phy: fixed_phy: remove fixed_phy_register_with_gpiod
>>  drivers: net: mvpp2: attempt to refill rx before allocating skb
>> +selftest: af_unix: Test SO_PASSRIGHTS.
>> +af_unix: Introduce SO_PASSRIGHTS.
>> +af_unix: Inherit sk_flags at connect().
>> +af_unix: Move SOCK_PASS{CRED,PIDFD,SEC} to struct sock.
>> +net: Restrict SO_PASS{CRED,PIDFD,SEC} to AF_{UNIX,NETLINK,BLUETOOTH}.
>> +tcp: Restrict SO_TXREHASH to TCP socket.
>> +scm: Move scm_recv() from scm.h to scm.c.
>> +af_unix: Don't pass struct socket to maybe_add_creds().
>> +af_unix: Factorise test_bit() for SOCK_PASSCRED and SOCK_PASSPIDFD.
>>
>> I haven't dug into it, gotta review / apply other patches :(
>> Maybe you can try to repro? 
> 
> I think I was able to reproduce it with SO_PASSRIGHTS series
> with virtme-ng (but not with normal qemu with AL2023 rootfs).
> 
> After 2min, virtme-ng showed the console.
> 
> [    1.461450] virtme-ng-init: triggering udev coldplug
> [    1.533147] virtme-ng-init: waiting for udev to settle
> [  121.588624] virtme-ng-init: Timed out for waiting the udev queue being empty.
> [  121.588710] virtme-ng-init: udev is done
> [  121.593214] virtme-ng-init: initialization done
>           _      _
>    __   _(_)_ __| |_ _ __ ___   ___       _ __   __ _
>    \ \ / / |  __| __|  _   _ \ / _ \_____|  _ \ / _  |
>     \ V /| | |  | |_| | | | | |  __/_____| | | | (_| |
>      \_/ |_|_|   \__|_| |_| |_|\___|     |_| |_|\__  |
>                                                 |___/
>    kernel version: 6.15.0-rc4-virtme-00071-gceba111cf5e7 x86_64
>    (CTRL+d to exit)
> 
> 
> Will investigate the cause.
> 
> Sorry, but please drop the series and kick the CI again.

FTR, I think some CI iterations survived the boot and hit the following
splat in several forwarding tests (e.g. router-multipath-sh):

[  922.307796][ T6194] =============================
[  922.308069][ T6194] WARNING: suspicious RCU usage
[  922.308339][ T6194] 6.15.0-rc5-virtme #1 Not tainted
[  922.308596][ T6194] -----------------------------
[  922.308860][ T6194] ./include/net/addrconf.h:347 suspicious rcu_dereference_check() usage!
[  922.309352][ T6194]
[  922.309352][ T6194] other info that might help us debug this:
[  922.309352][ T6194]
[  922.310105][ T6194]
[  922.310105][ T6194] rcu_scheduler_active = 2, debug_locks = 1
[  922.310501][ T6194] 1 lock held by ip/6194:
[  922.310704][ T6194]  #0: ffff888012942630 (&tb->tb6_lock){+...}-{3:3}, at: ip6_route_multipath_add+0x743/0x1450
[  922.311255][ T6194]
[  922.311255][ T6194] stack backtrace:
[  922.311577][ T6194] CPU: 1 UID: 0 PID: 6194 Comm: ip Not tainted 6.15.0-rc5-virtme #1 PREEMPT(full)
[  922.311583][ T6194] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  922.311585][ T6194] Call Trace:
[  922.311589][ T6194]  <TASK>
[  922.311591][ T6194]  dump_stack_lvl+0xb0/0xd0
[  922.311605][ T6194]  lockdep_rcu_suspicious+0x166/0x270
[  922.311619][ T6194]  rt6_multipath_rebalance.part.0+0x70c/0x8a0
[  922.311628][ T6194]  fib6_add_rt2node+0xa36/0x2c00
[  922.311668][ T6194]  fib6_add+0x38d/0xec0
[  922.311699][ T6194]  ip6_route_multipath_add+0x75b/0x1450
[  922.311753][ T6194]  inet6_rtm_newroute+0xb2/0x120
[  922.311795][ T6194]  rtnetlink_rcv_msg+0x710/0xc00
[  922.311819][ T6194]  netlink_rcv_skb+0x12f/0x360
[  922.311869][ T6194]  netlink_unicast+0x449/0x710
[  922.311891][ T6194]  netlink_sendmsg+0x721/0xbe0
[  922.311922][ T6194]  ____sys_sendmsg+0x7aa/0xa10
[  922.311954][ T6194]  ___sys_sendmsg+0xed/0x170
[  922.312031][ T6194]  __sys_sendmsg+0x108/0x1a0
[  922.312061][ T6194]  do_syscall_64+0xc1/0x1d0
[  922.312069][ T6194]  entry_SYSCALL_64_after_hwframe+0x77/0x7f
[  922.312074][ T6194] RIP: 0033:0x7f8e77c649a7
[  922.312078][ T6194] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
[  922.312081][ T6194] RSP: 002b:00007ffd73480708 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[  922.312086][ T6194] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f8e77c649a7
[  922.312088][ T6194] RDX: 0000000000000000 RSI: 00007ffd73480770 RDI: 0000000000000005
[  922.312090][ T6194] RBP: 00007ffd73480abc R08: 0000000000000038 R09: 0000000000000000
[  922.312092][ T6194] R10: 000000000b9c6910 R11: 0000000000000246 R12: 00007ffd73481a80
[  922.312094][ T6194] R13: 00000000682562aa R14: 0000000000498600 R15: 00007ffd7348499b
[  922.312108][ T6194]  </TASK>

see:

https://netdev.bots.linux.dev/contest.html?branch=net-next-2025-05-15--03-00&executor=vmksft-forwarding-dbg&pw-n=0&pass=0
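
For anyone comparing splats across CI runs, the frames in a trace like the
one above can be pulled out mechanically. The following is only a rough
sketch and not part of the CI tooling: the function name trace_symbols, the
regular expression, and the SAMPLE excerpt are illustrative assumptions, and
the pattern expects the "[timestamp][ Tpid]  symbol+0xoff/0xsize" layout
seen in this log.

```python
import re

# Assumed frame layout, e.g.:
# "[  922.311605][ T6194]  lockdep_rcu_suspicious+0x166/0x270"
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\[\s*T\d+\]\s+"          # timestamp and task id prefix
    r"([A-Za-z_][\w.]*)"                       # symbol, may carry ".part.0"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"           # offset/size suffix
)


def trace_symbols(log_text):
    """Return the symbol names of call-trace frames, in order of appearance."""
    symbols = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            symbols.append(match.group(1))
    return symbols


# Short excerpt of the trace above, used only to exercise the helper.
SAMPLE = """\
[  922.311589][ T6194]  <TASK>
[  922.311591][ T6194]  dump_stack_lvl+0xb0/0xd0
[  922.311605][ T6194]  lockdep_rcu_suspicious+0x166/0x270
[  922.311619][ T6194]  rt6_multipath_rebalance.part.0+0x70c/0x8a0
[  922.311628][ T6194]  fib6_add_rt2node+0xa36/0x2c00
[  922.312108][ T6194]  </TASK>
"""

if __name__ == "__main__":
    for name in trace_symbols(SAMPLE):
        print(name)
```

Lines whose wrapped halves have not been rejoined will not match the
pattern, so a log copied from a mail client may need unwrapping first.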

Thanks,

Paolo

