linux-kernel - Re: [Syzkaller & bisect] There is "task hung in synchronize

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZDdsz+5/QehZ25hg@xpf.sh.intel.com>
Date:   Thu, 13 Apr 2023 10:45:35 +0800
From:   Pengfei Xu <pengfei.xu@...el.com>
To:     <lihuafei1@...wei.com>
CC:     <rostedt@...dmis.org>, <linux-kernel@...r.kernel.org>,
        <lkp@...el.com>, <frederic@...nel.org>, <quic_neeraju@...cinc.com>,
        <paulmck@...nel.org>, <heng.su@...el.com>
Subject: Re: [Syzkaller & bisect] There is "task hung in synchronize_rcu
 bisect" in v6.3-rc2 kernel

Hi Huafei and kernel experts,

It's a soft remind.
This issue could be reproduced in v6.3-rc6 kernel.
It could be reproduced on Alder lake, Raptor lake and so on x86 platforms.
After reverted the commit "0e792b89e6800c:ftrace: Fix use-after-free for
dynamic ftrace_ops" on top of v6.3-rc6 kernel, this issue was gone.

New syzkaller reproduced code, repro.report, bisect_info.log and detailed logs
are in link: https://github.com/xupengfe/syzkaller_logs/tree/main/230412_031722_synchronize_rcu

Thanks!
BR.

On 2023-03-15 at 20:20:44 +0800, Pengfei Xu wrote:
> Hi Li Huafei and kernel experts,
> 
> Greeting!
> 
> Platform: x86 platforms
> There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel in guest:
> 
> All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230315_023443_synchronize_rcu
> Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/repro.c
> Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/kconfig_origin
> Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/bisect_info.log
> v6.3-rc2 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/eeac8ede17557680855031c6f305ece2378af326_dmesg.log
> "
> [   24.844767] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=330 'systemd'
> [   31.392668] hrtimer: interrupt took 10726570 ns
> [   62.552360] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 2942 jiffies old.
> [   92.760386] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 10494 jiffies old.
> [  182.808389] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 33006 jiffies old.
> [  300.120396] INFO: task repro:540 blocked for more than 147 seconds.
> [  300.120937]       Not tainted 6.3.0-rc2-eeac8ede1755+ #1
> [  300.121351] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  300.121888] task:repro           state:D stack:0     pid:540   ppid:538    flags:0x00004006
> [  300.122552] Call Trace:
> [  300.122835]  <TASK>
> [  300.123267]  __schedule+0x40a/0xc90
> [  300.124351]  ? wait_for_completion+0x7b/0x180
> [  300.124772]  schedule+0x5b/0xe0
> [  300.125156]  schedule_timeout+0x50b/0x670
> [  300.125829]  ? schedule_timeout+0x9/0x670
> [  300.126414]  ? wait_for_completion+0x7b/0x180
> [  300.126826]  wait_for_completion+0xa6/0x180
> [  300.127478]  __wait_rcu_gp+0x136/0x160
> [  300.127922]  ? __pfx_arch_ftrace_ops_list_func+0x10/0x10
> [  300.128639]  synchronize_rcu_tasks_generic.part.24+0x3b/0x60
> [  300.129092]  ? __pfx_call_rcu_tasks+0x10/0x10
> [  300.129499]  ? __pfx_wakeme_after_rcu+0x10/0x10
> [  300.130115]  ? verify_cpu+0x10/0x100
> [  300.130574]  synchronize_rcu_tasks_generic+0x24/0x70
> [  300.131020]  synchronize_rcu_tasks+0x19/0x20
> [  300.131438]  ftrace_shutdown+0x1cc/0x410
> [  300.132206]  unregister_ftrace_function+0x35/0x230
> [  300.132722]  ? __sanitizer_cov_trace_switch+0x57/0xa0
> [  300.133311]  perf_ftrace_event_register+0x95/0xf0
> [  300.133750]  ? __pfx_tp_perf_event_destroy+0x10/0x10
> [  300.134190]  perf_trace_destroy+0x3a/0xa0
> [  300.134590]  ? __pfx_tp_perf_event_destroy+0x10/0x10
> [  300.135028]  tp_perf_event_destroy+0x1e/0x30
> [  300.135473]  _free_event+0x101/0x810
> [  300.136191]  put_event+0x3c/0x50
> [  300.136605]  perf_event_release_kernel+0x2de/0x360
> [  300.137011]  ? perf_event_release_kernel+0x9/0x360
> [  300.137577]  ? __pfx_perf_release+0x10/0x10
> [  300.138031]  perf_release+0x22/0x30
> [  300.138434]  __fput+0x11f/0x450
> [  300.139038]  ____fput+0x1e/0x30
> [  300.139425]  task_work_run+0xb6/0x120
> [  300.140164]  do_exit+0x547/0x1360
> [  300.140632]  ? write_comp_data+0x2f/0x90
> [  300.141223]  do_group_exit+0x5e/0xf0
> [  300.141732]  get_signal+0x15d1/0x1600
> [  300.142596]  arch_do_signal_or_restart+0x33/0x280
> [  300.143463]  exit_to_user_mode_prepare+0x13b/0x210
> [  300.143955]  syscall_exit_to_user_mode+0x2d/0x60
> [  300.144623]  do_syscall_64+0x4a/0x90
> [  300.145032]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [  300.145459] RIP: 0033:0x7f2a7abd859d
> [  300.145823] RSP: 002b:00007ffcc732b338 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
> [  300.146429] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 00007f2a7abd859d
> [  300.146930] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000000200004c0
> [  300.147436] RBP: 00007ffcc732b350 R08: 0000000000000000 R09: 00007ffcc732b350
> [  300.147938] R10: 00000000ffffffff R11: 0000000000000246 R12: 00000000004011e0
> [  300.148645] R13: 00007ffcc732b470 R14: 0000000000000000 R15: 0000000000000000
> [  300.149713]  </TASK>
> [  300.149978] 
> [  300.149978] Showing all locks held in the system:
> [  300.150435] 1 lock held by rcu_tasks_kthre/11:
> [  300.150808]  #0: ffffffff83f63490 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
> [  300.151849] 1 lock held by rcu_tasks_rude_/12:
> [  300.152593]  #0: ffffffff83f63210 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
> [  300.153645] 1 lock held by rcu_tasks_trace/13:
> [  300.154013]  #0: ffffffff83f62f50 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
> [  300.155078] 1 lock held by khungtaskd/29:
> [  300.155425]  #0: ffffffff83f63ea0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x1b/0x1e0
> [  300.156644] 2 locks held by repro/540:
> [  300.156977]  #0: ffffffff840206a8 (event_mutex){+.+.}-{3:3}, at: perf_trace_destroy+0x21/0xa0
> [  300.157971]  #1: ffffffff8401cd68 (ftrace_lock){+.+.}-{3:3}, at: unregister_ftrace_function+0x2b/0x230
> [  300.159005] 
> [  300.159233] =============================================
> [  300.159233] 
> "
> Bisected and found first bad commit:
> "
> 0e792b89e6800cd9cb4757a76a96f7ef3e8b6294
> ftrace: Fix use-after-free for dynamic ftrace_ops
> "
> After reverted the above commit on top of v6.3-rc2 kernel, this issue was gone.
> 
> And this issue could be reproduced in 2100s.
> 
> There was no this issue report in syzbot link:
> https://syzkaller.appspot.com/upstream
> 
> I hope it's helpful.
> 
> Thanks!
> 
> ---
> 
> If you don't need the following environment to reproduce the problem or if you
> already have one, please ignore the following information.
> 
> How to reproduce:
> git clone https://gitlab.com/xupengfe/repro_vm_env.git
> cd repro_vm_env
> tar -xvf repro_vm_env.tar.gz
> cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
>    // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
>    // You could change the bzImage_xxx as you want
> You could use below command to log in, there is no password for root.
> ssh -p 10023 root@...alhost
> 
> After login vm(virtual machine) successfully, you could transfer reproduced
> binary to the vm by below way, and reproduce the problem in vm:
> gcc -pthread -o repro repro.c
> scp -P 10023 repro root@...alhost:/root/
> 
> Get the bzImage for target kernel:
> Please use target kconfig and copy it to kernel_src/.config
> make olddefconfig
> make -jx bzImage           //x should equal or less than cpu num your pc has
> 
> Fill the bzImage file into above start3.sh to load the target kernel in vm.
> 
> 
> Tips:
> If you already have qemu-system-x86_64, please ignore below info.
> If you want to install qemu v7.1.0 version:
> git clone https://github.com/qemu/qemu.git
> cd qemu
> git checkout -f v7.1.0
> mkdir build
> cd build
> yum install -y ninja-build.x86_64
> ../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
> make
> make install
> 
> Thanks!
> BR.