linux-kernel - [Syzkaller & bisect] There is "task hung in synchronize

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZBG4HOCQIlGFFcIn@xpf.sh.intel.com>
Date:   Wed, 15 Mar 2023 20:20:44 +0800
From:   Pengfei Xu <pengfei.xu@...el.com>
To:     <lihuafei1@...wei.com>
CC:     <rostedt@...dmis.org>, <linux-kernel@...r.kernel.org>,
        <lkp@...el.com>, <frederic@...nel.org>, <quic_neeraju@...cinc.com>,
        <paulmck@...nel.org>, <heng.su@...el.com>
Subject: [Syzkaller & bisect] There is "task hung in synchronize_rcu bisect"
 in v6.3-rc2 kernel

Hi Li Huafei and kernel experts,

Greeting!

Platform: x86 platforms
There is "task hung in synchronize_rcu bisect" in v6.3-rc2 kernel in guest:

All detailed info: https://github.com/xupengfe/syzkaller_logs/tree/main/230315_023443_synchronize_rcu
Reproduced code: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/repro.c
Kconfig: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/kconfig_origin
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/bisect_info.log
v6.3-rc2 issue dmesg: https://github.com/xupengfe/syzkaller_logs/blob/main/230315_023443_synchronize_rcu/eeac8ede17557680855031c6f305ece2378af326_dmesg.log
"
[   24.844767] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=330 'systemd'
[   31.392668] hrtimer: interrupt took 10726570 ns
[   62.552360] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 2942 jiffies old.
[   92.760386] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 10494 jiffies old.
[  182.808389] rcu_tasks_wait_gp: rcu_tasks grace period number 9 (since boot) is 33006 jiffies old.
[  300.120396] INFO: task repro:540 blocked for more than 147 seconds.
[  300.120937]       Not tainted 6.3.0-rc2-eeac8ede1755+ #1
[  300.121351] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  300.121888] task:repro           state:D stack:0     pid:540   ppid:538    flags:0x00004006
[  300.122552] Call Trace:
[  300.122835]  <TASK>
[  300.123267]  __schedule+0x40a/0xc90
[  300.124351]  ? wait_for_completion+0x7b/0x180
[  300.124772]  schedule+0x5b/0xe0
[  300.125156]  schedule_timeout+0x50b/0x670
[  300.125829]  ? schedule_timeout+0x9/0x670
[  300.126414]  ? wait_for_completion+0x7b/0x180
[  300.126826]  wait_for_completion+0xa6/0x180
[  300.127478]  __wait_rcu_gp+0x136/0x160
[  300.127922]  ? __pfx_arch_ftrace_ops_list_func+0x10/0x10
[  300.128639]  synchronize_rcu_tasks_generic.part.24+0x3b/0x60
[  300.129092]  ? __pfx_call_rcu_tasks+0x10/0x10
[  300.129499]  ? __pfx_wakeme_after_rcu+0x10/0x10
[  300.130115]  ? verify_cpu+0x10/0x100
[  300.130574]  synchronize_rcu_tasks_generic+0x24/0x70
[  300.131020]  synchronize_rcu_tasks+0x19/0x20
[  300.131438]  ftrace_shutdown+0x1cc/0x410
[  300.132206]  unregister_ftrace_function+0x35/0x230
[  300.132722]  ? __sanitizer_cov_trace_switch+0x57/0xa0
[  300.133311]  perf_ftrace_event_register+0x95/0xf0
[  300.133750]  ? __pfx_tp_perf_event_destroy+0x10/0x10
[  300.134190]  perf_trace_destroy+0x3a/0xa0
[  300.134590]  ? __pfx_tp_perf_event_destroy+0x10/0x10
[  300.135028]  tp_perf_event_destroy+0x1e/0x30
[  300.135473]  _free_event+0x101/0x810
[  300.136191]  put_event+0x3c/0x50
[  300.136605]  perf_event_release_kernel+0x2de/0x360
[  300.137011]  ? perf_event_release_kernel+0x9/0x360
[  300.137577]  ? __pfx_perf_release+0x10/0x10
[  300.138031]  perf_release+0x22/0x30
[  300.138434]  __fput+0x11f/0x450
[  300.139038]  ____fput+0x1e/0x30
[  300.139425]  task_work_run+0xb6/0x120
[  300.140164]  do_exit+0x547/0x1360
[  300.140632]  ? write_comp_data+0x2f/0x90
[  300.141223]  do_group_exit+0x5e/0xf0
[  300.141732]  get_signal+0x15d1/0x1600
[  300.142596]  arch_do_signal_or_restart+0x33/0x280
[  300.143463]  exit_to_user_mode_prepare+0x13b/0x210
[  300.143955]  syscall_exit_to_user_mode+0x2d/0x60
[  300.144623]  do_syscall_64+0x4a/0x90
[  300.145032]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  300.145459] RIP: 0033:0x7f2a7abd859d
[  300.145823] RSP: 002b:00007ffcc732b338 EFLAGS: 00000246 ORIG_RAX: 000000000000012a
[  300.146429] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 00007f2a7abd859d
[  300.146930] RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000000200004c0
[  300.147436] RBP: 00007ffcc732b350 R08: 0000000000000000 R09: 00007ffcc732b350
[  300.147938] R10: 00000000ffffffff R11: 0000000000000246 R12: 00000000004011e0
[  300.148645] R13: 00007ffcc732b470 R14: 0000000000000000 R15: 0000000000000000
[  300.149713]  </TASK>
[  300.149978] 
[  300.149978] Showing all locks held in the system:
[  300.150435] 1 lock held by rcu_tasks_kthre/11:
[  300.150808]  #0: ffffffff83f63490 (rcu_tasks.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[  300.151849] 1 lock held by rcu_tasks_rude_/12:
[  300.152593]  #0: ffffffff83f63210 (rcu_tasks_rude.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[  300.153645] 1 lock held by rcu_tasks_trace/13:
[  300.154013]  #0: ffffffff83f62f50 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{3:3}, at: rcu_tasks_one_gp+0x31/0x420
[  300.155078] 1 lock held by khungtaskd/29:
[  300.155425]  #0: ffffffff83f63ea0 (rcu_read_lock){....}-{1:2}, at: debug_show_all_locks+0x1b/0x1e0
[  300.156644] 2 locks held by repro/540:
[  300.156977]  #0: ffffffff840206a8 (event_mutex){+.+.}-{3:3}, at: perf_trace_destroy+0x21/0xa0
[  300.157971]  #1: ffffffff8401cd68 (ftrace_lock){+.+.}-{3:3}, at: unregister_ftrace_function+0x2b/0x230
[  300.159005] 
[  300.159233] =============================================
[  300.159233] 
"
Bisected and found first bad commit:
"
0e792b89e6800cd9cb4757a76a96f7ef3e8b6294
ftrace: Fix use-after-free for dynamic ftrace_ops
"
After reverted the above commit on top of v6.3-rc2 kernel, this issue was gone.

And this issue could be reproduced in 2100s.

There was no this issue report in syzbot link:
https://syzkaller.appspot.com/upstream

I hope it's helpful.

Thanks!

---

If you don't need the following environment to reproduce the problem or if you
already have one, please ignore the following information.

How to reproduce:
git clone https://gitlab.com/xupengfe/repro_vm_env.git
cd repro_vm_env
tar -xvf repro_vm_env.tar.gz
cd repro_vm_env; ./start3.sh  // it needs qemu-system-x86_64 and I used v7.1.0
   // start3.sh will load bzImage_2241ab53cbb5cdb08a6b2d4688feb13971058f65 v6.2-rc5 kernel
   // You could change the bzImage_xxx as you want
You could use below command to log in, there is no password for root.
ssh -p 10023 root@...alhost

After login vm(virtual machine) successfully, you could transfer reproduced
binary to the vm by below way, and reproduce the problem in vm:
gcc -pthread -o repro repro.c
scp -P 10023 repro root@...alhost:/root/

Get the bzImage for target kernel:
Please use target kconfig and copy it to kernel_src/.config
make olddefconfig
make -jx bzImage           //x should equal or less than cpu num your pc has

Fill the bzImage file into above start3.sh to load the target kernel in vm.


Tips:
If you already have qemu-system-x86_64, please ignore below info.
If you want to install qemu v7.1.0 version:
git clone https://github.com/qemu/qemu.git
cd qemu
git checkout -f v7.1.0
mkdir build
cd build
yum install -y ninja-build.x86_64
../configure --target-list=x86_64-softmmu --enable-kvm --enable-vnc --enable-gtk --enable-sdl
make
make install

Thanks!
BR.