lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fa56b39990dd0b90f971018f5abb7352c60af3b1.camel@linux.ibm.com>
Date: Tue, 20 Aug 2024 12:33:34 +0530
From: Aboorva Devarajan <aboorvad@...ux.ibm.com>
To: Tejun Heo <tj@...nel.org>
Cc: void@...ifault.com, linux-kernel@...r.kernel.org, aboorvad@...ux.ibm.com
Subject: Re: [sched_ext/for-6.11]: Issue with BPF Scheduler during CPU
 Hotplug

On Tue, 2024-08-13 at 09:54 -1000, Tejun Heo wrote:
> Hello,
> 
> On Sat, Aug 10, 2024 at 11:54:24PM +0530, Aboorva Devarajan wrote:
> ...
> ...

> Can you trigger sysrq-t when the system is stuck? Also, can you see whether
> the problem is reproducible on x86 w/ the wq changes applied?

Hi Tejun,

I couldn't trigger sysrq-t so far as the system becomes unresponsive,
will get back if I can successfully trigger this via console. 

And yes, this issue also occurs on x86 when applying the proposed workqueue
patch on top of the recent sched-ext for-6.12 tree. However, it takes
significantly longer to hit the issue, and the stack trace differs
a bit.

Here is the trace on x86:
...
9.374534] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10779.374809] smpboot: Booting Node 1 Processor 27 APIC 0x27
[10779.594551] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10779.594788] smpboot: Booting Node 1 Processor 28 APIC 0x29
[10779.790551] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10779.790804] smpboot: Booting Node 1 Processor 29 APIC 0x2b
[10779.990527] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10779.990762] smpboot: Booting Node 1 Processor 30 APIC 0x2d
[10780.186553] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10780.198737] smpboot: Booting Node 1 Processor 31 APIC 0x2f
[10780.556624] smpboot: CPU 1 is now offline
[10780.773765] smpboot: CPU 2 is now offline
[10780.984782] smpboot: CPU 3 is now offline
[10781.030533] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10781.224421] smpboot: CPU 4 is now offline
[10781.408442] smpboot: CPU 5 is now offline
[10781.627324] smpboot: CPU 6 is now offline
[10781.824777] smpboot: CPU 7 is now offline
[10781.878540] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10782.088456] smpboot: CPU 8 is now offline
[10782.296472] smpboot: CPU 9 is now offline
[10782.504424] smpboot: CPU 10 is now offline
[10782.696423] smpboot: CPU 11 is now offline
[10782.734601] sched_ext: BPF scheduler "simple" disabled (unregistered from the main kernel)
[10782.948499] smpboot: CPU 12 is now offline
[10999.932491] INFO: task kthreadd:2 blocked for more than 120 seconds.
[10999.939871]       Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[10999.946938] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10999.955740] task:kthreadd        state:D stack:0     pid:2     tgid:2     ppid:0      flags:0x00004000
[10999.966210] Sched_ext: simple (disabling)
[10999.966220] Call Trace:
[10999.973536]  <TASK>
[10999.975926]  __schedule+0x47e/0x19d0
[10999.979993]  ? __this_cpu_preempt_check+0x17/0x20
[10999.985351]  schedule+0x4b/0x1a0
[10999.989000]  percpu_rwsem_wait+0x118/0x140
[10999.993639]  ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[10999.999745]  __percpu_down_read+0x7b/0xb0
[11000.004255]  scx_pre_fork+0xd1/0xf0
[11000.008194]  sched_fork+0x59/0x1c0
[11000.012034]  copy_process+0xa69/0x2b10
[11000.016279]  ? sched_clock_noinstr+0xd/0x20
[11000.021006]  kernel_clone+0xa0/0x3d0
[11000.025040]  kernel_thread+0x70/0x90
[11000.029099]  ? __pfx_kthread+0x10/0x10
[11000.033340]  kthreadd+0x2f6/0x3a0
[11000.037097]  ? __pfx_kthreadd+0x10/0x10
[11000.041423]  ret_from_fork+0x40/0x60
[11000.045468]  ? __pfx_kthreadd+0x10/0x10
[11000.049800]  ret_from_fork_asm+0x1a/0x30
[11000.054265]  </TASK>
[11000.056781] INFO: task rcu_tasks_trace:15 blocked for more than 120 seconds.
[11000.064719]       Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11000.071751] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11000.080538] task:rcu_tasks_trace state:D stack:0     pid:15    tgid:15    ppid:2      flags:0x00004000
[11000.090982] Sched_ext: simple (disabling)
[11000.090986] Call Trace:
[11000.098249]  <TASK>
[11000.100620]  __schedule+0x47e/0x19d0
[11000.104685]  ? __this_cpu_preempt_check+0x17/0x20
[11000.109982]  schedule+0x4b/0x1a0
[11000.113619]  percpu_rwsem_wait+0x118/0x140
[11000.118211]  ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11000.124325]  __percpu_down_read+0x7b/0xb0
[11000.128836]  cpus_read_lock+0xd1/0xf0
[11000.132964]  rcu_tasks_trace_pregp_step+0xa2/0x700
[11000.138362]  ? __mutex_lock+0x388/0x8e0
[11000.142710]  rcu_tasks_wait_gp+0x5f/0x430
[11000.147208]  ? __this_cpu_preempt_check+0x17/0x20
[11000.152499]  rcu_tasks_one_gp+0x474/0x4e0
[11000.157028]  ? __pfx_rcu_tasks_kthread+0x10/0x10
[11000.162205]  rcu_tasks_kthread+0xb8/0xe0
[11000.166642]  kthread+0xfa/0x130
[11000.170171]  ? __pfx_kthread+0x10/0x10
[11000.174416]  ret_from_fork+0x40/0x60
[11000.178434]  ? __pfx_kthread+0x10/0x10
[11000.182667]  ret_from_fork_asm+0x1a/0x30
[11000.187096]  </TASK>
[11000.189581] INFO: task cpuhp/13:95 blocked for more than 121 seconds.
[11000.196833]       Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11000.203863] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11000.212646] task:cpuhp/13        state:D stack:0     pid:95    tgid:95    ppid:2      flags:0x00004000
[11000.223106] Sched_ext: simple (disabling)
[11000.223111] Call Trace:
[11000.230370]  <TASK>
[11000.232777]  __schedule+0x47e/0x19d0
[11000.236807]  ? __this_cpu_preempt_check+0x17/0x20
[11000.242121]  schedule+0x4b/0x1a0
[11000.245758]  schedule_timeout+0x116/0x120
[11000.250280]  ? _raw_spin_unlock_irq+0x2c/0x60
[11000.255176]  ? __this_cpu_preempt_check+0x17/0x20
[11000.260482]  ? lockdep_hardirqs_on+0xce/0x150
[11000.265377]  ? trace_hardirqs_on+0x51/0x60
[11000.270014]  __wait_for_common+0x91/0x180
[11000.274519]  ? __pfx_schedule_timeout+0x10/0x10
[11000.279630]  ? __flush_work+0x232/0x610
[11000.283940]  wait_for_completion+0x28/0x30
[11000.288565]  __flush_work+0x2d4/0x610
[11000.292693]  ? __pfx_wq_barrier_func+0x10/0x10
[11000.297728]  ? __wait_for_common+0x3e/0x180
[11000.302424]  work_on_cpu_key+0x9c/0xc0
[11000.306641]  ? __pfx_work_for_cpu_fn+0x10/0x10
[11000.311629]  ? __pfx_tmigr_trigger_active+0x10/0x10
[11000.317136]  tmigr_cpu_offline+0x1dd/0x260
[11000.321742]  ? __pfx_tmigr_cpu_offline+0x10/0x10
[11000.326935]  cpuhp_invoke_callback+0x159/0x2f0
[11000.331936]  cpuhp_thread_fun+0x1e8/0x280
[11000.336452]  smpboot_thread_fn+0x196/0x240
[11000.341070]  ? __pfx_smpboot_thread_fn+0x10/0x10
[11000.346267]  kthread+0xfa/0x130
[11000.349805]  ? __pfx_kthread+0x10/0x10
[11000.354028]  ret_from_fork+0x40/0x60
[11000.358070]  ? __pfx_kthread+0x10/0x10
[11000.362288]  ret_from_fork_asm+0x1a/0x30
[11000.366740]  </TASK>
[11000.369243] INFO: task systemd-journal:790 blocked for more than 121 seconds.
[11000.377250]       Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11000.384277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11000.393062] task:systemd-journal state:D stack:0     pid:790   tgid:790   ppid:1      flags:0x00000002
[11000.403510] Sched_ext: simple (disabling)
[11000.403514] Call Trace:
[11000.410789]  <TASK>
[11000.413177]  __schedule+0x47e/0x19d0
[11000.417215]  ? __this_cpu_preempt_check+0x17/0x20
[11000.422510]  schedule+0x4b/0x1a0
[11000.426168]  percpu_rwsem_wait+0x118/0x140
[11000.430773]  ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11000.436846]  __percpu_down_read+0x7b/0xb0
[11000.441361]  scx_pre_fork+0xd1/0xf0
[11000.445284]  sched_fork+0x59/0x1c0
[11000.449117]  copy_process+0xa69/0x2b10
[11000.453346]  ? timerfd_read_iter+0x136/0x290
[11000.458160]  ? sched_clock_noinstr+0xd/0x20
[11000.462862]  ? _copy_to_iter+0xb9/0x630
[11000.467188]  kernel_clone+0xa0/0x3d0
[11000.471208]  ? copy_clone_args_from_user+0xa3/0x290
[11000.476733]  __do_sys_clone3+0xc8/0x110
[11000.481068]  ? __secure_computing+0x89/0xe0
[11000.485785]  __x64_sys_clone3+0x1a/0x20
[11000.490096]  x64_sys_call+0x1800/0x20d0
[11000.494438]  do_syscall_64+0x8b/0x140
[11000.498561]  ? do_syscall_64+0x97/0x140
[11000.502887]  ? __this_cpu_preempt_check+0x17/0x20
[11000.508186]  ? lockdep_hardirqs_on+0xce/0x150
[11000.513096]  ? syscall_exit_to_user_mode+0xd5/0x210
[11000.518568]  ? do_syscall_64+0x97/0x140
[11000.522881]  ? __lock_acquire+0xac8/0x28c0
[11000.527511]  ? sigprocmask+0xad/0xe0
[11000.531526]  ? sched_clock_noinstr+0xd/0x20
[11000.536239]  ? _raw_spin_unlock_irq+0x2c/0x60
[11000.541158]  ? __this_cpu_preempt_check+0x17/0x20
[11000.546431]  ? lockdep_hardirqs_on+0xce/0x150
[11000.551346]  ? do_syscall_64+0x97/0x140
[11000.555656]  ? __this_cpu_preempt_check+0x17/0x20
[11000.560955]  ? lockdep_hardirqs_on+0xce/0x150
[11000.565850]  ? syscall_exit_to_user_mode+0xd5/0x210
[11000.571340]  ? do_syscall_64+0x97/0x140
[11000.575655]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[11000.581333] RIP: 0033:0x7ffa9a32683d
[11000.585368] RSP: 002b:00007ffc04eb0418 EFLAGS: 00000202 ORIG_RAX: 00000000000001b3
[11000.593859] RAX: ffffffffffffffda RBX: 00007ffc04eb0460 RCX: 00007ffa9a32683d
[11000.601849] RDX: 00007ffa9a2947d0 RSI: 0000000000000058 RDI: 00007ffc04eb0460
[11000.609863] RBP: fffffffffffffdc8 R08: 00007ffa97800640 R09: 00007ffc04eb057f
[11000.617868] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffa97800640
[11000.625863] R13: 0000000000000002 R14: 00007ffa9a2947d0 R15: 00007ffc04eb05c0
[11000.633887]  </TASK>
[11000.636366] INFO: task udisksd:1584 blocked for more than 121 seconds.
[11000.643691]       Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11000.650730] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11000.659506] task:udisksd         state:D stack:0     pid:1584  tgid:1584  ppid:1      flags:0x00000002
[11000.669935] Sched_ext: simple (disabling)
[11000.669939] Call Trace:
[11000.677214]  <TASK>
[11000.679584]  __schedule+0x47e/0x19d0
[11000.683620]  ? __this_cpu_preempt_check+0x17/0x20
[11000.688932]  schedule+0x4b/0x1a0
[11000.692578]  percpu_rwsem_wait+0x118/0x140
[11000.697207]  ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11000.703295]  __percpu_down_read+0x7b/0xb0
[11000.707808]  scx_pre_fork+0xd1/0xf0
[11000.711747]  sched_fork+0x59/0x1c0
[11000.715598]  copy_process+0xa69/0x2b10
[11000.719847]  ? __lock_acquire+0xac8/0x28c0
[11000.724498]  kernel_clone+0xa0/0x3d0
[11000.728515]  ? copy_clone_args_from_user+0xa3/0x290
[11000.734029]  __do_sys_clone3+0xc8/0x110
[11000.738378]  ? do_syscall_64+0x97/0x140
[11000.742712]  ? do_syscall_64+0x4c/0x140
[11000.747036]  ? __this_cpu_preempt_check+0x17/0x20
[11000.752340]  __x64_sys_clone3+0x1a/0x20
[11000.756665]  x64_sys_call+0x1800/0x20d0
[11000.760999]  do_syscall_64+0x8b/0x140
[11000.765146]  ? do_syscall_64+0x97/0x140
[11000.769462]  ? __this_cpu_preempt_check+0x17/0x20
[11000.774754]  ? lockdep_hardirqs_on+0xce/0x150
[11000.779668]  ? syscall_exit_to_user_mode+0xd5/0x210
[11000.785151]  ? do_syscall_64+0x97/0x140
[11000.789459]  ? do_syscall_64+0x97/0x140
[11000.793766]  ? syscall_exit_to_user_mode+0xd5/0x210
[11000.799248]  ? do_syscall_64+0x97/0x140
[11000.803560]  ? do_syscall_64+0x97/0x140
[11000.807878]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[11000.813549] RIP: 0033:0x7f0fe352683d
[11000.817566] RSP: 002b:00007ffed86d7578 EFLAGS: 00000202 ORIG_RAX: 00000000000001b3
[11000.826059] RAX: ffffffffffffffda RBX: 00007ffed86d75c0 RCX: 00007f0fe352683d
[11000.834058] RDX: 00007f0fe34947d0 RSI: 0000000000000058 RDI: 00007ffed86d75c0
[11000.842053] RBP: fffffffffffffea0 R08: 00007f0fe2000640 R09: 00007ffed86d76df
[11000.850054] R10: 0000000000000008 R11: 0000000000000202 R12: 00007f0fe2000640
[11000.858050] R13: 0000000000000002 R14: 00007f0fe34947d0 R15: 00007ffed86d7720
[11000.866079]  </TASK>
[11000.868534] INFO: task cron:1820 blocked for more than 121 seconds.
[11000.875563]       Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11000.882609] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11000.891391] task:cron            state:D stack:0     pid:1820  tgid:1820  ppid:1      flags:0x00000002
[11000.901817] Sched_ext: simple (disabling)
[11000.901820] Call Trace:
[11000.909094]  <TASK>
[11000.911466]  __schedule+0x47e/0x19d0
[11000.915490]  ? __this_cpu_preempt_check+0x17/0x20
[11000.920810]  schedule+0x4b/0x1a0
[11000.924449]  percpu_rwsem_wait+0x118/0x140
[11000.929054]  ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11000.935120]  __percpu_down_read+0x7b/0xb0
[11000.939639]  scx_pre_fork+0xd1/0xf0
[11000.943564]  sched_fork+0x59/0x1c0
[11000.947396]  copy_process+0xa69/0x2b10
[11000.951607]  ? mntput_no_expire+0x9a/0x4c0
[11000.956244]  kernel_clone+0xa0/0x3d0
[11000.960271]  __do_sys_clone+0x66/0x90
[11000.964399]  __x64_sys_clone+0x29/0x30
[11000.968630]  x64_sys_call+0x1d80/0x20d0
[11000.972946]  do_syscall_64+0x8b/0x140
[11000.977062]  ? lockdep_hardirqs_on+0xce/0x150
[11000.981960]  ? syscall_exit_to_user_mode+0xd5/0x210
[11000.987440]  ? do_syscall_64+0x97/0x140
[11000.991773]  ? __do_sys_newfstatat+0x3e/0x60
[11000.996583]  ? do_syscall_64+0x97/0x140
[11001.000897]  ? __this_cpu_preempt_check+0x17/0x20
[11001.006189]  ? lockdep_hardirqs_on+0xce/0x150
[11001.011088]  ? syscall_exit_to_user_mode+0xd5/0x210
[11001.016577]  ? do_syscall_64+0x97/0x140
[11001.020892]  ? syscall_exit_to_user_mode+0xd5/0x210
[11001.026372]  ? do_syscall_64+0x97/0x140
[11001.030689]  ? syscall_exit_to_user_mode+0xd5/0x210
[11001.036178]  ? do_syscall_64+0x97/0x140
[11001.040496]  ? do_syscall_64+0x97/0x140
[11001.044811]  ? syscall_exit_to_user_mode+0xd5/0x210
[11001.050290]  ? do_syscall_64+0x97/0x140
[11001.054605]  ? do_syscall_64+0x97/0x140
[11001.058929]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[11001.064608] RIP: 0033:0x7fc230eeab57
[11001.068633] RSP: 002b:00007fff0ef6db58 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[11001.077128] RAX: ffffffffffffffda RBX: 00007fc231195040 RCX: 00007fc230eeab57
[11001.085123] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[11001.093120] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[11001.101126] R10: 00007fc231031b10 R11: 0000000000000246 R12: 0000000000000001
[11001.109121] R13: 00005564ce386af0 R14: 000055649c0af0a1 R15: 00005564ce36dd00
[11001.117134]  </TASK>
[11001.119642] INFO: task scx_simple:3632 blocked for more than 122 seconds.
[11001.127291]       Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11001.134342] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11001.143146] task:scx_simple      state:D stack:0     pid:3632  tgid:3632  ppid:3619   flags:0x00004002
[11001.153597] Sched_ext: simple (disabling)
[11001.153609] Call Trace:
[11001.160883]  <TASK>
[11001.163242]  __schedule+0x47e/0x19d0
[11001.167297]  ? __this_cpu_preempt_check+0x17/0x20
[11001.172592]  schedule+0x4b/0x1a0
[11001.176200]  percpu_rwsem_wait+0x118/0x140
[11001.180863]  ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11001.186919]  __percpu_down_read+0x7b/0xb0
[11001.191468]  cpus_read_lock+0xd1/0xf0
[11001.195586]  scx_ops_enable.constprop.0+0x341/0xae0
[11001.201105]  ? __this_cpu_preempt_check+0x17/0x20
[11001.206397]  ? debug_smp_processor_id+0x1b/0x30
[11001.211519]  ? rcu_is_watching+0x17/0x70
[11001.215954]  ? trace_contention_end+0x6c/0xa0
[11001.220871]  ? __mutex_lock+0x388/0x8e0
[11001.225190]  ? __anon_inode_getfile+0x8b/0x180
[11001.230217]  ? bpf_struct_ops_link_create+0x105/0x180
[11001.235896]  ? lockdep_init_map_type+0x5a/0x290
[11001.241016]  bpf_scx_reg+0x12/0x20
[11001.244859]  bpf_struct_ops_link_create+0x124/0x180
[11001.250375]  __sys_bpf+0x2088/0x2b90
[11001.254411]  ? mutex_unlock+0x16/0x20
[11001.258604]  __x64_sys_bpf+0x1e/0x30
[11001.262614]  x64_sys_call+0x1b2e/0x20d0
[11001.266966]  do_syscall_64+0x8b/0x140
[11001.271089]  ? map_update_elem+0x252/0x310
[11001.275737]  ? __sys_bpf+0x12a2/0x2b90
[11001.279949]  ? do_syscall_64+0x97/0x140
[11001.284328]  ? __rseq_handle_notify_resume+0x440/0x630
[11001.290138]  ? do_syscall_64+0x97/0x140
[11001.294490]  ? __this_cpu_preempt_check+0x17/0x20
[11001.299766]  ? lockdep_hardirqs_on+0xce/0x150
[11001.304688]  ? syscall_exit_to_user_mode+0xd5/0x210
[11001.310173]  ? do_syscall_64+0x97/0x140
[11001.314500]  ? sysvec_apic_timer_interrupt+0x76/0xf0
[11001.320109]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[11001.325807] RIP: 0033:0x7f1e1911e88d
[11001.329816] RSP: 002b:00007ffdb1b0ed08 EFLAGS: 00000202 ORIG_RAX: 0000000000000141
[11001.338343] RAX: ffffffffffffffda RBX: 00000000ffffffff RCX: 00007f1e1911e88d
[11001.346368] RDX: 0000000000000040 RSI: 00007ffdb1b0edf0 RDI: 000000000000001c
[11001.354379] RBP: 00007ffdb1b0ed20 R08: 00007ffdb1b0edf0 R09: 00007ffdb1b0edf0
[11001.362395] R10: 000055722ade9500 R11: 0000000000000202 R12: 00005571f595a2cd
[11001.370419] R13: 00005571f595a200 R14: 00007ffdb1b0ef28 R15: 000055722ae06f50
[11001.378447]  </TASK>
[11001.380974] INFO: task sched_ext_ops_h:3633 blocked for more than 122 seconds.
[11001.389131]       Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11001.396167] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11001.404971] task:sched_ext_ops_h state:D stack:0     pid:3633  tgid:3633  ppid:2      flags:0x00004000
[11001.415378] Sched_ext: simple (disabling)
[11001.415381] Call Trace:
[11001.422636]  <TASK>
[11001.425007]  __schedule+0x47e/0x19d0
[11001.429051]  ? __this_cpu_preempt_check+0x17/0x20
[11001.434342]  schedule+0x4b/0x1a0
[11001.437986]  schedule_preempt_disabled+0x1c/0x30
[11001.443162]  __mutex_lock+0x50a/0x8e0
[11001.447320]  ? scx_ops_disable_workfn+0xe0/0x580
[11001.452525]  mutex_lock_nested+0x1f/0x30
[11001.456916]  ? mutex_lock_nested+0x1f/0x30
[11001.461529]  scx_ops_disable_workfn+0xe0/0x580
[11001.466538]  ? __this_cpu_preempt_check+0x17/0x20
[11001.471834]  ? kthread_worker_fn+0x76/0x2e0
[11001.476506]  ? sched_clock_noinstr+0xd/0x20
[11001.481231]  ? _raw_spin_unlock_irq+0x2c/0x60
[11001.486138]  ? __this_cpu_preempt_check+0x17/0x20
[11001.491422]  kthread_worker_fn+0xbb/0x2e0
[11001.495911]  ? __pfx_scx_ops_disable_workfn+0x10/0x10
[11001.501593]  ? __pfx_kthread_worker_fn+0x10/0x10
[11001.506770]  kthread+0xfa/0x130
[11001.510321]  ? __pfx_kthread+0x10/0x10
[11001.514545]  ret_from_fork+0x40/0x60
[11001.518573]  ? __pfx_kthread+0x10/0x10
[11001.522780]  ret_from_fork_asm+0x1a/0x30
[11001.527229]  </TASK>
[11001.529701] INFO: task bash:3660 blocked for more than 122 seconds.
[11001.536731]       Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11001.543746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11001.552539] task:bash            state:D stack:0     pid:3660  tgid:3660  ppid:3640   flags:0x00004002
[11001.562963] Sched_ext: simple (disabling)
[11001.562966] Call Trace:
[11001.570197]  <TASK>
[11001.572571]  __schedule+0x47e/0x19d0
[11001.576603]  ? __this_cpu_preempt_check+0x17/0x20
[11001.581895]  schedule+0x4b/0x1a0
[11001.585535]  schedule_timeout+0x116/0x120
[11001.590054]  ? _raw_spin_unlock_irq+0x2c/0x60
[11001.594938]  ? __this_cpu_preempt_check+0x17/0x20
[11001.600242]  ? lockdep_hardirqs_on+0xce/0x150
[11001.605129]  ? trace_hardirqs_on+0x51/0x60
[11001.609735]  __wait_for_common+0x91/0x180
[11001.614261]  ? __pfx_schedule_timeout+0x10/0x10
[11001.619343]  ? __flush_work+0x232/0x610
[11001.623656]  wait_for_completion+0x28/0x30
[11001.628285]  __flush_work+0x2d4/0x610
[11001.632410]  ? __pfx_wq_barrier_func+0x10/0x10
[11001.637427]  ? __wait_for_common+0x3e/0x180
[11001.642134]  work_on_cpu_key+0x9c/0xc0
[11001.646361]  ? __pfx_work_for_cpu_fn+0x10/0x10
[11001.651337]  ? __pfx___cpu_down_maps_locked+0x10/0x10
[11001.657013]  cpu_down_maps_locked+0xbf/0xd0
[11001.661737]  cpu_device_down+0x2e/0x50
[11001.665966]  cpu_subsys_offline+0x12/0x20
[11001.670472]  device_offline+0xf0/0x120
[11001.674732]  online_store+0x64/0xe0
[11001.678684]  dev_attr_store+0x1b/0x30
[11001.682824]  sysfs_kf_write+0x4f/0x70
[11001.686975]  kernfs_fop_write_iter+0x172/0x230
[11001.691988]  vfs_write+0x378/0x540
[11001.695841]  ksys_write+0x71/0xf0
[11001.699585]  __x64_sys_write+0x1d/0x30
[11001.703786]  x64_sys_call+0x1685/0x20d0
[11001.708122]  do_syscall_64+0x8b/0x140
[11001.712246]  ? do_syscall_64+0x97/0x140
[11001.716539]  ? __this_cpu_preempt_check+0x17/0x20
[11001.721819]  ? lockdep_hardirqs_on+0xce/0x150
[11001.726734]  ? syscall_exit_to_user_mode+0xd5/0x210
[11001.732205]  ? do_syscall_64+0x97/0x140
[11001.736517]  ? lockdep_hardirqs_on+0xce/0x150
[11001.741432]  ? syscall_exit_to_user_mode+0xd5/0x210
[11001.746899]  ? do_syscall_64+0x97/0x140
[11001.751209]  ? lockdep_hardirqs_on+0xce/0x150
[11001.756105]  ? syscall_exit_to_user_mode+0xd5/0x210
[11001.761608]  ? do_syscall_64+0x97/0x140
[11001.765901]  ? syscall_exit_to_user_mode+0xd5/0x210
[11001.771371]  ? do_syscall_64+0x97/0x140
[11001.775689]  ? do_syscall_64+0x97/0x140
[11001.780019]  ? irqentry_exit+0x6f/0xa0
[11001.784217]  ? sysvec_call_function+0x76/0xf0
[11001.789114]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[11001.794793] RIP: 0033:0x7f48b4914887
[11001.798807] RSP: 002b:00007ffc30aa2b48 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[11001.807309] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f48b4914887
[11001.815287] RDX: 0000000000000002 RSI: 00005593f11663b0 RDI: 0000000000000001
[11001.823275] RBP: 00005593f11663b0 R08: 00007f48b49d1460 R09: 000000007fffffff
[11001.831270] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
[11001.839262] R13: 00007f48b4a1b780 R14: 00007f48b4a17600 R15: 00007f48b4a16a00
[11001.847287]  </TASK>
[11001.849780] INFO: task kworker/0:1:8931 blocked for more than 122 seconds.
[11001.857471]       Not tainted 6.11.0-rc1-sched-ext-g2f88de3b86bf #2
[11001.864516] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11001.873290] task:kworker/0:1     state:D stack:0     pid:8931  tgid:8931  ppid:2      flags:0x00004000
[11001.883720] Workqueue: events vmstat_shepherd
[11001.888608] Sched_ext: simple (disabling)
[11001.888630] Call Trace:
[11001.895884]  <TASK>
[11001.898248]  __schedule+0x47e/0x19d0
[11001.902293]  ? __this_cpu_preempt_check+0x17/0x20
[11001.907598]  schedule+0x4b/0x1a0
[11001.911223]  percpu_rwsem_wait+0x118/0x140
[11001.915836]  ? __pfx_percpu_rwsem_wake_function+0x10/0x10
[11001.921910]  __percpu_down_read+0x7b/0xb0
[11001.926409]  cpus_read_lock+0xd1/0xf0
[11001.930536]  vmstat_shepherd+0x20/0x120
[11001.934863]  process_one_work+0x21e/0x680
[11001.939358]  ? __this_cpu_preempt_check+0x17/0x20
[11001.944672]  worker_thread+0x194/0x340
[11001.948888]  ? __pfx_worker_thread+0x10/0x10
[11001.953689]  kthread+0xfa/0x130
[11001.957226]  ? __pfx_kthread+0x10/0x10
[11001.961446]  ret_from_fork+0x40/0x60
[11001.965465]  ? __pfx_kthread+0x10/0x10
[11001.969679]  ret_from_fork_asm+0x1a/0x30
[11001.974106]  </TASK>
[11001.976563] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
[11001.985786] 
[11001.985786] Showing all locks held in the system:
[11001.992741] 1 lock held by kthreadd/2:
[11001.996931]  #0: ffffffff94ec3c00 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x59/0x1c0
[11002.006155] 2 locks held by rcu_tasks_trace/15:
[11002.011223]  #0: ffffffff94fb1780 (rcu_tasks_trace.tasks_gp_mutex){+.+.}-{4:4}, at: rcu_tasks_one_gp+0x341/0x4e0
[11002.022633]  #1: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: rcu_tasks_trace_pregp_step+0xa2/0x700
[11002.033587] 2 locks held by cpuhp/13/95:
[11002.037976]  #0: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x280
[11002.047933]  #1: ffffffff94ea6400 (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x280
[11002.057916] 1 lock held by khungtaskd/247:
[11002.062498]  #0: ffffffff94fb24e0 (rcu_read_lock){....}-{1:3}, at: debug_show_all_locks+0x3b/0x200
[11002.072578] 1 lock held by systemd-journal/790:
[11002.077646]  #0: ffffffff94ec3c00 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x59/0x1c0
[11002.086846] 1 lock held by in:imklog/1591:
[11002.091427]  #0: ffff9aa01164eed8 (&f->f_pos_lock){+.+.}-{4:4}, at: __fdget_pos+0x7c/0xc0
[11002.100635] 1 lock held by udisksd/1584:
[11002.105024]  #0: ffffffff94ec3c00 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x59/0x1c0
[11002.114238] 1 lock held by cron/1820:
[11002.118344]  #0: ffffffff94ec3c00 (scx_fork_rwsem){++++}-{0:0}, at: sched_fork+0x59/0x1c0
[11002.127562] 2 locks held by sh/1940:
[11002.131580]  #0: ffff9aab88e248a8 (&tty->ldisc_sem){++++}-{0:0}, at: ldsem_down_read+0x29/0x40
[11002.141249]  #1: ffffb4458dd7b2f8 (&ldata->atomic_read_lock){+.+.}-{4:4}, at: n_tty_read+0x563/0x690
[11002.151519] 4 locks held by scx_simple/3632:
[11002.156310]  #0: ffffffff950ba1d0 (update_mutex){+.+.}-{4:4}, at: bpf_struct_ops_link_create+0x105/0x180
[11002.166926]  #1: ffffffff94ec3cb0 (scx_ops_enable_mutex){+.+.}-{4:4}, at: scx_ops_enable.constprop.0+0x60/0xae0
[11002.178233]  #2: ffffffff94ec3c00 (scx_fork_rwsem){++++}-{0:0}, at: scx_ops_enable.constprop.0+0x33c/0xae0
[11002.189062]  #3: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: scx_ops_enable.constprop.0+0x341/0xae0
[11002.200077] 1 lock held by sched_ext_ops_h/3633:
[11002.205267]  #0: ffffffff94ec3cb0 (scx_ops_enable_mutex){+.+.}-{4:4}, at: scx_ops_disable_workfn+0xe0/0x580
[11002.216182] 6 locks held by bash/3660:
[11002.220387]  #0: ffff9aab83cfd450 (sb_writers#6){.+.+}-{0:0}, at: ksys_write+0x71/0xf0
[11002.229260]  #1: ffff9aa011637090 (&of->mutex){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x122/0x230
[11002.239216]  #2: ffff9aa0051ef4e0 (kn->active#150){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x12a/0x230
[11002.249553]  #3: ffffffff951e9070 (device_hotplug_lock){+.+.}-{4:4}, at: online_store+0x47/0xe0
[11002.259315]  #4: ffff9ab74d404d98 (&dev->mutex){....}-{4:4}, at: device_offline+0x97/0x120
[11002.268593]  #5: ffffffff94ea3dd0 (cpu_add_remove_lock){+.+.}-{4:4}, at: cpu_device_down+0x24/0x50
[11002.278664] 3 locks held by kworker/0:1/8931:
[11002.283545]  #0: ffff9aa000051358 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[11002.294083]  #1: ffffb4458b617e40 ((shepherd).work){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[11002.304037]  #2: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: vmstat_shepherd+0x20/0x120
[11002.313888] 3 locks held by kworker/0:11/8939:
[11002.318874]  #0: ffff9aa000051358 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[11002.329403]  #1: ffffb4458b847e40 ((work_completion)(&wfc.work)#3){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[11002.340812]  #2: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: _cpu_down+0x7d/0x460
[11002.350102] 4 locks held by kworker/13:2/9104:
[11002.355081]  #0: ffff9aa000051358 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x3f5/0x680
[11002.365619]  #1: ffffb4458923fe40 (rebuild_sd_work){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[11002.375578]  #2: ffffffff94ec7130 (sched_energy_mutex){+.+.}-{4:4}, at: rebuild_sd_workfn+0x1b/0x40
[11002.385732]  #3: ffffffff94ea3d20 (cpu_hotplug_lock){++++}-{0:0}, at: rebuild_sched_domains+0x12/0x40
[11002.396090] 
[11002.397783] =============================================


--------------------------------------------------------------------------------------------------------------------------------

Meanwhile, I spent some time investigating the circular dependency issue on
PowerPC.

Here are some details:

During continuous CPU online and offline operations, the scx_simple scheduler
unregisters and registers itself. This process can lead to a potential deadlock.

Below is an example of a common circular dependency that is frequently observed
on PowerPC, though other scenarios may also exist:

Note: The entire stack trace is not included to keep the flow simple.
------------------------------------------------------------------------------------------------------------------------
Time     | CPU 1 (scx_simple)                 | CPU 7 (hotplug.sh)                   | CPU 3 (kthreadd)               |
------------------------------------------------------------------------------------------------------------------------
T1       | bpf_scx_reg                        | cpu_device_up                        | -                              |
------------------------------------------------------------------------------------------------------------------------
T2       | scx_ops_enable                     | _cpu_up                              | -                              |
------------------------------------------------------------------------------------------------------------------------
T3       | scx_fork_rwsem taken (#1)          | cpus_write_lock taken (#2)           | -                              |
         | (Holds scx_fork_rwsem)             | (Holds cpus_write_lock)              |                                |
------------------------------------------------------------------------------------------------------------------------
T4       | Waiting on CPU 7                   | cpuhp_invoke_callback                | -                              |
         | (To take cpus_read_lock)           |                                      |                                |
------------------------------------------------------------------------------------------------------------------------
T5       | Waiting on CPU 7                   | smpboot_create_threads               | -                              |
         | (To take cpus_read_lock)           |                                      |                                |
------------------------------------------------------------------------------------------------------------------------
T6       | Waiting on CPU 7                   | kthread_create_on_cpu                | -                              |
         | (To take cpus_read_lock)           |                                      |                                |
------------------------------------------------------------------------------------------------------------------------
T7       | Waiting on CPU 7                   | __kthread_create_on_node             | -                              |
         | (To take cpus_read_lock)           |                                      |                                |
------------------------------------------------------------------------------------------------------------------------
T8       | Waiting on CPU 7                   | spawn kthread                        | -                              |
         | (To take cpus_read_lock)           |                                      |                                |
------------------------------------------------------------------------------------------------------------------------
T9       | Waiting on CPU 7                   | Waits for kthread completion ------> | kthreadd (#3)                  |
         | (To take cpus_read_lock)           |                                      |                                |
------------------------------------------------------------------------------------------------------------------------
T10      | Waiting on CPU 7                   | Waits on CPU 3                       | create_kthread                 |
         | (To take cpus_read_lock)           | (For kthread completion)             |                                |
------------------------------------------------------------------------------------------------------------------------
T11      | Waiting on CPU 7                   | Waits on CPU 3                       | sched_fork                     |
         | (To take cpus_read_lock)           | (For kthread completion)             |                                |
------------------------------------------------------------------------------------------------------------------------
T12      | Waiting on CPU 7                   | Waits on CPU 3                       | scx_pre_fork                   |
         | (To take cpus_read_lock)           | (For kthread completion)             |                                |
------------------------------------------------------------------------------------------------------------------------
T13      | Waiting on CPU 7                   | Waits on CPU 3                       | Trying to hold scx_fork_rwsem  |
         | (To take cpus_read_lock)           | (For kthread completion)             | but it is held by #1           |
------------------------------------------------------------------------------------------------------------------------
T14      | Waiting on CPU 7                   | Waits on CPU 3                       | Waits on CPU 1                 |
         | (To take cpus_read_lock)           | (For kthread completion)             | (To release scx_fork_rwsem)    |
         | (Dependent on #2)                  | (Dependent on #3)                    | (Dependent on #1)              |
------------------------------------------------------------------------------------------------------------------------

Summary of events:

T1-T3: CPU 1 takes scx_fork_rwsem (#1), and CPU 7 takes cpus_write_lock (#2).

T4-T13: CPU 1 is waiting for CPU 7 to take cpus_read_lock, while CPU 7 is
performing a sequence of hotplug callbacks involving kthread creation and
waiting for the kthread completion on CPU 3.

T14: Circular dependency forms:
CPU 1 is waiting on CPU 7 (dependent on lock #2 to be released).
CPU 7 is waiting on CPU 3 (dependent on completion of #3).
CPU 3 is waiting on CPU 1 (dependent on lock #1 to be released).


This results in a deadlock where each CPU is waiting on another to release
a lock or complete an operation, creating a clear circular dependency
where none of the CPUs can progress.


Could I be missing something here, please let me know your thoughts?

Thanks,
Aboorva


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ