Message-ID: <Y2T72N6nE6UHDPx3@linutronix.de>
Date: Fri, 4 Nov 2022 12:47:36 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: linux-kernel@...r.kernel.org
Cc: Ben Segall <bsegall@...gle.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Steven Rostedt <rostedt@...dmis.org>,
Thomas Gleixner <tglx@...utronix.de>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Latency spikes from within the scheduler.
Hi,
I've been playing with v6.1-rc3-rt2 on a 256 CPU machine. Some of my
findings:
- it can take some time to grab the RQ lock. That is, to get on the CPU
  _or_ to leave it. For instance, a 1ms example:
| cyclictest-4619 [011] d...2.. 1128.288674: idle_cpu <-find_busiest_group
| cyclictest-4619 [011] d...2.. 1128.288674: idle_cpu <-find_busiest_group
| cyclictest-4619 [011] d...2.. 1128.288675: raw_spin_rq_lock_nested <-load_balance
| cyclictest-4619 [011] d...3.. 1128.288675: _raw_spin_lock <-raw_spin_rq_lock_nested
| cyclictest-4619 [011] d...4.. 1128.288676: queued_spin_lock_slowpath <-raw_spin_rq_lock_nested
| cyclictest-4619 [011] d...3.. 1128.289688: update_rq_clock <-load_balance
| cyclictest-4619 [011] d...3.. 1128.289688: can_migrate_task <-load_balance
| cyclictest-4619 [011] d...3.. 1128.289688: kthread_is_per_cpu <-can_migrate_task
https://breakpoint.cc/traces-zen3/trace-load_balance_rq_lock-11.txt.zst [77KiB]
https://breakpoint.cc/traces-zen3/trace-load_balance_rq_lock.txt.zst [34MiB]
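For reference, the ~1ms wait is simply the gap between the queued_spin_lock_slowpath entry and the next event on that CPU. A minimal sketch of how one might scan a function trace for such gaps (the line format and the 100us threshold are my assumptions, not part of the trace tooling):

```python
import re

# Matches ftrace function-tracer lines such as:
#   cyclictest-4619 [011] d...4.. 1128.288676: queued_spin_lock_slowpath <-raw_spin_rq_lock_nested
LINE = re.compile(r"(\d+\.\d+):\s+(\S+)")

def find_gaps(lines, threshold_us=100):
    """Return (prev_func, next_func, gap_us) for consecutive events
    whose timestamp delta is at least threshold_us microseconds."""
    gaps, prev = [], None
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        ts, func = float(m.group(1)), m.group(2)
        if prev and (ts - prev[0]) * 1e6 >= threshold_us:
            gaps.append((prev[1], func, round((ts - prev[0]) * 1e6)))
        prev = (ts, func)
    return gaps

trace = [
    "cyclictest-4619 [011] d...4.. 1128.288676: queued_spin_lock_slowpath <-raw_spin_rq_lock_nested",
    "cyclictest-4619 [011] d...3.. 1128.289688: update_rq_clock <-load_balance",
]
print(find_gaps(trace))  # the slowpath entry above spends ~1012us before load_balance proceeds
```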
- A CPU can be attacked by rto_push_irq_work_func(). A few instances are
  probably okay, but doing it 86 times can add up to ~400us if called from
  flush_smp_call_function_queue() (after waiting for the RQ lock):
| <idle>-0 [032] d...2.. 4069.605763: irq_enter_rcu <-sysvec_apic_timer_interrupt
…
| <idle>-0 [032] d..h6.. 4069.605766: queued_spin_lock_slowpath <-try_to_wake_up
| <idle>-0 [032] dn.h5.. 4069.605923: update_rq_clock <-try_to_wake_up
…
| <idle>-0 [032] dn.h6.. 4069.605929: queued_spin_lock_slowpath <-enqueue_task_rt
| <idle>-0 [032] dn.h5.. 4069.606167: enqueue_top_rt_rq <-enqueue_task_rt
| <idle>-0 [032] dn.h4.. 4069.606169: sched_wakeup: comm=cyclictest pid=8045 prio=9 target_cpu=032
…
| <idle>-0 [032] dn.h2.. 4069.606171: irq_exit_rcu <-sysvec_apic_timer_interrupt
| <idle>-0 [032] .n..2.. 4069.606171: arch_cpu_idle_exit <-do_idle
| <idle>-0 [032] .n..2.. 4069.606171: flush_smp_call_function_queue <-do_idle
| <idle>-0 [032] dn..2.. 4069.606172: __flush_smp_call_function_queue <-flush_smp_call_function_queue
| <idle>-0 [032] dn..2.. 4069.606173: rto_push_irq_work_func <-irq_work_single
…
| <idle>-0 [032] dn.h2.. 4069.606578: irq_exit_rcu <-sysvec_irq_work
| <idle>-0 [032] .n..2.. 4069.606578: schedule_idle <-do_idle
| <idle>-0 [032] dn..2.. 4069.606579: rcu_note_context_switch <-__schedule
https://breakpoint.cc/traces-zen3/trace-rto_push_irq_work_func-32.txt.zst [102KiB]
https://breakpoint.cc/traces-zen3/trace-rto_push_irq_work_func.txt.zst [49MiB]
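A quick way to quantify such a burst from the per-CPU trace is to count the invocations and the wall time they span. A sketch (the three sample lines below are shortened stand-ins with an invented middle timestamp; the real per-CPU trace has 86 of these calls):

```python
import re

TS = re.compile(r"(\d+\.\d+):\s+(\S+)")

def irq_work_burst(lines, func="rto_push_irq_work_func"):
    """Count invocations of func and the wall time (in us) the burst spans."""
    stamps = [float(m.group(1)) for line in lines
              if (m := TS.search(line)) and m.group(2) == func]
    if not stamps:
        return 0, 0.0
    return len(stamps), (stamps[-1] - stamps[0]) * 1e6

# Stand-in lines; the real burst runs from 4069.606173 until the
# irq_exit_rcu at 4069.606578 shown above.
trace = [
    "<idle>-0 [032] dn..2.. 4069.606173: rto_push_irq_work_func <-irq_work_single",
    "<idle>-0 [032] dn..2.. 4069.606350: rto_push_irq_work_func <-irq_work_single",
    "<idle>-0 [032] dn..2.. 4069.606578: rto_push_irq_work_func <-irq_work_single",
]
count, span_us = irq_work_burst(trace)
print(count, round(span_us))  # 3 calls spanning ~405us in this shortened sample
```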
- I also managed to run into sched_rt_period_timer():
| <idle>-0 [027] dn.h1.. 630.600139: hrtimer_expire_entry: hrtimer=00000000602b5d49 function=sched_rt_period_timer now=630600137263
| <idle>-0 [027] dn.h2.. 630.600139: sched_rt_period_timer <-__hrtimer_run_queues
| <idle>-0 [027] dn.h2.. 630.600139: _raw_spin_lock <-sched_rt_period_timer
| <idle>-0 [027] dn.h3.. 630.600140: queued_spin_lock_slowpath <-sched_rt_period_timer
| <idle>-0 [027] dn.h3.. 630.600326: ktime_get <-sched_rt_period_timer
…
| <idle>-0 [027] dn.h3.. 630.606825: hrtimer_forward <-sched_rt_period_timer
| <idle>-0 [027] dn.h2.. 630.606826: _raw_spin_lock_irq <-__hrtimer_run_queues
| <idle>-0 [027] dn.h3.. 630.606826: enqueue_hrtimer <-__hrtimer_run_queues
| <idle>-0 [027] dn.h2.. 630.606826: hrtimer_start: hrtimer=00000000602b5d49 function=sched_rt_period_timer expires=631600000000 softexpires=631600000000 mode=ABS
  That is almost 7ms of runtime just for that one hrtimer, which ensures
  that an RT task does not run longer than allowed.
https://breakpoint.cc/traces-zen3/trace-sched_rt_period_timer-27.txt.zst [101K]
https://breakpoint.cc/traces-zen3/trace-sched_rt_period_timer.txt.zst [48MiB]
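That timer enforces the global RT bandwidth limit: RT tasks may consume at most sched_rt_runtime_us out of every sched_rt_period_us, and the 1s period matches the re-arm visible in the hrtimer_start line above (expires jumps from ~630.6s to 631.6s). A tiny sketch of the arithmetic, using the usual kernel sysctl defaults rather than values read from this machine:

```python
# kernel.sched_rt_period_us / kernel.sched_rt_runtime_us; the values are
# the common defaults, assumed here, not taken from the traced box.
sched_rt_period_us = 1_000_000   # enforcement period: 1s, matching the 1s re-arm
sched_rt_runtime_us = 950_000    # RT runtime budget per period

rt_share = sched_rt_runtime_us / sched_rt_period_us
print(f"RT tasks may use at most {rt_share:.0%} of each period")  # 95%
```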
The -XX trace contains only the events from that one CPU; the other one is
the complete trace. I also have them as an all-in-one archive:
https://breakpoint.cc/traces-zen3/trace-all-in-one.tar.zst
And the dot-config:
https://breakpoint.cc/traces-zen3/dot-config.tar.zst
The first two were run with CONFIG-PERIODIC, the last one with CONFIG-NO_HZ.
Sebastian